Similar literature
1.
The FSSP database of structurally aligned protein fold families.   Cited by 17 (self-citations: 0; citations by others: 17)
L Holm, C Sander, Nucleic Acids Research 1994, 22(17):3600-3609
FSSP (families of structurally similar proteins) is a database of structural alignments of proteins in the Protein Data Bank (PDB). The database currently contains an extended structural family for each of 330 representative protein chains. Each data set contains structural alignments of one search structure with all other structurally significantly similar proteins in the representative set (remote homologs, < 30% sequence identity), as well as all structures in the Protein Data Bank with 70-30% sequence identity relative to the search structure (medium homologs). Very close homologs (above 70% sequence identity) are excluded as they rarely have marked structural differences. The alignments of remote homologs are the result of pairwise all-against-all structural comparisons in the set of 330 representative protein chains. All such comparisons are based purely on the 3D co-ordinates of the proteins and are derived by automatic (objective) structure comparison programs. The significance of structural similarity is estimated based on statistical criteria. The FSSP database is available electronically from the EMBL file server and by anonymous ftp (file transfer protocol).
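The three identity bands named in the abstract above can be illustrated with a minimal sketch. The function name `homolog_class` and the handling of the exact boundary values (30% and 70% both counted as medium) are assumptions made for illustration; the abstract does not say how ties are treated, and this is not FSSP's own code.

```python
def homolog_class(seq_identity_pct: float) -> str:
    """Band a structural neighbour by percent sequence identity to the search structure.

    Boundary handling is an assumption: exactly 30% and exactly 70% count as "medium".
    """
    if seq_identity_pct > 70:
        return "close (excluded)"   # very close homologs are left out of FSSP
    if seq_identity_pct >= 30:
        return "medium"             # 70-30% identity
    return "remote"                 # below 30% identity


for pct in (85, 55, 12):
    print(pct, homolog_class(pct))
```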

2.
We have updated the Protein Sequence-Structure Analysis Relational Database (PSSARD) first published in the Int. J. Biol. Macromol. 36 (2005) 259-262 corresponding to 1573 representative protein chains selected from the Protein Data Bank (PDB). In this updated and revised PSSARD (Version 2.0), we have included all proteins in the Protein Data Bank available at the time of developing this database, including the NMR PDB entries. The current database corresponds to 22,752 X-ray PDB entries and 3977 NMR PDB entries and is separated accordingly in order to facilitate the appropriate database search. The representative protein chains can also be separately accessed within the current database. We have made a provision to combine more than one field to query the database, and the results of any search can be used to carry out further nested searches using a combination of queries. We have provided hyperlinks to the individual PDB entries obtained as the result of any search in PSSARD in order to obtain additional details relevant to the protein structure. Certain applications useful to identify domains and structural motifs are discussed.

3.
The presence and location of intramolecular disulphide bonds are a key determinant of the structure and function of proteins. Intramolecular disulphide bonds in proteins have previously been analyzed under the assumption that there is no clear relationship between disulphide arrangement and disulphide concentration. To investigate this, a set of sequence nonhomologous protein chains containing one or more intramolecular disulphide bonds was extracted from the Protein Data Bank, and the arrangements of the bonds, Protein Data Bank header, and Structural Characterization of Proteins fold were analyzed as a function of intramolecular disulphide bond concentration. Two populations of intramolecular disulphide bond-containing proteins were identified, with a naturally occurring partition at 25 residues per bond. These populations were named intramolecular disulphide bond-rich and -poor. Benefits of partitioning were illustrated by three results: (1) rich chains most frequently contained three disulphides, explaining the plateaux in extant disulphide frequency distributions; (2) a positive relationship between median chain length and the number of disulphides, only seen when the data were partitioned; and (3) the most common bonding pattern for chains with three disulphide bonds was based on the most common for two, only when the data were partitioned. The two populations had different headers, folds, bond arrangements, and chain lengths. Associations between intramolecular disulphide bond (IDSB) concentration, IDSB bonding pattern, loop sizes, SCOP fold, and PDB header were also found. From this, we found that intramolecular disulphide bond-rich and -poor proteins follow different bonding rules, and must be considered separately to generate meaningful models of bond formation.
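The partition at 25 residues per bond can be read as a simple ratio test, sketched below. The function name `idsb_population` is hypothetical, and treating a chain at exactly 25 residues per bond as "rich" is an assumption; the abstract gives only the cutoff value.

```python
def idsb_population(chain_length: int, n_bonds: int, cutoff: float = 25.0) -> str:
    """Assign a chain to the bond-rich or bond-poor population.

    Fewer residues per bond means a higher bond concentration, hence "rich".
    The inclusive boundary (<= cutoff) is an assumption.
    """
    if n_bonds < 1:
        raise ValueError("chain must contain at least one intramolecular disulphide bond")
    residues_per_bond = chain_length / n_bonds
    return "rich" if residues_per_bond <= cutoff else "poor"


print(idsb_population(chain_length=60, n_bonds=3))   # 20 residues per bond
print(idsb_population(chain_length=300, n_bonds=2))  # 150 residues per bond
```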

4.
Selection of representative protein data sets.   Cited by 37 (self-citations: 17; citations by others: 20)
The Protein Data Bank currently contains about 600 data sets of three-dimensional protein coordinates determined by X-ray crystallography or NMR. There is considerable redundancy in the data base, as many protein pairs are identical or very similar in sequence. However, statistical analyses of protein sequence-structure relations require nonredundant data. We have developed two algorithms to extract from the data base representative sets of protein chains with maximum coverage and minimum redundancy. The first algorithm focuses on optimizing a particular property of the selected proteins and works by successive selection of proteins from an ordered list and exclusion of all neighbors of each selected protein. The other algorithm aims at maximizing the size of the selected set and works by successive thinning out of clusters of similar proteins. Both algorithms are generally applicable to other data bases in which criteria of similarity can be defined and relate to problems in graph theory. The largest nonredundant set extracted from the current release of the Protein Data Bank has 155 protein chains. In this set, no two proteins have sequence similarity higher than a certain cutoff (30% identical residues for aligned subsequences longer than 80 residues), yet all structurally unique protein families are represented. Periodically updated lists of representative data sets are available by electronic mail from the file server "netserv@embl-heidelberg.de". The selection may be useful in statistical approaches to protein folding as well as in the analysis and documentation of the known spectrum of three-dimensional protein structures.
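The first algorithm in the abstract above (select from an ordered list, then exclude every neighbor of the selected protein) can be sketched as follows. This is an illustrative reading of the abstract, not the authors' program: the function name, the chain identifiers, and the representation of similarity as a list of pairs are all assumptions.

```python
def select_representatives(ordered_chains, similar_pairs):
    """Greedy selection: walk the ordered list, keep a chain unless a
    previously selected chain is its neighbor (i.e. is too similar)."""
    # Build an undirected neighbor map from the similar pairs.
    neighbours = {}
    for a, b in similar_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    selected, excluded = [], set()
    for chain in ordered_chains:
        if chain in excluded:
            continue
        selected.append(chain)
        excluded.add(chain)                             # guard against duplicates in the list
        excluded.update(neighbours.get(chain, ()))      # drop all neighbors of the pick
    return selected


# Hypothetical chains, already ordered by the property being optimized.
chains = ["A", "B", "C", "D"]
pairs = [("A", "B"), ("B", "C")]
print(select_representatives(chains, pairs))  # ['A', 'C', 'D']
```

Because the list order decides which member of each similar group survives, sorting it by the property of interest (for example, resolution) is what lets this variant optimize that property.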

5.
MOTIVATION: Protein structure classification has been recognized as one of the most important research issues in protein structure analysis. A substantial number of methods for the classification have been proposed, and several databases have been constructed using these methods. Since some proteins with very similar sequences may exhibit structural diversities, we have proposed PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB), whose selection strategy is based not only on sequence similarity but also on structural similarity. Forty-eight representative sets whose similarity criteria were predetermined were made available over the World Wide Web (WWW). However, the sets were insufficient in number to satisfy users researching protein structures by various methods. RESULT: We have improved the system for PDB-REPRDB so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. This paper describes the method we use to classify chains and select the representatives in the system. We also describe the interface used to set the parameters.

6.
The analysis of disulphide bond containing proteins in the Protein Data Bank (PDB) revealed that out of 27,209 protein structures analyzed, 12,832 proteins contain at least one intra-chain disulphide bond and 811 proteins contain at least one inter-chain disulphide bond. The intra-chain disulphide bond containing proteins can be grouped into 256 categories based on the number of disulphide bonds and the disulphide bond connectivity patterns (DBCPs) that were generated according to the position of half-cystine residues along the protein chain. The PDB entries corresponding to these 256 categories represent 509 unique SCOP superfamilies. A simple web-based computational tool is made freely available at the website http://www.ccmb.res.in/bioinfo/dsbcp that allows flexible queries to be made on the database in order to retrieve useful information on the disulphide bond containing proteins in the PDB. The database is useful to identify the different SCOP superfamilies associated with a particular disulphide bond connectivity pattern or vice versa. It is possible to define a query based either on a single field or a combination of the following fields, i.e., PDB code, protein name, SCOP superfamily name, number of disulphide bonds, disulphide bond connectivity pattern and the number of amino acid residues in a protein chain, and retrieve information that matches the criteria. Thereby, the database may be useful to select suitable protein structural templates in order to model the more distantly related protein homologs/analogs using the comparative modeling methods.
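One plausible way to derive a connectivity pattern from half-cystine positions is to rank the bonded cysteines in sequence order and list which ranks are paired. The sketch below assumes that reading; the function name, the `1-6,2-5,3-4` output format, and the residue numbers are hypothetical and are not taken from the database described above.

```python
def connectivity_pattern(bonds):
    """Turn bonded residue-number pairs into a rank-based pattern string.

    bonds: iterable of (residue_i, residue_j) for each intra-chain disulphide.
    Half-cystines are numbered 1..2n in order of position along the chain.
    """
    positions = sorted({res for bond in bonds for res in bond})
    rank = {res: i + 1 for i, res in enumerate(positions)}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ",".join(f"{a}-{b}" for a, b in pairs)


# Hypothetical chain with three disulphide bonds.
print(connectivity_pattern([(3, 40), (4, 32), (16, 26)]))  # 1-6,2-5,3-4
```

Two chains with different residue numbering but the same pairing topology map to the same string, which is what makes such a pattern usable as a grouping key.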

7.
Oligomeric proteins are more abundant in nature than monomeric proteins, and are involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template‐based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H‐set), and 2691 monomeric proteins that form dimer‐like assemblies in crystals (M‐set). The structural alignment identifies a H‐set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue–residue contacts in the target. It also identifies a M‐set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identities, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template‐based methods should become the choice method for modeling oligomeric as well as monomeric proteins.

8.
L Wernisch, M Hunting, S J Wodak, Proteins 1999, 35(3):338-352
A novel automatic procedure for identifying domains from protein atomic coordinates is presented. The procedure, termed STRUDL (STRUctural Domain Limits), does not take into account information on secondary structures and handles any number of domains made up of contiguous or non-contiguous chain segments. The core algorithm uses the Kernighan-Lin graph heuristic to partition the protein into residue sets which display minimum interactions between them. These interactions are deduced from the weighted Voronoi diagram. The generated partitions are accepted or rejected on the basis of optimized criteria, representing basic expected physical properties of structural domains. The graph heuristic approach is shown to be very effective: it closely approximates the exact solution provided by a branch and bound algorithm for a number of test proteins. In addition, the overall performance of STRUDL is assessed on a set of 787 representative proteins from the Protein Data Bank by comparison to domain definitions in the CATH protein classification. The domains assigned by STRUDL agree with the CATH assignments in at least 81% of the tested proteins. This result is comparable to that obtained previously using PUU (Holm and Sander, Proteins 1994;9:256-268), the only other available algorithm designed to identify domains with any number of non-contiguous chain segments. A detailed discussion of the structures for which our assignments differ from those in CATH brings to light some clear inconsistencies between the concept of structural domains based on minimizing inter-domain interactions and that of delimiting structural motifs that represent acceptable folding topologies or architectures. Considering both concepts as complementary and combining them in a layered approach might be the way forward.

9.
The function of a protein molecule is greatly influenced by its three-dimensional (3D) structure and therefore structure prediction will help identify its biological function. We have updated Sequence, Motif and Structure (SMS), the database of structurally rigid peptide fragments, by combining amino acid sequences and the corresponding 3D atomic coordinates of non-redundant (25%) and redundant (90%) protein chains available in the Protein Data Bank (PDB). SMS 2.0 provides information pertaining to the peptide fragments of length 5-14 residues. The entire dataset is divided into three categories, namely, same sequence motifs having similar, intermediate or dissimilar 3D structures. Further, options are provided to facilitate structural superposition using the program structural alignment of multiple proteins (STAMP) and the popular JAVA plug-in (Jmol) is deployed for visualization. In addition, functionalities are provided to search for the occurrences of the sequence motifs in other structural and sequence databases like PDB, Genome Database (GDB), Protein Information Resource (PIR) and Swiss-Prot. The updated database along with the search engine is available over the World Wide Web through the following URL http://cluster.physics.iisc.ernet.in/sms/.
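Enumerating peptide fragments of length 5-14 from a chain amounts to a sliding window over the sequence. The sketch below shows only that enumeration step under assumed names (`peptide_fragments`, a one-letter-code string as input); how SMS itself extracts or stores fragments is not described in the abstract.

```python
def peptide_fragments(sequence: str, min_len: int = 5, max_len: int = 14):
    """Yield (start_index, fragment) for every window of min_len..max_len residues."""
    for length in range(min_len, max_len + 1):
        # range() is empty when the chain is shorter than the window.
        for start in range(len(sequence) - length + 1):
            yield start, sequence[start:start + length]


# Hypothetical six-residue chain: two 5-mers and one 6-mer.
for start, fragment in peptide_fragments("ACDEFG"):
    print(start, fragment)
```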

10.
11.
The Protein Data Bank is a computer-based archival file for macromolecular structures. The Bank stores in a uniform format atomic co-ordinates and partial bond connectivities, as derived from crystallographic studies. Text included in each data entry gives pertinent information for the structure at hand (e.g. species from which the molecule has been obtained, resolution of diffraction data, literature citations and specifications of secondary structure). In addition to atomic co-ordinates and connectivities, the Protein Data Bank stores structure factors and phases, although these latter data are not placed in any uniform format. Input of data to the Bank and general maintenance functions are carried out at Brookhaven National Laboratory. All data stored in the Bank are available on magnetic tape for public distribution, from Brookhaven (to laboratories in the Americas), Tokyo (Japan), and Cambridge (Europe and worldwide). A master file is maintained at Brookhaven and duplicate copies are stored in Cambridge and Tokyo. In the future, it is hoped to expand the scope of the Protein Data Bank to make available co-ordinates for standard structural types (e.g. α-helix, RNA double-stranded helix) and representative computer programs of utility in the study and interpretation of macromolecular structures.

12.
The targets of the Structural GenomiX (SGX) bacterial genomics project were proteins conserved in multiple prokaryotic organisms with no obvious sequence homolog in the Protein Data Bank of known structures. The outcome of this work was 80 structures, covering 60 unique sequences and 49 different genes. Experimental phase determination from proteins incorporating Se-Met was carried out for 45 structures with most of the remainder solved by molecular replacement using members of the experimentally phased set as search models. An automated tool was developed to deposit these structures in the Protein Data Bank, along with the associated X-ray diffraction data (including refined experimental phases) and experimentally confirmed sequences. BLAST comparisons of the SGX structures with structures that had appeared in the Protein Data Bank over the intervening 3.5 years since the SGX target list had been compiled identified homologs for 49 of the 60 unique sequences represented by the SGX structures. This result indicates that, for bacterial structures that are relatively easy to express, purify, and crystallize, the structural coverage of gene space is proceeding rapidly. More distant sequence-structure relationships between the SGX and PDB structures were investigated using PDB-BLAST and Combinatorial Extension (CE). Only one structure, SufD, has a truly unique topology compared to all folds in the PDB.

13.
Families and the structural relatedness among globular proteins.   Cited by 4 (self-citations: 3; citations by others: 1)
Protein structures come in families. Are families “closely knit” or “loosely knit” entities? We describe a measure of relatedness among polymer conformations. Based on weighted distance maps, this measure differs from existing measures mainly in two respects: (1) it is computationally fast, and (2) it can compare any two proteins, regardless of their relative chain lengths or degree of similarity. It does not require finding relative alignments. The measure is used here to determine the dissimilarities between all 12,403 possible pairs of 158 diverse protein structures from the Brookhaven Protein Data Bank (PDB). Combined with minimal spanning trees and hierarchical clustering methods, this measure is used to define structural families. It is also useful for rapidly searching a dataset of protein structures for specific substructural motifs. By using an analogy to distributions of Euclidean distances, we find that protein families are not tightly knit entities.
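The pair count quoted above follows from the number of unordered pairs among 158 structures, n(n-1)/2. A short check, using only the standard library:

```python
from math import comb

n_structures = 158
n_pairs = comb(n_structures, 2)  # unordered pairs: n * (n - 1) / 2
print(n_pairs)  # 12403
```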

14.
Most of the proteins in a cell assemble into complexes to carry out their function. It is therefore crucial to understand the physicochemical properties as well as the evolution of interactions between proteins. The Protein Data Bank represents an important source of information for such studies, because more than half of the structures are homo- or heteromeric protein complexes. Here we propose the first hierarchical classification of whole protein complexes of known 3-D structure, based on representing their fundamental structural features as a graph. This classification provides the first overview of all the complexes in the Protein Data Bank and allows nonredundant sets to be derived at different levels of detail. This reveals that between one-half and two-thirds of known structures are multimeric, depending on the level of redundancy accepted. We also analyse the structures in terms of the topological arrangement of their subunits and find that they form a small number of arrangements compared with all theoretically possible ones. This is because most complexes contain four or fewer subunits, and the large majority are homomeric. In addition, there is a strong tendency for symmetry in complexes, even for heteromeric complexes. Finally, through comparison of Biological Units in the Protein Data Bank with the Protein Quaternary Structure database, we identified many possible errors in quaternary structure assignments. Our classification, available as a database and Web server at http://www.3Dcomplex.org, will be a starting point for future work aimed at understanding the structure and evolution of protein complexes.

15.
Nucleoside triphosphate (NTP) ligands are of high biological importance and are essential for all life forms. A pre‐requisite for them to participate in diverse biochemical processes is their recognition by diverse proteins. It is thus of great interest to understand the basis for such recognition in different proteins. Towards this, we have used a structural bioinformatics approach and analyzed structures of 4677 NTP complexes available in the Protein Data Bank (PDB). Binding sites were extracted and compared exhaustively using PocketMatch, a sensitive in‐house site comparison algorithm, which resulted in grouping the entire dataset into 27 site‐types. Each of these site‐types represents a structural motif comprising two or more residue conservations, derived using another in‐house tool for superposing binding sites, PocketAlign. The 27 site‐types could be grouped further into 9 super‐types by considering partial similarities in the sites, which indicated that the individual site‐types comprise different combinations of one or more site features. A scan across PDB using the 27 structural motifs determined the motifs to be specific to NTP binding sites, and a computational alanine mutagenesis indicated that residues identified to be highly conserved in the motifs also contribute most to binding. Alternate orientations of the ligand in several site‐types were observed and rationalized, indicating the possibility of some residues serving as anchors for NTP recognition. The presence of multiple site‐types and the grouping of multiple folds into each site‐type is strongly suggestive of convergent evolution. Knowledge of determinants obtained from this study will be useful for detecting function in unknown proteins. Proteins 2017; 85:1699–1712. © 2017 Wiley Periodicals, Inc.

16.
Enlarged representative set of protein structures.   Cited by 30 (self-citations: 13; citations by others: 17)
To reduce redundancy in the Protein Data Bank of 3D protein structures, which is caused by many homologous proteins in the data bank, we have selected a representative set of structures. The selection algorithm was designed to (1) select as many nonhomologous structures as possible, and (2) select structures of good quality. The representative set may reduce time and effort in statistical analyses.

17.
Alpha-helices stand out as common and relatively invariant secondary structural elements of proteins. However, alpha-helices are not rigid bodies and their deformations can be significant in protein function (e.g. coiled coils). To quantify the flexibility of alpha-helices we have performed a structural principal-component analysis of helices of different lengths from a representative set of protein folds in the Protein Data Bank. We find three dominant modes of flexibility: two degenerate bend modes and one twist mode. The data are consistent with independent Gaussian distributions for each mode. The mode eigenvalues, which measure flexibility, follow simple scaling forms as a function of helix length. The dominant bend and twist modes and their harmonics are reproduced by a simple spring model, which incorporates hydrogen-bonding and excluded volume. As an application, we examine the amount of bend and twist in helices making up all coiled-coil proteins in SCOP. Incorporation of alpha-helix flexibility into structure refinement and design is discussed.

18.
PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). The previous version of PDB-REPRDB provided 48 representative sets, whose similarity criteria were predetermined, on the WWW. The current version is designed so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user's requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. One can obtain a representative list and classification data of protein chains from the system. The current database includes 20 457 protein chains from PDB entries (August 6, 2000). The system for PDB-REPRDB is available at the Parallel Protein Information Analysis system (PAPIA) WWW server (http://www.rwcp.or.jp/papia/).

19.
Structure motif discovery and mining the PDB   Cited by 2 (self-citations: 0; citations by others: 2)
MOTIVATION: Many of the most interesting functional and evolutionary relationships among proteins are so ancient that they cannot be reliably detected through sequence analysis and are apparent only through a comparison of the tertiary structures. The conserved features can often be described as structural motifs consisting of a few single residues or Secondary Structure (SS) elements. Confidence in such motifs is greatly boosted when they are found in more than a pair of proteins. RESULTS: We describe an algorithm for the automatic discovery of recurring patterns in protein structures. The patterns consist of individual residues having a defined order along the protein's backbone that come close together in the structure and whose spatial conformations are similar. The residues in a pattern need not be close in the protein's sequence. The work described in this paper builds on an earlier reported algorithm for motif discovery. This paper describes a significant improvement of the algorithm which makes it very efficient. The improved efficiency allows us to use it for doing unsupervised learning of patterns occurring in small subsets in a large set of structures, a non-redundant subset of the Protein Data Bank (PDB) database of all known protein structures.

20.
We report the observation of continuous turns in proteins which comprise individual gamma-turns or beta-turns or both that are situated immediately one after the other along the polypeptide chain. The continuous turns were identified from a representative data set of three-dimensional protein crystal structures. The gammabeta/betagamma, gammagamma and betabeta continuous turns represent peptides of varying amino acid residue lengths and conformations. The continuous turns frequently observed in proteins were: gammabeta, between a coil and a strand; betagamma, between a helix and a strand; gammagamma, between coils; and betabeta, either between a strand and a coil or between strands or coils. We determined the statistically significant amino acid residue preferences at individual positions in the turn, calculated amino acid positional potentials and analyzed main chain hydrogen bonds and side-chain interactions likely to stabilize the continuous turns. The data on continuous turns have been integrated in the database of structural motifs in proteins (DSMP) on our web server at (http://www.cdfd.org.in/dsmp.html). This is useful to make queries on sequences compatible with different continuous turns.
