Similar articles
 20 similar articles found (search time: 15 ms)
1.
Structural bioinformatics of membrane proteins is still in its infancy, and the picture of their fold space is only beginning to emerge. Because only a handful of three-dimensional structures are available, sequence comparison and structure prediction remain the main tools for investigating sequence-structure relationships in membrane protein families. Here we present a comprehensive analysis of the structural families corresponding to α-helical membrane proteins with at least three transmembrane helices. The new version of our CAMPS database (CAMPS 2.0) covers nearly 1300 eukaryotic, prokaryotic, and viral genomes. Using an advanced classification procedure, which is based on high-order hidden Markov models and considers sequence similarity as well as the number of transmembrane helices and loop lengths, we identified 1353 structurally homogeneous clusters roughly corresponding to membrane protein folds. Only 53 clusters are associated with experimentally determined three-dimensional structures, and for these clusters CAMPS is in reasonable agreement with structure-based classification approaches such as SCOP and CATH. We therefore estimate that ~1300 structures would need to be determined to provide a sufficient structural coverage of polytopic membrane proteins. CAMPS 2.0 is available at http://webclu.bio.wzw.tum.de/CAMPS2.0/.

2.
A rapid method of protein structure alignment   Total citations: 5, self-citations: 0, cited by others: 5
A reduction in the time required to compare two protein structures has been achieved for a previously developed structure alignment method, by reducing the number of residue pair comparisons which must be performed between the two structures. Subsets of residue pairs are selected by an iterative procedure. Initially, selection is based on similarities in solvent accessible surface areas or torsional angles or a combination of both properties, giving subsets containing approximately 2% of the total number of residue pairs. Using these subsets, a rough comparison of the two structures is generated by the structural alignment program. The information returned from this can be used to identify more accurately topologically equivalent residues in the two proteins, thus enabling a new and much smaller subset (less than 0.2% of the total number of residue pairs) to be selected. The process of iterative refinement of the residue pair subsets is repeated once more, after which the correct alignment of the proteins was obtained in 95% of the structure comparisons tested. Times required to compare the structures using the refined subsets are insignificant compared to the initial comparison, so that considerable increases in speed are possible. The method was tested on two groups of proteins, a set of remotely related alpha/beta nucleotide proteins and the variable and constant domains of the immunoglobulins. Increases in speed ranging from 50-fold to greater than 150-fold were obtained depending on the degree of similarity of the two structures. In some comparisons the alignment was improved due to the reduction in noise obtained by comparing mainly equivalent residues.

3.
G Vriend  C Sander 《Proteins》1991,11(1):52-58
We present a fully automatic algorithm for three-dimensional alignment of protein structures and for the detection of common substructures and structural repeats. Given two proteins, the algorithm first identifies all pairs of structurally similar fragments and subsequently clusters into larger units pairs of fragments that are compatible in three dimensions. The detection of similar substructures is independent of insertion/deletion penalties and can be chosen to be independent of the topology of loop connections and to allow for reversal of chain direction. Using distance geometry filters and other approximations, the algorithm, implemented in the WHAT IF program, is so fast that structural comparison of a single protein with the entire database of known protein structures can be performed routinely on a workstation. The method reproduces known non-trivial superpositions such as plastocyanin on azurin. In addition, we report surprising structural similarity between ubiquitin and a (2Fe-2S) ferredoxin.

4.
Newly determined protein structures are classified as belonging to a new fold if they are sufficiently dissimilar from all other protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ on three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.

5.
The degree of similarity of two protein three-dimensional structures is usually measured with the root-mean-square distance between equivalent atom pairs. Such a similarity measure depends on the dimension of the proteins, that is, on the number of equivalent atom pairs. The present communication presents a simple procedure to make the root-mean-square distances between pairs of three-dimensional structures independent of their dimensions. This normalization may be useful in evolutionary and fold classification studies as well as in simple comparisons between different structural models.
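
The measure discussed in the abstract above can be sketched in a few lines of Python. The rmsd function follows the standard definition over equivalent atom pairs and assumes the two coordinate sets are already superposed. The normalized_rmsd function is only a hypothetical illustration of a size correction: its logarithmic form and the reference size of 100 pairs are assumptions, since the abstract does not state the formula.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between equivalent atom pairs.

    Both arguments are equal-length lists of (x, y, z) tuples that are
    assumed to be superposed already; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def normalized_rmsd(value, n_pairs, reference=100):
    """Hypothetical size correction (an assumption, not from the abstract).

    Rescales an RMSD measured over n_pairs atom pairs to the value that
    would be expected for `reference` pairs.
    """
    scale = 1.0 + math.log(math.sqrt(n_pairs / reference))
    if scale <= 0:
        raise ValueError("too few atom pairs for this correction")
    return value / scale
```

With such a correction, an RMSD measured over many atom pairs is scaled down and one measured over few pairs is scaled up, so that structures of different sizes can be compared on one scale.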

6.
The information required to generate a protein structure is contained in its amino acid sequence, but how three-dimensional information is mapped onto a linear sequence is still incompletely understood. Multiple structure alignments of similar protein structures have been used to investigate conserved sequence features but contradictory results have been obtained, due, in large part, to the absence of objective criteria to be used in the construction of sequence profiles and in the quantitative comparison of alignment results. Here, we report a new procedure for multiple structure alignment and use it to construct structure-based sequence profiles for similar proteins. The definition of "similar" is based on the structural alignment procedure and on the protein structural distance (PSD) described in paper I of this series, which offers an objective measure for protein structure relationships. Our approach is tested in two well-studied groups of proteins: serine proteases and Ig-like proteins. It is demonstrated that the quality of a sequence profile generated by a multiple structure alignment is quite sensitive to the PSD used as a threshold for the inclusion of proteins in the alignment. Specifically, if the proteins included in the aligned set are too distant in structure from one another, there will be a dilution of information and patterns that are relevant to a subset of the proteins are likely to be lost. In order to understand better how the same three-dimensional information can be encoded in seemingly unrelated sequences, structure-based sequence profiles are constructed for subsets of proteins belonging to nine superfolds. We identify patterns of relatively conserved residues in each subset of proteins. It is demonstrated that the most conserved residues are generally located in the regions where tertiary interactions occur and that are relatively conserved in structure. 
Nevertheless, the conservation patterns are relatively weak in all cases studied, indicating that structure-determining factors that do not require a particular sequential arrangement of amino acids, such as secondary structure propensities and hydrophobic interactions, are important in encoding protein fold information. In general, we find that similar structures can fold without having a set of highly conserved residue clusters or a well-conserved sequence profile; indeed, in some cases there is no apparent conservation pattern common to structures with the same fold. Thus, when a group of proteins exhibits a common and well-defined sequence pattern, it is more likely that these sequences have a close evolutionary relationship rather than the similarities having arisen from the structural requirements of a given fold.

7.
Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures, via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at http://physbio.mssm.edu/~ortizg/.
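
The idea of scoring an alignment by its chance probability under an extreme value distribution can be illustrated with a minimal Python sketch. It assumes a Gumbel (type I extreme value) form, and the location mu and scale beta are hypothetical fitted parameters; none of the function names, parameter values, or the exact score definition are taken from MAMMOTH itself.

```python
import math


def evd_p_value(score, mu, beta):
    """P(S >= score) when random alignment scores follow a Gumbel law.

    mu (location) and beta (scale) are hypothetical parameters that would
    be fitted to scores of alignments between unrelated structures.
    """
    z = (score - mu) / beta
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-math.exp(-z))


def similarity_score(score, mu, beta):
    """Negative natural log of the chance probability.

    Larger values mean the alignment is less likely to arise by chance.
    """
    return -math.log(evd_p_value(score, mu, beta))
```

For example, with mu=5.0 and beta=2.0 a raw score of 10 yields a larger similarity_score than a raw score of 6, because it lies further out in the tail of the random distribution.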

8.
In the era of structural genomics, it is necessary to generate accurate structural alignments in order to build good templates for homology modeling. Although a great number of structural alignment algorithms have been developed, most of them ignore intermolecular interactions during the alignment procedure. Therefore, structures in different oligomeric states are barely distinguishable, and it is very challenging to find correct alignment in coil regions. Here we present a novel approach to structural alignment using a clique finding algorithm and environmental information (SAUCE). In this approach, we build the alignment based on not only structural coordinate information but also realistic environmental information extracted from biological unit files provided by the Protein Data Bank (PDB). At first, we eliminate all environmentally unfavorable pairings of residues. Then we identify alignments in core regions via a maximal clique finding algorithm. Two extreme value distribution (EVD) form statistics have been developed to evaluate core region alignments. With an optional extension step, global alignment can be derived based on environment-based dynamic programming linking. We show that our method is able to differentiate three-dimensional structures in different oligomeric states, and is able to find flexible alignments between multidomain structures without predetermined hinge regions. The overall performance is also evaluated on a large scale by comparisons to current structural classification databases as well as to other alignment methods.

9.
C A Orengo  N P Brown  W R Taylor 《Proteins》1992,14(2):139-167
A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.
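
The first stage described above, aligning linear strings of secondary structure elements by dynamic programming, can be sketched as follows. This is a plain global alignment with a hypothetical scoring scheme (match, mismatch, and gap values are assumptions), not the authors' feature-based comparison; each protein is reduced to a string such as "HEEH" with one letter per helix (H) or strand (E).

```python
def align_secondary_structures(sse_a, sse_b, match=2, mismatch=-1, gap=-1):
    """Globally align two secondary-structure element strings.

    Returns (score, pairs), where pairs lists the (i, j) indices of
    elements of the same type that were aligned to each other.
    The scoring values are illustrative assumptions.
    """
    n, m = len(sse_a), len(sse_b)
    # score[i][j] is the best score for aligning sse_a[:i] with sse_b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if sse_a[i - 1] == sse_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover equivalent elements.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if sse_a[i - 1] == sse_b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if sse_a[i - 1] == sse_b[j - 1]:
                pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

In a search, pairs of proteins whose score falls below a chosen cutoff would be dropped before the slower residue-level stage, and the returned element pairs would restrict which residues are compared.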

10.
11.
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. 
Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.

12.
13.
Multiple flexible structure alignment using partial order graphs   Total citations: 2, self-citations: 0, cited by others: 2
MOTIVATION: Existing comparisons of protein structures are not able to describe structural divergence and flexibility in the structures being compared because they focus on identifying a common invariant core and ignore parts of the structures outside this core. Understanding the structural divergence and flexibility is critical for studying the evolution of functions and specificities of proteins. RESULTS: A new method of multiple protein structure alignment, POSA (Partial Order Structure Alignment), was developed using a partial order graph representation of multiple alignments. POSA has two unique features: (1) it identifies and classifies regions that are conserved only in a subset of input structures and (2) it allows internal rearrangements in protein structures. POSA outperforms other programs in the cases where structural flexibilities exist and provides new insights by visualizing the mosaic nature of multiple structural alignments. POSA is an ideal tool for studying the variation of protein structures within diverse structural families. AVAILABILITY: POSA is freely available for academic users on a Web server at http://fatcat.burnham.org/POSA

14.

Background  

Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA.
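
To make the notion of a structural alphabet concrete, the toy Python sketch below turns a list of backbone torsion angles into a 1D string. The three-letter alphabet and the angle boundaries are hypothetical simplifications chosen for illustration; published structural alphabets are typically derived from clustered backbone fragments and use more letters.

```python
def encode_structure(torsions):
    """Encode (phi, psi) angle pairs, in degrees, as a 1D letter string.

    Toy alphabet (an assumption for illustration only):
      A = helix-like region, B = extended (strand-like) region, C = other.
    """
    letters = []
    for phi, psi in torsions:
        if -160 <= phi <= -20 and -120 <= psi <= 50:
            letters.append("A")
        elif -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
            letters.append("B")
        else:
            letters.append("C")
    return "".join(letters)
```

Once every structure is written as such a string, two structures can be compared with ordinary sequence alignment tools, provided a substitution matrix suited to the alphabet is supplied.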

15.
The three-dimensional structures of leucine-rich repeat (LRR)-containing proteins from five different families were previously predicted based on the crystal structure of the ribonuclease inhibitor, using an approach that combined homology-based modeling, structure-based sequence alignment of LRRs, and several rational assumptions. The structural models have been produced based on very limited sequence similarity, which, in general, cannot yield trustworthy predictions. Recently, the protein structures from three of these five families have been determined. In this report we estimate the quality of the modeling approach by comparing the models with the experimentally determined structures. The comparison suggests that the general architecture, curvature, "interior/exterior" orientations of side chains, and backbone conformation of the LRR structures can be predicted correctly. On the other hand, the analysis revealed that, in some cases, it is difficult to predict correctly the twist of the overall super-helical structure. Taking into consideration the conclusions from these comparisons, we identified a new family of bacterial LRR proteins and present its structural model. The reliability of the LRR protein modeling suggests that it would be informative to apply similar modeling approaches to other classes of solenoid proteins.

16.
C Sander  R Schneider 《Proteins》1991,9(1):56-68
The database of known protein three-dimensional structures can be significantly increased by the use of sequence homology, based on the following observations. (1) The database of known sequences, currently at more than 12,000 proteins, is two orders of magnitude larger than the database of known structures. (2) The currently most powerful method of predicting protein structures is model building by homology. (3) Structural homology can be inferred from the level of sequence similarity. (4) The threshold of sequence similarity sufficient for structural homology depends strongly on the length of the alignment. Here, we first quantify the relation between sequence similarity, structure similarity, and alignment length by an exhaustive survey of alignments between proteins of known structure and report a homology threshold curve as a function of alignment length. We then produce a database of homology-derived secondary structure of proteins (HSSP) by aligning to each protein of known structure all sequences deemed homologous on the basis of the threshold curve. For each known protein structure, the derived database contains the aligned sequences, secondary structure, sequence variability, and sequence profile. Tertiary structures of the aligned sequences are implied, but not modeled explicitly. The database effectively increases the number of known protein structures by a factor of five to more than 1800. The results may be useful in assessing the structural significance of matches in sequence database searches, in deriving preferences and patterns for structure prediction, in elucidating the structural role of conserved residues, and in modeling three-dimensional detail by homology.
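
A length-dependent homology threshold of the kind described above can be sketched in Python as below. The abstract gives neither the functional form nor the coefficients of the curve, so the power-law shape, the plateau for long alignments, and every parameter (coeff, exponent, floor) are assumptions that the caller must supply.

```python
def threshold_identity(length, coeff, exponent, floor):
    """Minimum percent identity required for an alignment of this length.

    Assumed shape: a power law in alignment length that levels off at
    `floor` for long alignments. All parameters are hypothetical.
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    return max(floor, coeff * length ** exponent)


def is_homologous(identity, length, coeff, exponent, floor):
    """True if percent identity meets the threshold for this length."""
    return identity >= threshold_identity(length, coeff, exponent, floor)
```

With the illustrative values coeff=100, exponent=-0.5, and floor=20, an alignment of 16 residues would need at least 25% identity, while one of 100 residues would need only 20%, which reflects the point that short alignments require higher similarity to imply structural homology.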

17.
Structures of peptide fragments drawn from a protein can potentially occupy a vast conformational continuum. We co-ordinatize this conformational space with the help of geometric invariants and demonstrate that the peptide conformations of the currently available protein structures are heavily biased in favor of a finite number of conformational types or structural building blocks. This is achieved by representing a peptide's backbone structure with geometric invariants and then clustering peptides based on closeness of the geometric invariants. This results in 12,903 clusters, of which 2207 are made up of peptides drawn from functionally and/or structurally related proteins. These are termed "functional" clusters and provide clues about potential functional sites. The rest of the clusters, including the largest few, are made up of peptides drawn from unrelated proteins and are termed "structural" clusters. The largest clusters are of regular secondary structures such as helices and beta strands as well as of beta hairpins. Several categories of helices and strands are discovered based on geometric differences. In addition to the known classes of loops, we discover several new classes, which will be useful in protein structure modeling. Our algorithm does not require assignment of secondary structure and, therefore, overcomes the limitations in loop classification due to ambiguity in secondary structure assignment at loop boundaries.

18.
Similarity of protein structures has been analyzed using three-dimensional Delaunay triangulation patterns derived from the backbone representation. It has been found that structurally related proteins have a common spatial invariant part, a set of tetrahedrons, mathematically described as a common spatial subgraph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT). Based on this property of protein structures, we present a novel common volume superimposition (TOPOFIT) method to produce structural alignments. Structural alignments are usually evaluated by the number of equivalent (aligned) positions (N(e)) and the corresponding root mean square deviation (RMSD). The superimposition of the DT patterns allows one to uniquely identify a maximal common number of equivalent residues in the structural alignment. In other words, TOPOFIT identifies a feature point on the RMSD N(e) curve, a topomax point, until which the topologies of two structures correspond to each other, including backbone and interresidue contacts, whereas the growing number of mismatches between the DT patterns occurs at larger RMSD (N(e)) after the topomax point. It has been found that the topomax point is present in all alignments from different protein structural classes; therefore, the TOPOFIT method identifies common, invariant structural parts between proteins. The alignments produced by the TOPOFIT method have a good correlation with alignments produced by other current methods. This novel method opens new opportunities for the comparative analysis of protein structures and for more detailed studies on understanding the molecular principles of tertiary structure organization and functionality. The TOPOFIT method also helps to detect conformational changes and topological differences in variable parts, which are particularly important for studies of variations in active/binding sites and protein classification.

19.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space are limited to simple model proteins and led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular, > 100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed. 
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.

20.
We describe the results of a procedure for maximizing the number of sequences that can be reliably linked to a protein of known three-dimensional structure. Unlike other methods, which try to increase sensitivity through the use of fold recognition software, we only use conventional sequence alignment tools, but apply them in a manner that significantly increases the number of relationships detected. We analyzed 11 genomes and found that, depending on the genome, between 23 and 32% of the ORFs had significant matches to proteins of known structure. In all cases, the aligned region consisted of either >100 residues or >50% of the smaller sequence. Slightly higher percentages could be attained if smaller motifs were also included. This is significantly higher than most previously reported methods, even those that have a fold-recognition component. We survey the biochemical and structural characteristics of the most frequently occurring proteins, and discuss the extent to which alignment methods can realistically assign function to gene products.
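
The coverage criterion stated in the abstract (an aligned region of more than 100 residues, or more than 50% of the smaller sequence) can be written as a short Python predicate. The function and parameter names are hypothetical; only the two cutoffs come from the text, and the statistical significance of the match itself is assumed to be checked elsewhere.

```python
def meets_coverage_criterion(aligned_length, len_a, len_b,
                             min_residues=100, min_fraction=0.5):
    """Check the aligned-region size rule quoted in the abstract.

    aligned_length: number of residues in the aligned region.
    len_a, len_b:   full lengths of the two sequences.
    """
    shorter = min(len_a, len_b)
    return (aligned_length > min_residues
            or aligned_length > min_fraction * shorter)
```

For instance, a 60-residue aligned region passes when the shorter sequence has 100 residues, but fails when both sequences are 300 residues or longer.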
