首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
J Boberg  T Salakoski  M Vihinen 《Proteins》1992,14(2):265-276
Reliable structural and statistical analyses of three dimensional protein structures should be based on unbiased data. The Protein Data Bank is highly redundant, containing several entries for identical or very similar sequences. A technique was developed for clustering the known structures based on their sequences and contents of alpha- and beta-structures. First, sequences were aligned pairwise. A representative sample of sequences was then obtained by grouping similar sequences together, and selecting a typical representative from each group. The similarity significance threshold needed in the clustering method was found by analyzing similarities of random sequences. Because three dimensional structures for proteins of same structural class are generally more conserved than their sequences, the proteins were clustered also according to their contents of secondary structural elements. The results of these clusterings indicate conservation of alpha- and beta-structures even when sequence similarity is relatively low. An unbiased sample of 103 high resolution structures, representing a wide variety of proteins, was chosen based on the suggestions made by the clustering algorithm. The proteins were divided into structural classes according to their contents and ratios of secondary structural elements. Previous classifications have suffered from subjective view of secondary structures, whereas here the classification was based on backbone geometry. The concise view lead to reclassification of some structures. The representative set of structures facilitates unbiased analyses of relationships between protein sequence, function, and structure as well as of structural characteristics.  相似文献   

2.
Structural classification of membrane proteins is still in its infancy due to the relative paucity of available three‐dimensional structures compared with soluble proteins. However, recent technological advances in protein structure determination have led to a significant increase in experimentally known membrane protein folds, warranting exploration of the structural universe of membrane proteins. Here, a new and completely membrane protein specific structural classification system is introduced that classifies α‐helical membrane proteins according to common helix architectures. Each membrane protein is represented by a helix interaction graph depicting transmembrane helices with their pairwise interactions resulting from individual residue contacts. Subsequently, proteins are clustered according to similarities among these helix interaction graphs using a newly developed structural similarity score called HISS. As HISS scores explicitly disregard structural properties of loop regions, they are more suitable to capture conserved transmembrane helix bundle architectures than other structural similarity scores. Importantly, we are able to show that a classification approach based on helix interaction similarity closely resembles conventional structural classification databases such as SCOP and CATH implying that helix interactions are one of the major determinants of α‐helical membrane protein folds. Furthermore, the classification of all currently available membrane protein structures into 20 recurrent helix architectures and 15 singleton proteins demonstrates not only an impressive variability of membrane helix bundles but also the conservation of common helix interaction patterns among proteins with distinctly different sequences. Proteins 2010. © 2010 Wiley‐Liss, Inc.  相似文献   

3.
The database PALI (Phylogeny and ALIgnment of homologous protein structures) consists of families of protein domains of known three-dimensional (3D) structure. In a PALI family, every member has been structurally aligned with every other member (pairwise) and also simultaneous superposition (multiple) of all the members has been performed. The database also contains 3D structure-based and structure-dependent sequence similarity-based phylogenetic dendrograms for all the families. The PALI release used in the present analysis comprises 225 families derived largely from the HOMSTRAD and SCOP databases. The quality of the multiple rigid-body structural alignments in PALI was compared with that obtained from COMPARER, which encodes a procedure based on properties and relationships. The alignments from the two procedures agreed very well and variations are seen only in the low sequence similarity cases often in the loop regions. A validation of Direct Pairwise Alignment (DPA) between two proteins is provided by comparing it with Pairwise alignment extracted from Multiple Alignment of all the members in the family (PMA). In general, DPA and PMA are found to vary rarely. The ready availability of pairwise alignments allows the analysis of variations in structural distances as a function of sequence similarities and number of topologically equivalent Calpha atoms. The structural distance metric used in the analysis combines root mean square deviation (r.m.s.d.) and number of equivalences, and is shown to vary similarly to r.m.s.d. The correlation between sequence similarity and structural similarity is poor in pairs with low sequence similarities. A comparison of sequence and 3D structure-based phylogenies for all the families suggests that only a few families have a radical difference in the two kinds of dendrograms. The difference could occur when the sequence similarity among the homologues is low or when the structures are subjected to evolutionary pressure for the retention of function. The PALI database is expected to be useful in furthering our understanding of the relationship between sequences and structures of homologous proteins and their evolution.  相似文献   

4.
The recent accumulation of large amounts of 3D structural data warrants a sensitive and automatic method to compare and classify these structures. We developed a web server for comparing protein 3D structures using the program Matras (http://biunit.aist-nara.ac.jp/matras). An advantage of Matras is its structure similarity score, which is defined as the log-odds of the probabilities, similar to Dayhoff's substitution model of amino acids. This score is designed to detect evolutionarily related (homologous) structural similarities. Our web server has three main services. The first one is a pairwise 3D alignment, which is simply align two structures. A user can assign structures by either inputting PDB codes or by uploading PDB format files in the local machine. The second service is a multiple 3D alignment, which compares several protein structures. This program employs the progressive alignment algorithm, in which pairwise 3D alignments are assembled in the proper order. The third service is a 3D library search, which compares one query structure against a large number of library structures. We hope this server provides useful tools for insights into protein 3D structures.  相似文献   

5.
Vorolign, a fast and flexible structural alignment method for two or more protein structures is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi-contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. RESULTS: The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. AVAILABILITY: Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign  相似文献   

6.
Zhu J  Weng Z 《Proteins》2005,58(3):618-627
We present a novel algorithm named FAST for aligning protein three-dimensional structures. FAST uses a directionality-based scoring scheme to compare the intra-molecular residue-residue relationships in two structures. It employs an elimination heuristic to promote sparseness in the residue-pair graph and facilitate the detection of the global optimum. In order to test the overall accuracy of FAST, we determined its sensitivity and specificity with the SCOP classification (version 1.61) as the gold standard. FAST achieved higher sensitivities than several existing methods (DaliLite, CE, and K2) at all specificity levels. We also tested FAST against 1033 manually curated alignments in the HOMSTRAD database. The overall agreement was 96%. Close inspection of examples from broad structural classes indicated the high quality of FAST alignments. Moreover, FAST is an order of magnitude faster than other algorithms that attempt to establish residue-residue correspondence. Typical pairwise alignments take FAST less than a second with a Pentium III 1.2GHz CPU. FAST software and a web server are available at http://biowulf.bu.edu/FAST/.  相似文献   

7.
Newly determined protein structures are classified to belong to a new fold, if the structures are sufficiently dissimilar from all other so far known protein structures. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that the usage of nonsequential structure alignment tools, which neglect the polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool, GANGSTA, is specialized to perform nonsequential alignments with proper assignment of the secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding enhanced quality of structure alignments, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ on three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database, which possess significant structural similarities, albeit the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.  相似文献   

8.
ABSTRACT: BACKGROUND: Searching for structural motifs across known protein structures can be useful for identifying unrelated proteins with similar function and characterising secondary structures such as beta-sheets. This is infeasible using conventional sequence alignment because linear protein sequences do not contain spatial information. beta-residue motifs are beta-sheet substructures that can be represented as graphs and queried using existing graph indexing methods, however, these approaches are designed for general graphs that do not incorporate the inherent structural constraints of beta-sheets and require computationally-expensive filtering and verification procedures. 3D substructure search methods, on the other hand, allow beta-residue motifs to be queried in a three-dimensional context but at significant computational costs. RESULTS: We developed a new method for querying beta-residue motifs, called BetaSearch, which leverages the natural planar constraints of beta-sheets by indexing them as 2D matrices, thus avoiding much of the computational complexities involved with structural and graph querying. BetaSearch demonstrates faster filtering, verification, and overall query time than existing graph indexing approaches whilst producing comparable index sizes. Compared to 3D substructure search methods, BetaSearch achieves 33 and 240 times speedups over index-based and pairwise alignment-based approaches, respectively. Furthermore, we have presented case-studies to demonstrate its capability of motif matching in sequentially dissimilar proteins and described a method for using BetaSearch to predict beta-strand pairing. CONCLUSIONS: We have demonstrated that BetaSearch is a fast method for querying substructure motifs. The improvements in speed over existing approaches make it useful for efficiently performing high-volume exploratory querying of possible protein substructural motifs or conformations. BetaSearch was used to identify a nearly identical beta-residue motif between an entirely synthetic (Top7) and a naturally-occurring protein (Charcot-Leyden crystal protein), as well as identifying structural similarities between biotin-binding domains of avidin, streptavidin and the lipocalin gamma subunit of human C8. AVAILABILITY: The web-interface, source code, and datasets for BetaSearch can be accessed from http://www.csse.unimelb.edu.au/~hohkhkh1/betasearch.  相似文献   

9.
Three-dimensional cluster analysis offers a method for the prediction of functional residue clusters in proteins. This method requires a representative structure and a multiple sequence alignment as input data. Individual residues are represented in terms of regional alignments that reflect both their structural environment and their evolutionary variation, as defined by the alignment of homologous sequences. From the overall (global) and the residue-specific (regional) alignments, we calculate the global and regional similarity matrices, containing scores for all pairwise sequence comparisons in the respective alignments. Comparing the matrices yields two scores for each residue. The regional conservation score (C(R)(x)) defines the conservation of each residue x and its neighbors in 3D space relative to the protein as a whole. The similarity deviation score (S(x)) detects residue clusters with sequence similarities that deviate from the similarities suggested by the full-length sequences. We evaluated 3D cluster analysis on a set of 35 families of proteins with available cocrystal structures, showing small ligand interfaces, nucleic acid interfaces and two types of protein-protein interfaces (transient and stable). We present two examples in detail: fructose-1,6-bisphosphate aldolase and the mitogen-activated protein kinase ERK2. We found that the regional conservation score (C(R)(x)) identifies functional residue clusters better than a scoring scheme that does not take 3D information into account. C(R)(x) is particularly useful for the prediction of poorly conserved, transient protein-protein interfaces. Many of the proteins studied contained residue clusters with elevated similarity deviation scores. These residue clusters correlate with specificity-conferring regions: 3D cluster analysis therefore represents an easily applied method for the prediction of functionally relevant spatial clusters of residues in proteins.  相似文献   

10.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.  相似文献   

11.
Protein structural annotation and classification is an important and challenging problem in bioinformatics. Research towards analysis of sequence-structure correspondences is critical for better understanding of a protein's structure, function, and its interaction with other molecules. Clustering of protein domains based on their structural similarities provides valuable information for protein classification schemes. In this article, we attempt to determine whether structure information alone is sufficient to adequately classify protein structures. We present an algorithm that identifies regions of structural similarity within a given set of protein structures, and uses those regions for clustering. In our approach, called STRALCP (STRucture ALignment-based Clustering of Proteins), we generate detailed information about global and local similarities between pairs of protein structures, identify fragments (spans) that are structurally conserved among proteins, and use these spans to group the structures accordingly. We also provide a web server at http://as2ts.llnl.gov/AS2TS/STRALCP/ for selecting protein structures, calculating structurally conserved regions and performing automated clustering.  相似文献   

12.
Shindyalov IN  Bourne PE 《Proteins》2000,38(3):247-260
Comparing and subsequently classifying protein structures information has received significant attention concurrent with the increase in the number of experimentally derived 3-dimensional structures. Classification schemes have focused on biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50-150 residue) highly repetitive near-contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally non-redundant over the complete polypeptide chain. The substructure classification is based on a previously reported Combinatorial Extension (CE) algorithm that provides a significantly different set of structure alignments than those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively the algorithm provides longer contiguous aligned segments at the price of a slightly higher root-mean-square deviation (rmsd). Clustering these alignments gives a discreet and highly repetitive set of substructures not detectable by sequence similarity alone. In some cases different substructures represent all or different parts of well known folds indicative of the Russian doll effect--the continuity of protein fold space. In other cases they fall into different structure and functional classifications. It is too early to determine whether these newly classified substructures represent new insights into the evolution of a structural framework important to many proteins. What is apparent from on-going work is that these substructures have the potential to be useful probes in finding remote sequence homology and in structure prediction studies. The characteristics of the complete all-by-all comparison of the polypeptide chains present in the PDB and details of the filtering procedure by pair-wise structure alignment that led to the emergent substructure gallery are discussed. Substructure classification, alignments, and tools to analyze them are available at http://cl.sdsc.edu/ce.html.  相似文献   

13.
Protein structural alignments are generally considered as ‘golden standard’ for the alignment at the level of amino acid residues. In this study we have compared the quality of pairwise and multiple structural alignments of about 5900 homologous proteins from 718 families of known 3-D structures. We observe shifts in the alignment of regular secondary structural elements (helices and strands) between pairwise and multiple structural alignments. The differences between pairwise and multiple structural alignments within helical and β-strand regions often correspond to 4 and 2 residue positions respectively. Such shifts correspond approximately to “one turn” of these regular secondary structures. We have performed manual analysis explicitly on the family of protein kinases. We note shifts of one or two turns in helix-helix alignments obtained using pairwise and multiple structural alignments. Investigations on the quality of the equivalent helix-helix, strand-strand pairs in terms of their residue side-chain accessibilities have been made. Our results indicate that the quality of the pairwise alignments is comparable to that of the multiple structural alignments and, in fact, is often better. We propose that pairwise alignment of protein structures should also be used in formulation of methods for structure prediction and evolutionary analysis.  相似文献   

14.
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the C(α) atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA(+), FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are either comparable or statistically significantly more accurate than all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. These new biological insights include: (i) detecting hinge regions in proteins where domain or sub-domains show flexibility; (ii) discovering similar small molecule binding sites from proteins of different folds and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates as exemplified by the RNA structure superimposition benchmark.  相似文献   

15.

Background  

Design of protein structure comparison algorithm is an important research issue, having far reaching implications. In this article, we describe a protein structure comparison scheme, which is capable of detecting correct alignments even in difficult cases, e.g. non-topological similarities. The proposed method computes protein structure alignments by comparing, small substructures, called neighborhoods. Two different types of neighborhoods, sequence and structure, are defined, and two algorithms arising out of the scheme are detailed. A new method for computing equivalences having non-topological similarities from pairwise similarity score is described. A novel and fast technique for comparing sequence neighborhoods is also developed.  相似文献   

16.
We apply a simple method for aligning protein sequences on the basis of a 3D structure, on a large scale, to the proteins in the scop classification of fold families. This allows us to assess, understand, and improve our automatic method against an objective, manually derived standard, a type of comprehensive evaluation that has not yet been possible for other structural alignment algorithms. Our basic approach directly matches the backbones of two structures, using repeated cycles of dynamic programming and least-squares fitting to determine an alignment minimizing coordinate difference. Because of simplicity, our method can be readily modified to take into account additional features of protein structure such as the orientation of side chains or the location-dependent cost of opening a gap. Our basic method, augmented by such modifications, can find reasonable alignments for all but 1.5% of the known structural similarities in scop, i.e., all but 32 of the 2,107 superfamily pairs. We discuss the specific protein structural features that make these 32 pairs so difficult to align and show how our procedure effectively partitions the relationships in scop into different categories, depending on what aspects of protein structure are involved (e.g., depending on whether or not consideration of side-chain orientation is necessary for proper alignment). We also show how our pairwise alignment procedure can be extended to generate a multiple alignment for a group of related structures. We have compared these alignments in detail with corresponding manual ones culled from the literature. We find good agreement (to within 95% for the core regions), and detailed comparison highlights how particular protein structural features (such as certain strands) are problematical to align, giving somewhat ambiguous results. With these improvements and systematic tests, our procedure should be useful for the development of scop and the future classification of protein folds.  相似文献   

17.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI-BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.  相似文献   

18.
Here, we discuss the relationship between protein sequence and protein structural similarity. It is established that a protein structural distance (PSD) of 2.0 is a threshold above which two proteins are unlikely to have a detectable pairwise sequence relationship. A precise correlation is established between the level of sequence similarity, defined by a normalized Smith-Waterman score, and the probability that two proteins will have a similar structure (defined by pairwise PSD<2). This correlation can be used in evaluating the likelihood for success in a comparative modeling procedure. We establish the existence of a correlation between sequence and structural similarity for pairs of proteins that are related in structure but whose sequence relationship is not detectable using standard pairwise sequence alignments. Although it is well known that there is a close relationship between sequence and structural similarity for pairwise sequence identities greater than about 30 %, there has been little discussion as to the possible existence of such a relationship for pairs of proteins in or below the twilight zone of sequence similarity (<25 % pairwise sequence identity). Possible implications of our results for the evolution of protein structure are discussed.  相似文献   

19.
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.  相似文献   

20.
MOTIVATION: Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing the compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence and compactness. RESULTS: The new program, SCALI (Structural Core ALIgnment), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the three-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology, but more specific than architecture as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号