首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.  相似文献   

2.
The biological role, biochemical function, and structure of uncharacterized protein sequences is often inferred from their similarity to known proteins. A constant goal is to increase the reliability, sensitivity, and accuracy of alignment techniques to enable the detection of increasingly distant relationships. Development, tuning, and testing of these methods benefit from appropriate benchmarks for the assessment of alignment accuracy.Here, we describe a benchmark protocol to estimate sequence-to-sequence and sequence-to-structure alignment accuracy. The protocol consists of structurally related pairs of proteins and procedures to evaluate alignment accuracy over the whole set. The set of protein pairs covers all the currently known fold types. The benchmark is challenging in the sense that it consists of proteins lacking clear sequence similarity.Correct target alignments are derived from the three-dimensional structures of these pairs by rigid body superposition. An evaluation engine computes the accuracy of alignments obtained from a particular algorithm in terms of alignment shifts with respect to the structure derived alignments. Using this benchmark we estimate that the best results can be obtained from a combination of amino acid residue substitution matrices and knowledge-based potentials.  相似文献   

3.
The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step of this is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant to conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence-structure alignment.  相似文献   

4.
Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both single and multi-domain comparisons. The CAB-align software is freely available to academic users as stand-alone software at http://www.pharm.kitasato-u.ac.jp/bmd/bmd/Publications.html.  相似文献   

5.
MOTIVATION: We introduce the iRMSD, a new type of RMSD, independent from any structure superposition and suitable for evaluating sequence alignments of proteins with known structures. RESULTS: We demonstrate that the iRMSD is equivalent to the standard RMSD although much simpler to compute and we also show that it is suitable for comparing sequence alignments and benchmarking multiple sequence alignment methods. We tested the iRMSD score on 6 established multiple sequence alignment packages and found the results to be consistent with those obtained using an established reference alignment collection like Prefab. AVAILABILITY: The iRMSD is part of the T-Coffee package and is distributed as an open source freeware (http://www.tcoffee.org/).  相似文献   

6.
Shah SB  Sahinidis NV 《PloS one》2012,7(5):e37493
Protein structure alignment is the problem of determining an assignment between the amino-acid residues of two given proteins in a way that maximizes a measure of similarity between the two superimposed protein structures. By identifying geometric similarities, structure alignment algorithms provide critical insights into protein functional similarities. Existing structure alignment tools adopt a two-stage approach to structure alignment by decoupling and iterating between the assignment evaluation and structure superposition problems. We introduce a novel approach, SAS-Pro, which addresses the assignment evaluation and structure superposition simultaneously by formulating the alignment problem as a single bilevel optimization problem. The new formulation does not require the sequentiality constraints, thus generalizing the scope of the alignment methodology to include non-sequential protein alignments. We employ derivative-free optimization methodologies for searching for the global optimum of the highly nonlinear and non-differentiable RMSD function encountered in the proposed model. Alignments obtained with SAS-Pro have better RMSD values and larger lengths than those obtained from other alignment tools. For non-sequential alignment problems, SAS-Pro leads to alignments with high degree of similarity with known reference alignments. The source code of SAS-Pro is available for download at http://eudoxus.cheme.cmu.edu/saspro/SAS-Pro.html.  相似文献   

7.
MOTIVATION: The number of known protein sequences is about thousand times larger than the number of experimentally solved 3D structures. For more than half of the protein sequences a close or distant structural analog could be identified. The key starting point in a classical comparative modeling is to generate the best possible sequence alignment with a template or templates. With decreasing sequence similarity, the number of errors in the alignments increases and these errors are the main causes of the decreasing accuracy of the molecular models generated. Here we propose a new approach to comparative modeling, which does not require the implicit alignment - the model building phase explores geometric, evolutionary and physical properties of a template (or templates). RESULTS: The proposed method requires prior identification of a template, although the initial sequence alignment is ignored. The model is built using a very efficient reduced representation search engine CABS to find the best possible superposition of the query protein onto the template represented as a 3D multi-featured scaffold. The criteria used include: sequence similarity, predicted secondary structure consistency, local geometric features and hydrophobicity profile. For more difficult cases, the new method qualitatively outperforms existing schemes of comparative modeling. The algorithm unifies de novo modeling, 3D threading and sequence-based methods. The main idea is general and could be easily combined with other efficient modeling tools as Rosetta, UNRES and others.  相似文献   

8.
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.  相似文献   

9.
MOTIVATION: We describe APDB, a novel measure for evaluating the quality of a protein sequence alignment, given two or more PDB structures. This evaluation does not require a reference alignment or a structure superposition. APDB is designed to efficiently and objectively benchmark multiple sequence alignment methods. RESULTS: Using existing collections of reference multiple sequence alignments and existing alignment methods, we show that APDB gives results that are consistent with those obtained using conventional evaluations. We also show that APDB is suitable for evaluating sequence alignments that are structurally equivalent. We conclude that APDB provides an alternative to more conventional methods used for benchmarking sequence alignment packages.  相似文献   

10.
Similarity of protein structures has been analyzed using three-dimensional Delaunay triangulation patterns derived from the backbone representation. It has been found that structurally related proteins have a common spatial invariant part, a set of tetrahedrons, mathematically described as a common spatial subgraph volume of the three-dimensional contact graph derived from Delaunay tessellation (DT). Based on this property of protein structures, we present a novel common volume superimposition (TOPOFIT) method to produce structural alignments. Structural alignments usually evaluated by a number of equivalent (aligned) positions (N(e)) with corresponding root mean square deviation (RMSD). The superimposition of the DT patterns allows one to uniquely identify a maximal common number of equivalent residues in the structural alignment. In other words, TOPOFIT identifies a feature point on the RMSD N(e) curve, a topomax point, until which the topologies of two structures correspond to each other, including backbone and interresidue contacts, whereas the growing number of mismatches between the DT patterns occurs at larger RMSD (N(e)) after the topomax point. It has been found that the topomax point is present in all alignments from different protein structural classes; therefore, the TOPOFIT method identifies common, invariant structural parts between proteins. The alignments produced by the TOPOFIT method have a good correlation with alignments produced by other current methods. This novel method opens new opportunities for the comparative analysis of protein structures and for more detailed studies on understanding the molecular principles of tertiary structure organization and functionality. The TOPOFIT method also helps to detect conformational changes, topological differences in variable parts, which are particularly important for studies of variations in active/ binding sites and protein classification.  相似文献   

11.
Sequence comparison methods based on position-specific score matrices (PSSMs) have proven a useful tool for recognition of the divergent members of a protein family and for annotation of functional sites. Here we investigate one of the factors that affects overall performance of PSSMs in a PSI-BLAST search, the algorithm used to construct the seed alignment upon which the PSSM is based. We compare PSSMs based on alignments constructed by global sequence similarity (ClustalW and ClustalW-pairwise), local sequence similarity (BLAST), and local structure similarity (VAST). To assess performance with respect to identification of conserved functional or structural sites, we examine the accuracy of the three-dimensional molecular models predicted by PSSM-sequence alignments. Using the known structures of those sequences as the standard of truth, we find that model accuracy varies with the algorithm used for seed alignment construction in the pattern local-structure (VAST) > local-sequence (BLAST) > global-sequence (ClustalW). Using structural similarity of query and database proteins as the standard of truth, we find that PSSM recognition sensitivity depends primarily on the diversity of the sequences included in the alignment, with an optimum around 30-50% average pairwise identity. We discuss these observations, and suggest a strategy for constructing seed alignments that optimize PSSM-sequence alignment accuracy and recognition sensitivity.  相似文献   

12.
Most phylogenetic‐tree building applications use multiple sequence alignments as a starting point. A recent meta‐level methodology, called Heads or Tails, aims to reveal the quality of multiple sequence alignments by comparing alignments taken in the forward direction with the alignments of the same sequences when the sequences are reversed. Through an examination of a special case for multiple sequence alignment – pair‐wise alignments, where an optimal algorithm exists – and the use of a modi?ed global‐alignment application, it is shown that the forward and reverse alignments, even when they are the same, do not capture all the possible variations in the alignments and when the forward and reverse alignments differ there may be other alignments that remain unaccounted for. The implication is that comparing just the forward and (biologically irrelevant) reverse alignments is not sufficient to capture the variability in multiple sequence alignments, and the Heads or Tails methodology is therefore not suitable as a method for investigating multiple sequence alignment accuracy. Part of the reason is the inability of individual multiple sequence alignment applications to adequately sample the space of possible alignments. A further implication is that the Hall [Hall, B.G., 2008. Mol. Biol. Evol. 25, 1576–1580] methodology may create optimal synthetic multiple sequence alignments that extant aligners will be unable to completely recover ab initio due to alternative alignments being possible at particular sites. In general, it is shown that more divergent sequences will give rise to an increased number of alternative alignments, so sequence sets with a higher degree of similarity are preferable to sets with lower similarity as the starting point for phylogenetic tree building. © The Willi Hennig Society 2009.  相似文献   

13.
Detection of homologous proteins with low-sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profile. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low-sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program.  相似文献   

14.
R B Russell  G J Barton 《Proteins》1992,14(2):309-323
An algorithm is presented for the accurate and rapid generation of multiple protein sequence alignments from tertiary structure comparisons. A preliminary multiple sequence alignment is performed using sequence information, which then determines an initial superposition of the structures. A structure comparison algorithm is applied to all pairs of proteins in the superimposed set and a similarity tree calculated. Multiple sequence alignments are then generated by following the tree from the branches to the root. At each branchpoint of the tree, a structure-based sequence alignment and coordinate transformations are output, with the multiple alignment of all structures output at the root. The algorithm encoded in STAMP (STructural Alignment of Multiple Proteins) is shown to give alignments in good agreement with published structural accounts within the dehydrogenase fold domains, globins, and serine proteinases. In order to reduce the need for visual verification, two similarity indices are introduced to determine the quality of each generated structural alignment. Sc quantifies the global structural similarity between pairs or groups of proteins, whereas Pij' provides a normalized measure of the confidence in the alignment of each residue. STAMP alignments have the quality of each alignment characterized by Sc and Pij' values and thus provide a reproducible resource for studies of residue conservation within structural motifs.  相似文献   

15.
Alignment of protein sequences is a key step in most computational methods for prediction of protein function and homology-based modeling of three-dimensional (3D)-structure. We investigated correspondence between "gold standard" alignments of 3D protein structures and the sequence alignments produced by the Smith-Waterman algorithm, currently the most sensitive method for pair-wise alignment of sequences. The results of this analysis enabled development of a novel method to align a pair of protein sequences. The comparison of the Smith-Waterman and structure alignments focused on their inner structure and especially on the continuous ungapped alignment segments, "islands" between gaps. Approximately one third of the islands in the gold standard alignments have negative or low positive score, and their recognition is below the sensitivity limit of the Smith-Waterman algorithm. From the alignment accuracy perspective, the time spent by the algorithm while working in these unalignable regions is unnecessary. We considered features of the standard similarity scoring function responsible for this phenomenon and suggested an alternative hierarchical algorithm, which explicitly addresses high scoring regions. This algorithm is considerably faster than the Smith-Waterman algorithm, whereas resulting alignments are in average of the same quality with respect to the gold standard. This finding shows that the decrease of alignment accuracy is not necessarily a price for the computational efficiency.  相似文献   

16.
In the era of structural genomics, it is necessary to generate accurate structural alignments in order to build good templates for homology modeling. Although a great number of structural alignment algorithms have been developed, most of them ignore intermolecular interactions during the alignment procedure. Therefore, structures in different oligomeric states are barely distinguishable, and it is very challenging to find correct alignment in coil regions. Here we present a novel approach to structural alignment using a clique finding algorithm and environmental information (SAUCE). In this approach, we build the alignment based on not only structural coordinate information but also realistic environmental information extracted from biological unit files provided by the Protein Data Bank (PDB). At first, we eliminate all environmentally unfavorable pairings of residues. Then we identify alignments in core regions via a maximal clique finding algorithm. Two extreme value distribution (EVD) form statistics have been developed to evaluate core region alignments. With an optional extension step, global alignment can be derived based on environment-based dynamic programming linking. We show that our method is able to differentiate three-dimensional structures in different oligomeric states, and is able to find flexible alignments between multidomain structures without predetermined hinge regions. The overall performance is also evaluated on a large scale by comparisons to current structural classification databases as well as to other alignment methods.  相似文献   

17.

Background  

While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins - proteins that share a common ancestor and a similar structure; pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate.  相似文献   

18.
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.  相似文献   

19.
In this study, we investigate the extent to which techniques for homology modeling that were developed for water-soluble proteins are appropriate for membrane proteins as well. To this end we present an assessment of current strategies for homology modeling of membrane proteins and introduce a benchmark data set of homologous membrane protein structures, called HOMEP. First, we use HOMEP to reveal the relationship between sequence identity and structural similarity in membrane proteins. This analysis indicates that homology modeling is at least as applicable to membrane proteins as it is to water-soluble proteins and that acceptable models (with C alpha-RMSD values to the native of 2 A or less in the transmembrane regions) may be obtained for template sequence identities of 30% or higher if an accurate alignment of the sequences is used. Second, we show that secondary-structure prediction algorithms that were developed for water-soluble proteins perform approximately as well for membrane proteins. Third, we provide a comparison of a set of commonly used sequence alignment algorithms as applied to membrane proteins. We find that high-accuracy alignments of membrane protein sequences can be obtained using state-of-the-art profile-to-profile methods that were developed for water-soluble proteins. Improvements are observed when weights derived from the secondary structure of the query and the template are used in the scoring of the alignment, a result which relies on the accuracy of the secondary-structure prediction of the query sequence. The most accurate alignments were obtained using template profiles constructed with the aid of structural alignments. In contrast, a simple sequence-to-sequence alignment algorithm, using a membrane protein-specific substitution matrix, shows no improvement in alignment accuracy. We suggest that profile-to-profile alignment methods should be adopted to maximize the accuracy of homology models of membrane proteins.  相似文献   

20.
An open question in protein homology modeling is, how well do current modeling packages satisfy the dual criteria of quality of results and practical ease of use? To address this question objectively, we examined homology‐built models of a variety of therapeutically relevant proteins. The sequence identities across these proteins range from 19% to 76%. A novel metric, the difference alignment index (DAI), is developed to aid in quantifying the quality of local sequence alignments. The DAI is also used to construct the relative sequence alignment (RSA), a new representation of global sequence alignment that facilitates comparison of sequence alignments from different methods. Comparisons of the sequence alignments in terms of the RSA and alignment methodologies are made to better understand the advantages and caveats of each method. All sequence alignments and corresponding 3D models are compared to their respective structure‐based alignments and crystal structures. A variety of protein modeling software was used. We find that at sequence identities >40%, all packages give similar (and satisfactory) results; at lower sequence identities (<25%), the sequence alignments generated by Profit and Prime, which incorporate structural information in their sequence alignment, stand out from the rest. Moreover, the model generated by Prime in this low sequence identity region is noted to be superior to the rest. Additionally, we note that DSModeler and MOE, which generate reasonable models for sequence identities >25%, are significantly more functional and easier to use when compared with the other structure‐building software.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号