Similar literature
 20 similar articles found (search time: 15 ms)
1.

Background  

Structural alignment is an important step in protein comparison. Well-established methods exist for solving this problem under the assumption that the structures under comparison can be treated as rigid bodies. However, proteins are flexible entities that often undergo movements altering the positions of domains or subdomains with respect to each other. Such movements can impede the identification of structural equivalences when rigid aligners are used.

2.
R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer CMfinder datasets. Altogether, our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).

3.
C A Orengo, N P Brown, W R Taylor. Proteins, 1992, 14(2):139-167
A fast method is described for searching and analyzing the protein structure databank. It uses secondary structure followed by residue matching to compare protein structures and is developed from a previous structural alignment method based on dynamic programming. Linear representations of secondary structures are derived and their features compared to identify equivalent elements in two proteins. The secondary structure alignment then constrains the residue alignment, which compares only residues within aligned secondary structures and with similar buried areas and torsional angles. The initial secondary structure alignment improves accuracy and provides a means of filtering out unrelated proteins before the slower residue alignment stage. It is possible to search or sort the protein structure databank very quickly using just secondary structure comparisons. A search through 720 structures with a probe protein of 10 secondary structures required 1.7 CPU hours on a Sun 4/280. Alternatively, combined secondary structure and residue alignments, with a cutoff on the secondary structure score to remove pairs of unrelated proteins from further analysis, took 10.1 CPU hours. The method was applied in searches on different classes of proteins and to cluster a subset of the databank into structurally related groups. Relationships were consistent with known families of protein structure.

4.
An appropriate structural superposition identifies similarities and differences between homologous proteins that are not evident from sequence alignments alone. We have coupled our Gaussian-weighted RMSD (wRMSD) tool with a sequence aligner and seed extension (SE) algorithm to create a robust technique for overlaying structures and aligning sequences of homologous proteins (HwRMSD). HwRMSD overcomes errors in the initial sequence alignment that would normally propagate into a standard RMSD overlay. SE can generate a corrected sequence alignment from the improved structural superposition obtained by wRMSD. HwRMSD's robust performance and its superiority over standard RMSD are demonstrated over a range of homologous proteins. Its better overlay results in corrected sequence alignments with good agreement to HOMSTRAD. Finally, HwRMSD is compared to established structural alignment methods: FATCAT, secondary-structure matching, combinatorial extension, and DaliLite. Most methods are comparable at placing residue pairs within 2 Å, but HwRMSD places many more residue pairs within 1 Å, providing a clear advantage. Such high accuracy is essential in drug design, where small distances can have a large impact on computational predictions. This level of accuracy is also needed to correct sequence alignments in an automated fashion, especially for omics-scale analysis. HwRMSD can align homologs with low sequence identity and large conformational differences, cases where both sequence-based and structure-based methods may fail. The HwRMSD pipeline overcomes the dependency of structural overlays on initial sequence pairing and removes the need to determine the best sequence-alignment method, substitution matrix, and gap parameters for each unique pair of homologs. Proteins 2012. © 2012 Wiley Periodicals, Inc.
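To make the idea of a Gaussian-weighted RMSD concrete, the following is a minimal sketch and not the published wRMSD implementation. The abstract does not give the weight function, so the form exp(-d^2 / scale) and the default scale of 5.0 Å^2 are assumptions, as are the function names. The sketch also assumes the two coordinate lists are already superimposed and paired one-to-one. It only shows how down-weighting distant pairs keeps a few displaced residues from dominating the score.

```python
import math


def plain_rmsd(coords_a, coords_b):
    """Standard RMSD over paired 3D points (no superposition is performed)."""
    sq = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(sq) / len(sq))


def gaussian_weighted_rmsd(coords_a, coords_b, scale=5.0):
    """Weighted RMSD in which each pair gets weight exp(-d^2 / scale).

    `scale` (in squared length units) is a hypothetical parameter chosen
    for illustration; close pairs get weights near 1, distant pairs near 0.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    sq = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    weights = [math.exp(-d2 / scale) for d2 in sq]
    return math.sqrt(sum(w * d2 for w, d2 in zip(weights, sq)) / sum(weights))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]  # second pair is displaced by 3 units
    print(plain_rmsd(a, b), gaussian_weighted_rmsd(a, b))
```

In a full pipeline the weights would be recomputed while the superposition is refined iteratively; that step is omitted here.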

5.
Lin HN, Notredame C, Chang JM, Sung TY, Hsu WL. PloS one, 2011, 6(12):e27872
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of the corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in how they measure the similarity between aligned residues: using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset and supported by evolutionary analysis. These words are then used to define a position-specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity. SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.

6.
We have developed MUMMALS, a program to construct multiple protein sequence alignments using probabilistic consistency. MUMMALS improves alignment quality by using pairwise alignment hidden Markov models (HMMs) with multiple match states that describe local structural information without exploiting explicit structure predictions. Parameters for such models have been estimated from a large library of structure-based alignments. We show that (i) on remote homologs, MUMMALS achieves statistically best accuracy among several leading aligners, such as ProbCons, MAFFT and MUSCLE, although the average improvement is small, on the order of several percent; (ii) a large collection (>10000) of automatically computed pairwise structure alignments of divergent protein domains is superior to smaller but carefully curated datasets for estimation of alignment parameters and performance tests; (iii) reference-independent evaluation of alignment quality using sequence alignment-dependent structure superpositions correlates well with reference-dependent evaluation that compares sequence-based alignments to structure-based reference alignments.

7.
8.
Kann MG, Goldstein RA. Proteins, 2002, 48(2):367-376
A detailed analysis of the performance of hybrid, a new sequence alignment algorithm developed by Yu and coworkers that combines Smith-Waterman local dynamic programming with a local version of the maximum-likelihood approach, was made to assess the applicability of this algorithm to the detection of distant homologs by sequence comparison. We analyzed the statistics of hybrid with a set of nonhomologous protein sequences from the SCOP database and found that the scores from the hybrid algorithm follow an Extreme Value Distribution with lambda approximately 1, as previously shown by Yu et al. for the case of artificially generated sequences. Local dynamic programming was compared to the hybrid algorithm by using two different test data sets of distant homologs from the PFAM and COGs protein sequence databases. The studies were made with several score functions in current use, including OPTIMA, a new score function originally developed to detect remote homologs with the Smith-Waterman algorithm. We found OPTIMA to be the best score function for both the dynamic programming and the hybrid algorithms. The ability of dynamic programming to discriminate between homologs and nonhomologs in the two sets of distantly related sequences is slightly better than that of the hybrid algorithm. The advantage of producing accurate score statistics with only a few simulations may outweigh the small differences in performance and make this new algorithm suitable for detection of homologs in conjunction with a wide range of score functions and gap penalties.
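For readers unfamiliar with the local dynamic programming baseline mentioned above, here is a minimal sketch of Smith-Waterman scoring. It is not the hybrid algorithm and does not use OPTIMA or any other published score function: the match, mismatch, and linear gap values are illustrative assumptions, and only the best local score is returned (no traceback).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Uses a toy match/mismatch scheme and a linear gap penalty; real
    protein comparisons would use a substitution matrix and, typically,
    affine gap penalties.
    """
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The 0 lets a local alignment restart anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTYY", "ZZACGTWW"))
```

Keeping only two rows of the matrix is enough when just the score is needed, which is the quantity whose distribution is studied in score statistics.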

9.
Protein structural alignments are generally considered the 'gold standard' for alignment at the level of amino acid residues. In this study we have compared the quality of pairwise and multiple structural alignments of about 5900 homologous proteins from 718 families of known 3-D structures. We observe shifts in the alignment of regular secondary structural elements (helices and strands) between pairwise and multiple structural alignments. The differences between pairwise and multiple structural alignments within helical and β-strand regions often correspond to 4 and 2 residue positions, respectively. Such shifts correspond approximately to "one turn" of these regular secondary structures. We have performed manual analysis explicitly on the family of protein kinases. We note shifts of one or two turns in helix-helix alignments obtained using pairwise and multiple structural alignments. The quality of the equivalent helix-helix and strand-strand pairs has been investigated in terms of their residue side-chain accessibilities. Our results indicate that the quality of the pairwise alignments is comparable to that of the multiple structural alignments and, in fact, is often better. We propose that pairwise alignment of protein structures should also be used in formulation of methods for structure prediction and evolutionary analysis.

11.
We present a framework for improving local protein alignment algorithms. Specifically, we discuss how to extend local protein aligners to use a collection of vector seeds or ungapped alignment seeds to reduce noise hits. We model picking a set of seed models as an integer programming problem and give algorithms to choose such a set of seeds. While the problem is NP-hard, and Quasi-NP-hard to approximate to within a logarithmic factor, it can be solved easily in practice. A good set of seeds we have chosen allows four to five times fewer false positive hits, while preserving essentially the same sensitivity as BLASTP.

12.
Protein structure alignment methods are used for the detection of evolutionarily and functionally related positions in proteins. A wide array of different methods is available, but the choice of the best method is often not apparent to the user. Several studies have assessed the alignment accuracy and consistency of structure alignment methods, but none of these explicitly considered membrane proteins, which are important targets for drug development and have distinct structural features. Here, we compared 13 widely used pairwise structural alignment methods on a test set of homologous membrane protein structures (called HOMEP3). Each pair of structures was aligned and the corresponding sequence alignment was used to construct homology models. The model accuracy compared to the known structures was assessed using scoring functions not incorporated in the tested structural alignment methods. The analysis shows that fragment-based approaches such as FR-TM-align are the most useful for aligning structures of membrane proteins. Moreover, fragment-based approaches are more suitable for comparison of protein structures that have undergone large conformational changes. Nevertheless, no method was clearly superior to all other methods. Additionally, all methods lack a measure to rate the reliability of a position within a structure alignment. To solve both of these problems, we propose a consensus-type approach, combining alignments from four different methods, namely FR-TM-align, DaliLite, MATT, and FATCAT. Agreement between the methods is used to assign confidence values to each position of the alignment. Overall, we conclude that there remains scope for the improvement of structural alignment methods for membrane proteins. Proteins 2015; 83:1720–1732. © 2015 Wiley Periodicals, Inc.
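The consensus idea, using agreement between methods as a per-position confidence, can be illustrated with a small sketch. The abstract does not specify how the four alignments are combined, so the representation below (each alignment as a set of aligned residue-index pairs) and the confidence definition (fraction of methods that report a pair) are assumptions made purely for illustration.

```python
from collections import Counter


def consensus_confidence(alignments):
    """Map each aligned residue pair to the fraction of methods reporting it.

    `alignments` is a list with one entry per method; each entry is an
    iterable of (i, j) pairs meaning residue i of structure A is aligned
    to residue j of structure B. This encoding is hypothetical.
    """
    if not alignments:
        raise ValueError("at least one alignment is required")
    counts = Counter(pair for aln in alignments for pair in set(aln))
    n_methods = len(alignments)
    return {pair: count / n_methods for pair, count in counts.items()}


if __name__ == "__main__":
    method_outputs = [
        {(0, 0), (1, 1)},  # stand-ins for four methods' pairwise alignments
        {(0, 0), (1, 2)},
        {(0, 0), (1, 1)},
        {(0, 0)},
    ]
    for pair, conf in sorted(consensus_confidence(method_outputs).items()):
        print(pair, conf)
```

A pair reported by every method gets confidence 1.0, while a pair reported by only one of four gets 0.25, which gives a simple way to flag unreliable positions.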

13.
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One kind of difference between 3-D structures of homologous proteins is rigid-body displacement. A glaring example is equivalent regions of homologous proteins that adopt α-helical conformation but, because of their different spatial orientations, do not superimpose well. In a rigid-body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, and the most common example is topologically equivalent loops of two homologues with different conformations. In the current study, we present a refined view of the "structurally variable" regions, which may contain local similarity obscured in global alignment of homologous protein structures. As a structural alphabet is able to describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of 'variable' regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through Protein Blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.

14.
Sarco(endo)plasmic reticulum Ca2+-ATPase transports two Ca2+ per ATP hydrolyzed across biological membranes against a large concentration gradient by undergoing large conformational changes. Structural studies with X-ray crystallography revealed functional roles of coupled motions between the cytoplasmic domains and the transmembrane helices in individual reaction steps. Here, we employed "Motion Tree (MT)," a tree diagram that describes a conformational change between two structures, and applied it to representative Ca2+-ATPase structures. MT provides information on coupled rigid-body motions of the ATPase in individual reaction steps. Fourteen rigid structural units, "common rigid domains (CRDs)," are identified from seven MTs throughout the whole enzymatic reaction cycle. CRDs likely act not only as structural units but also as functional units. Some of their functional importance has been newly revealed by the analysis. Stability of each CRD is examined on the morphing trajectories that cover seven conformational transitions. We confirmed that the large conformational changes are realized by motions only in the flexible regions that connect CRDs. The Ca2+-ATPase efficiently utilizes its intrinsic flexibility and rigidity to respond to different switches like ligand binding/dissociation or ATP hydrolysis. The analysis detects functional motions without requiring extensive expert biological knowledge, suggesting its general applicability to domain movements in other membrane proteins to deepen the understanding of protein structure and function. Proteins 2015; 83:746–756. © 2015 Wiley Periodicals, Inc.

15.
Analysis of protein structures based on backbone structural patterns known as structural alphabets has been shown to be very useful. Among them, a set of 16 pentapeptide structural motifs known as protein blocks (PBs) has been identified, from which a backbone model of most protein structures can be built. PBs allow simplification of 3D space onto 1D space in the form of a sequence of PBs. Here, for the first time, substitution probabilities of PBs in a large number of aligned homologous protein structures have been studied and are expressed as a simplified 16 x 16 substitution matrix. The matrix was validated by benchmarking how well it can align sequences of PBs, much like amino acid alignment, to identify structurally equivalent regions in closely or distantly related proteins using a dynamic programming approach. The alignment results obtained are comparable to those of well-established structure comparison methods like DALI and STAMP. Other interesting applications of the matrix have been investigated. We first show that, in variable regions between two superimposed homologous proteins, one can distinguish between local conformational differences and rigid-body displacement of a conserved motif by comparing the PBs and their substitution scores. Second, we demonstrate, with the example of aspartic proteinases, that PBs can be efficiently used to detect lobe/domain flexibility in multidomain proteins. Lastly, using protein kinase as an example, we identify regions of conformational variation and rigid-body movement in the enzyme as it changes from an inactive state to the active state.
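Aligning two PB sequences by dynamic programming works the same way as aligning amino acid sequences, only over the 16-letter PB alphabet (conventionally written a to p). The sketch below is a generic global (Needleman-Wunsch-style) aligner, not the authors' code. The published 16 x 16 substitution matrix is not reproduced here, so `toy_pb_score` and the gap penalty are hypothetical stand-ins, and the abstract does not say whether a global or local scheme was used.

```python
def toy_pb_score(x, y):
    """Hypothetical stand-in for a 16 x 16 PB substitution matrix."""
    return 3 if x == y else -1


def align_pb(s, t, score=toy_pb_score, gap=-2):
    """Globally align PB strings s and t; return (score, aligned_s, aligned_t)."""
    n, m = len(s), len(t)
    # dp[i][j] = best score for aligning s[:i] with t[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # pair two PBs
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1]); out_t.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-"); i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1]); j -= 1
    return dp[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


if __name__ == "__main__":
    print(align_pb("ddfklmm", "ddklmm"))
```

Swapping `toy_pb_score` for a lookup into a real substitution matrix is the only change needed to score with empirically derived PB substitution values.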

16.
Vorolign, a fast and flexible structural alignment method for two or more protein structures, is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. RESULTS: The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. AVAILABILITY: Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign

17.
A fundamental role of the Hsp90 chaperone in regulating functional activity of diverse protein clients is essential for the integrity of signaling networks. In this work we have combined biophysical simulations of the Hsp90 crystal structures with protein structure network analysis to characterize the statistical ensemble of allosteric interaction networks and communication pathways in the Hsp90 chaperones. We have found that principal structurally stable communities could be preserved during dynamic changes in the conformational ensemble. The dominant contribution of the inter-domain rigidity to the interaction networks has emerged as a common factor responsible for the thermodynamic stability of the active chaperone form during the ATPase cycle. Structural stability analysis using force constant profiling of the inter-residue fluctuation distances has identified a network of conserved structurally rigid residues that could serve as global mediating sites of allosteric communication. Mapping of the conformational landscape with the network centrality parameters has demonstrated that stable communities and mediating residues may act concertedly with the shifts in the conformational equilibrium and could describe the majority of functionally significant chaperone residues. The network analysis has revealed a relationship between structural stability, global centrality and functional significance of hotspot residues involved in chaperone regulation. We have found that allosteric interactions in the Hsp90 chaperone may be mediated by modules of structurally stable residues that display high betweenness in the global interaction network. The results of this study have suggested that allosteric interactions in the Hsp90 chaperone may operate via a mechanism that combines rapid and efficient communication by a single optimal pathway of structurally rigid residues and more robust signal transmission using an ensemble of suboptimal multiple communication routes. This may be a universal requirement encoded in protein structures to balance the inherent tension between resilience and efficiency of the residue interaction networks.

18.
The concept of a flexible protein sequence pattern is defined. In contrast to conventional pattern matching, template or sequence alignment methods, flexible patterns allow residue patterns typical of a complete protein fold to be developed in terms of residue positions (elements), separated by gaps of defined range. An efficient dynamic programming algorithm is presented to enable the best alignment(s) of a pattern with a sequence to be identified. The flexible pattern method is evaluated in detail by reference to the globin protein family, and by comparison to alignment techniques that exploit single sequence, multiple sequence and secondary structural information. A flexible pattern derived from seven globins aligned on structural criteria successfully discriminates all 345 globins from non-globins in the Protein Identification Resource database. Furthermore, a pattern that uses helical regions from just human alpha-haemoglobin identified 337 globins compared to 318 for the best non-pattern global alignment method. Patterns derived from successively fewer, yet more highly conserved positions in a structural alignment of seven globins show that as few as 38 residue positions (25 buried hydrophobic, 4 exposed and 9 others) may be used to uniquely identify the globin fold. The study suggests that flexible patterns gain discriminating power both by discarding regions known to vary within the protein family, and by defining gaps within specific ranges. Flexible patterns therefore provide a convenient and powerful bridge between regular expression pattern matching techniques and more conventional local and global sequence comparison algorithms.
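The regular-expression end of that bridge is easy to show: elements separated by gaps of a defined length range map directly onto bounded wildcards. The sketch below is only that boolean regex analogue, not the scored dynamic programming method described in the abstract. The helper name, the element sets, and the gap ranges are all invented for illustration and are not taken from any globin pattern.

```python
import re


def compile_flexible_pattern(elements, gaps):
    """Build a regex from residue elements separated by ranged gaps.

    elements: list of strings, each the set of residues allowed at a position.
    gaps: list of (min_len, max_len) tuples, one between each pair of
          consecutive elements.
    """
    if len(gaps) != len(elements) - 1:
        raise ValueError("need exactly one gap range between consecutive elements")
    parts = []
    for k, residues in enumerate(elements):
        parts.append("[" + re.escape(residues) + "]")
        if k < len(gaps):
            lo, hi = gaps[k]
            parts.append(".{%d,%d}" % (lo, hi))  # any residues, bounded length
    return re.compile("".join(parts))


if __name__ == "__main__":
    # Hypothetical pattern: H, then 2-4 residues, then L/I/V, then 0-1 residues, then F.
    pattern = compile_flexible_pattern(["H", "LIV", "F"], [(2, 4), (0, 1)])
    hit = pattern.search("AAHGGGLFAA")
    print(pattern.pattern, hit.group() if hit else None)
```

Unlike this all-or-nothing match, a scored alignment of the pattern against a sequence can tolerate a weak element, which is presumably part of where the added discriminating power comes from.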

19.
MUSTANG: a multiple structural alignment algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
Multiple structural alignment is a fundamental problem in structural genomics. In this article, we define a reliable and robust algorithm, MUSTANG (MUltiple STructural AligNment AlGorithm), for the alignment of multiple protein structures. Given a set of protein structures, the program constructs a multiple alignment using the spatial information of the Cα atoms in the set. Broadly based on the progressive pairwise heuristic, this algorithm gains accuracy through novel and effective refinement phases. MUSTANG reports the multiple sequence alignment and the corresponding superposition of structures. Alignments generated by MUSTANG are compared with several hand-curated alignments in the literature as well as with the benchmark alignments of 1033 alignment families from the HOMSTRAD database. The performance of MUSTANG was compared with DALI at a pairwise level, and with other multiple structural alignment tools such as POSA, CE-MC, MALECON, and MultiProt. MUSTANG performs comparably to popular pairwise and multiple structural alignment tools for closely related proteins, and performs more reliably than other multiple structural alignment methods on hard data sets containing distantly related proteins or proteins that show conformational changes.

20.
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structures. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A high-scoring AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
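The first step described above, finding ungapped fragment pairs whose summed substitution scores are high, can be sketched with a plain double loop over string windows. This is an illustration only and not the CLePAPS implementation: the CLESUM matrix is not reproduced, so `toy_letter_score` is a hypothetical stand-in, and the fixed window width and threshold are assumed values. The later steps (superposition, consistency checks, zoom-in extension) are omitted.

```python
def toy_letter_score(x, y):
    """Hypothetical stand-in for a CLESUM lookup between two conformational letters."""
    return 2 if x == y else -1


def find_afps(s, t, score=toy_letter_score, width=4, threshold=8):
    """Return ungapped fragment pairs as (total_score, start_in_s, start_in_t).

    Every window of `width` letters in s is compared position by position
    with every window in t; pairs whose summed score reaches `threshold`
    are kept and sorted from highest to lowest score.
    """
    hits = []
    for i in range(len(s) - width + 1):
        for j in range(len(t) - width + 1):
            total = sum(score(s[i + k], t[j + k]) for k in range(width))
            if total >= threshold:
                hits.append((total, i, j))
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits


if __name__ == "__main__":
    print(find_afps("ABCDEFG", "XXCDEFYY"))
```

Because each comparison is ungapped, the search needs no dynamic programming; the cost is a simple product of the two string lengths and the window width.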
