首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.

Background

Protein sequence profile-profile alignment is an important approach to recognizing remote homologs and generating accurate pairwise alignments. It plays an important role in protein sequence database search, protein structure prediction, protein function prediction, and phylogenetic analysis.

Results

In this work, we integrate predicted solvent accessibility, torsion angles and evolutionary residue coupling information with the pairwise Hidden Markov Model (HMM) based profile alignment method to improve profile-profile alignments. The evaluation results demonstrate that adding predicted relative solvent accessibility and torsion angle information improves the accuracy of profile-profile alignments. The evolutionary residue coupling information is helpful in some cases, but its contribution to the improvement is not consistent.

Conclusion

Incorporating the new structural information such as predicted solvent accessibility and torsion angles into the profile-profile alignment is a useful way to improve pairwise profile-profile alignment methods.  相似文献   

2.
Protein backbone angle prediction with machine learning approaches   总被引:2,自引:0,他引:2  
MOTIVATION: Protein backbone torsion angle prediction provides useful local structural information that goes beyond conventional three-state (alpha, beta and coil) secondary structure predictions. Accurate prediction of protein backbone torsion angles will substantially improve modeling procedures for local structures of protein sequence segments, especially in modeling loop conformations that do not form regular structures as in alpha-helices or beta-strands. RESULTS: We have devised two novel automated methods in protein backbone conformational state prediction: one method is based on support vector machines (SVMs); the other method combines a standard feed-forward back-propagation artificial neural network (NN) with a local structure-based sequence profile database (LSBSP1). Extensive benchmark experiments demonstrate that both methods have improved the prediction accuracy rate over the previously published methods for conformation state prediction when using an alphabet of three or four states. AVAILABILITY: LSBSP1 and the NN algorithm have been implemented in PrISM.1, which is available from www.columbia.edu/~ay1/. SUPPLEMENTARY INFORMATION: Supplementary data for the SVM method can be downloaded from the Website www.cs.columbia.edu/compbio/backbone.  相似文献   

3.
Finding structural similarities between proteins often helps reveal shared functionality, which otherwise might not be detected by native sequence information alone. Such similarity is usually detected and quantified by protein structure alignment. Determining the optimal alignment between two protein structures, however, remains a hard problem. An alternative approach is to approximate each three-dimensional protein structure using a sequence of motifs derived from a structural alphabet. Using this approach, structure comparison is performed by comparing the corresponding motif sequences or structural sequences. In this article, we measure the performance of such alphabets in the context of the protein structure classification problem. We consider both local and global structural sequences. Each letter of a local structural sequence corresponds to the best matching fragment to the corresponding local segment of the protein structure. The global structural sequence is designed to generate the best possible complete chain that matches the full protein structure. We use an alphabet of 20 letters, corresponding to a library of 20 motifs or protein fragments having four residues. We show that the global structural sequences approximate well the native structures of proteins, with an average coordinate root mean square of 0.69 Å over 2225 test proteins. The approximation is best for all α-proteins, while relatively poorer for all β-proteins. We then test the performance of four different sequence representations of proteins (their native sequence, the sequence of their secondary-structure elements, and the local and global structural sequences based on our fragment library) with different classifiers in their ability to classify proteins that belong to five distinct folds of CATH. Without surprise, the primary sequence alone performs poorly as a structure classifier. We show that addition of either secondary-structure information or local information from the structural sequence considerably improves the classification accuracy. The two fragment-based sequences perform better than the secondary-structure sequence but not well enough at this stage to be a viable alternative to more computationally intensive methods based on protein structure alignment.  相似文献   

4.
5.
We present a comprehensive evaluation of a new structure mining method called PB-ALIGN. It is based on the encoding of protein structure as 1D sequence of a combination of 16 short structural motifs or protein blocks (PBs). PBs are short motifs capable of representing most of the local structural features of a protein backbone. Using derived PB substitution matrix and simple dynamic programming algorithm, PB sequences are aligned the same way amino acid sequences to yield structure alignment. PBs are short motifs capable of representing most of the local structural features of a protein backbone. Alignment of these local features as sequence of symbols enables fast detection of structural similarities between two proteins. Ability of the method to characterize and align regions beyond regular secondary structures, for example, N and C caps of helix and loops connecting regular structures, puts it a step ahead of existing methods, which strongly rely on secondary structure elements. PB-ALIGN achieved efficiency of 85% in extracting true fold from a large database of 7259 SCOP domains and was successful in 82% cases to identify true super-family members. On comparison to 13 existing structure comparison/mining methods, PB-ALIGN emerged as the best on general ability test dataset and was at par with methods like YAKUSA and CE on nontrivial test dataset. Furthermore, the proposed method performed well when compared to flexible structure alignment method like FATCAT and outperforms in processing speed (less than 45 s per database scan). This work also establishes a reliable cut-off value for the demarcation of similar folds. It finally shows that global alignment scores of unrelated structures using PBs follow an extreme value distribution. PB-ALIGN is freely available on web server called Protein Block Expert (PBE) at http://bioinformatics.univ-reunion.fr/PBE/.  相似文献   

6.
Song J  Tan H  Wang M  Webb GI  Akutsu T 《PloS one》2012,7(2):e30361
Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the C(α)-N bond (Phi) and the C(α)-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.  相似文献   

7.
8.
Wu S  Zhang Y 《Proteins》2008,72(2):547-556
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 x 10(-13), which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.  相似文献   

9.
MOTIVATION: Local structure segments (LSSs) are small structural units shared by unrelated proteins. They are extensively used in protein structure comparison, and predicted LSSs (PLSSs) are used very successfully in ab initio folding simulations. However, predicted or real LSSs are rarely exploited by protein sequence comparison programs that are based on position-by-position alignments. RESULTS: We developed a SEgment Alignment algorithm (SEA) to compare proteins described as a collection of predicted local structure segments (PLSSs), which is equivalent to an unweighted graph (network). Any specific structure, real or predicted corresponds to a specific path in this network. SEA then uses a network matching approach to find two most similar paths in networks representing two proteins. SEA explores the uncertainty and diversity of predicted local structure information to search for a globally optimal solution. It simultaneously solves two related problems: the alignment of two proteins and the local structure prediction for each of them. On a benchmark of protein pairs with low sequence similarity, we show that application of the SEA algorithm improves alignment quality as compared to FFAS profile-profile alignment, and in some cases SEA alignments can match the structural alignments, a feat previously impossible for any sequence based alignment methods.  相似文献   

10.
A holistic approach to protein structure alignment   总被引:4,自引:0,他引:4  
A method of protein structure comparison developed previously is extended to incorporate other aspects of protein structure in addition to the inter-atomic vectors on which it was originally based. Each additional aspect, which induced hydrogen bonding, solvent exposure, torsional angles and sequence, was introduced separately and evaluated for its ability to improve alignment quality. The components were then combined, suitably weighted, to produce a more holistic comparison method. The method was tested on a group of remotely related beta/alpha type proteins that share a common feature in their overall chain fold. The results indicated that while the original inter-atomic vector component was sufficient to give the correct alignment of most pairs of topologically equivalent proteins, the inclusion of hydrogen bonds, torsion angles and a measure of solvent exposure led to improvements in the more difficult comparisons. Consideration of amino acid properties, including hydrophobicity, had no beneficial effect. The failure of the latter component was not unexpected considering the almost total lack of sequence similarity among the proteins considered.  相似文献   

11.
We present a method to assess the reliability of local structure prediction from sequence. We introduce a greedy algorithm for filtering and enrichment of dynamic fragment libraries, compiled with remote-homology detection methods such as HHfrag. After filtering false hits at each target position, we reduce the fragment library to a minimal set of representative fragments, which are guaranteed to have correct local structure in regions of detectable conservation. We demonstrate that the location of conserved motifs in a protein sequence can be predicted by examining the recurrence and structural homogeneity of detected fragments. The resulting confidence score correlates with the local RMSD of the representative fragments and allows us to predict torsion angles from sequence with better accuracy compared to existing machine learning methods.  相似文献   

12.
A specific treatment of recurrent structural motifs that represent the local bias information has been proven to be an important ingredient in de novo protein structure predication. Significant majority of methods for local structure are based on building blocks, which still suffer from its inherent discrete nature. Instead of using building blocks, this work presents a new protocol framework for local structural motifs prediction based on the direct locating along protein sequence and probabilistic sampling in a continuous (φ, ψ) space. The protein sequence was first scanned by an algorithm of sliding window with variable length of 7 to 19 residues, to match local segments to one of 82 motifs patterns in the fragment library. Identified segments were then labeled and modeled as the correlations of backbone torsion angles with mixture of bivariate cosine distributions in continuous (φ, ψ) space. 3D conformations of corresponding segments were finally sampled by using a backtrack algorithm to the hidden Markov model with single output of (φ, ψ). For local motifs in 50 proteins of testing set, about 62% of eight-residue segments located with high confidence value were predicted within 1.5 ? of their native structures by the method. Majority of local structural motifs were identified and sampled, which indicates the proposed protocol may at least serve as the foundation to obtain better protein tertiary structure prediction.  相似文献   

13.
Shatsky M  Nussinov R  Wolfson HJ 《Proteins》2006,62(1):209-217
Routinely used multiple-sequence alignment methods use only sequence information. Consequently, they may produce inaccurate alignments. Multiple-structure alignment methods, on the other hand, optimize structural alignment by ignoring sequence information. Here, we present an optimization method that unifies sequence and structure information. The alignment score is based on standard amino acid substitution probabilities combined with newly computed three-dimensional structure alignment probabilities. The advantage of our alignment scheme is in its ability to produce more accurate multiple alignments. We demonstrate the usefulness of the method in three applications: 1) computing more accurate multiple-sequence alignments, 2) analyzing protein conformational changes, and 3) computation of amino acid structure-sequence conservation with application to protein-protein docking prediction. The method is available at http://bioinfo3d.cs.tau.ac.il/staccato/.  相似文献   

14.
Kifer I  Nussinov R  Wolfson HJ 《Proteins》2008,73(2):380-394
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.  相似文献   

15.
MOTIVATION: Searching for non-coding RNA (ncRNA) genes and structural RNA elements (eleRNA) are major challenges in gene finding today as these often are conserved in structure rather than in sequence. Even though the number of available methods is growing, it is still of interest to pairwise detect two genes with low sequence similarity, where the genes are part of a larger genomic region. RESULTS: Here we present such an approach for pairwise local alignment which is based on foldalign and the Sankoff algorithm for simultaneous structural alignment of multiple sequences. We include the ability to conduct mutual scans of two sequences of arbitrary length while searching for common local structural motifs of some maximum length. This drastically reduces the complexity of the algorithm. The scoring scheme includes structural parameters corresponding to those available for free energy as well as for substitution matrices similar to RIBOSUM. The new foldalign implementation is tested on a dataset where the ncRNAs and eleRNAs have sequence similarity <40% and where the ncRNAs and eleRNAs are energetically indistinguishable from the surrounding genomic sequence context. The method is tested in two ways: (1) its ability to find the common structure between the genes only and (2) its ability to locate ncRNAs and eleRNAs in a genomic context. In case (1), it makes sense to compare with methods like Dynalign, and the performances are very similar, but foldalign is substantially faster. The structure prediction performance for a family is typically around 0.7 using Matthews correlation coefficient. In case (2), the algorithm is successful at locating RNA families with an average sensitivity of 0.8 and a positive predictive value of 0.9 using a BLAST-like hit selection scheme. AVAILABILITY: The program is available online at http://foldalign.kvl.dk/  相似文献   

16.
Homaeian L  Kurgan LA  Ruan J  Cios KJ  Chen K 《Proteins》2007,69(3):486-498
Secondary protein structure carries information about local structural arrangements, which include three major conformations: alpha-helices, beta-strands, and coils. Significant majority of successful methods for prediction of the secondary structure is based on multiple sequence alignment. However, multiple alignment fails to provide accurate results when a sequence comes from the twilight zone, that is, it is characterized by low (<30%) homology. To this end, we propose a novel method for prediction of secondary structure content through comprehensive sequence representation, called PSSC-core. The method uses a multiple linear regression model and introduces a comprehensive feature-based sequence representation to predict amount of helices and strands for sequences from the twilight zone. The PSSC-core method was tested and compared with two other state-of-the-art prediction methods on a set of 2187 twilight zone sequences. The results indicate that our method provides better predictions for both helix and strand content. The PSSC-core is shown to provide statistically significantly better results when compared with the competing methods, reducing the prediction error by 5-7% for helix and 7-9% for strand content predictions. The proposed feature-based sequence representation uses a comprehensive set of physicochemical properties that are custom-designed for each of the helix and strand content predictions. It includes composition and composition moment vectors, frequency of tetra-peptides associated with helical and strand conformations, various property-based groups like exchange groups, chemical groups of the side chains and hydrophobic group, auto-correlations based on hydrophobicity, side-chain masses, hydropathy, and conformational patterns for beta-sheets. The PSSC-core method provides an alternative for predicting the secondary structure content that can be used to validate and constrain results of other structure prediction methods. At the same time, it also provides useful insight into design of successful protein sequence representations that can be used in developing new methods related to prediction of different aspects of the secondary protein structure.  相似文献   

17.
Sequence comparison is a major step in the prediction of protein structure from existing templates in the Protein Data Bank. The identification of potentially remote homologues to be used as templates for modeling target sequences of unknown structure and their accurate alignment remain challenges, despite many years of study. The most recent advances have been in combining as many sources of information as possible--including amino acid variation in the form of profiles or hidden Markov models for both the target and template families, known and predicted secondary structures of the template and target, respectively, the combination of structure alignment for distant homologues and sequence alignment for close homologues to build better profiles, and the anchoring of certain regions of the alignment based on existing biological data. Newer technologies have been applied to the problem, including the use of support vector machines to tackle the fold classification problem for a target sequence and the alignment of hidden Markov models. Finally, using the consensus of many fold recognition methods, whether based on profile-profile alignments, threading or other approaches, continues to be one of the most successful strategies for both recognition and alignment of remote homologues. Although there is still room for improvement in identification and alignment methods, additional progress may come from model building and refinement methods that can compensate for large structural changes between remotely related targets and templates, as well as for regions of misalignment.  相似文献   

18.
We describe a hidden Markov model, HMMSTR, for general protein sequence based on the I-sites library of sequence-structure motifs. Unlike the linear hidden Markov models used to model individual protein families, HMMSTR has a highly branched topology and captures recurrent local features of protein sequences and structures that transcend protein family boundaries. The model extends the I-sites library by describing the adjacencies of different sequence-structure motifs as observed in the protein database and, by representing overlapping motifs in a much more compact form, achieves a great reduction in parameters. The HMM attributes a considerably higher probability to coding sequence than does an equivalent dipeptide model, predicts secondary structure with an accuracy of 74.3 %, backbone torsion angles better than any previously reported method and the structural context of beta strands and turns with an accuracy that should be useful for tertiary structure prediction.  相似文献   

19.
Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template‐defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile‐based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa . Proteins 2015; 83:411–427. © 2014 Wiley Periodicals, Inc.  相似文献   

20.
Vorolign, a fast and flexible structural alignment method for two or more protein structures is introduced. The method aligns protein structures using double dynamic programming and measures the similarity of two residues based on the evolutionary conservation of their corresponding Voronoi-contacts in the protein structure. This similarity function allows aligning protein structures even in cases where structural flexibilities exist. Multiple structural alignments are generated from a set of pairwise alignments using a consistency-based, progressive multiple alignment strategy. RESULTS: The performance of Vorolign is evaluated for different applications of protein structure comparison, including automatic family detection as well as pairwise and multiple structure alignment. Vorolign accurately detects the correct family, superfamily or fold of a protein with respect to the SCOP classification on a set of difficult target structures. A scan against a database of >4000 proteins takes on average 1 min per target. The performance of Vorolign in calculating pairwise and multiple alignments is found to be comparable with other pairwise and multiple protein structure alignment methods. AVAILABILITY: Vorolign is freely available for academic users as a web server at http://www.bio.ifi.lmu.de/Vorolign  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号