Similar literature
 20 similar documents found
1.
MOTIVATION: What constitutes a baseline level of success for protein fold recognition methods? As fold recognition benchmarks are often presented without any thought to the results that might be expected from a purely random set of predictions, an analysis of fold recognition baselines is long overdue. Given varying amounts of basic information about a protein, ranging from the length of the sequence to knowledge of its secondary structure, to what extent can the fold be determined by intelligent guesswork? Can simple methods that make use of secondary structure information assign folds more accurately than purely random methods, and could these methods be used to construct viable hierarchical classifications? EXPERIMENTS PERFORMED: A number of rapid automatic methods which score similarities between protein domains were devised and tested. These methods ranged from those that incorporated no secondary structure information, such as measuring absolute differences in sequence lengths, to more complex alignments of secondary structure elements. Each method was assessed for accuracy by comparison with the Class Architecture Topology Homology (CATH) classification. Methods were rated against both a random baseline fold assignment method as a lower control and FSSP as an upper control. Similarity trees were constructed in order to evaluate the accuracy of optimum methods at producing a classification of structure. RESULTS: Using a rigorous comparison of methods with CATH, the random fold assignment method set a lower baseline of 11% true positives allowing for 3% false positives, and FSSP set an upper benchmark of 47% true positives at 3% false positives. The optimum secondary structure alignment method used here achieved 27% true positives at 3% false positives. Using a less rigorous Critical Assessment of Structure Prediction (CASP)-like sensitivity measurement, the random assignment achieved 6%, FSSP 59% and the optimum secondary structure alignment method 32%.
Similarity trees produced by the optimum method illustrate that these methods cannot be used alone to produce a viable protein structural classification system. CONCLUSIONS: Simple methods that use perfect secondary structure information to assign folds cannot produce an accurate protein taxonomy; however, they do provide useful baselines for fold recognition. In terms of a typical CASP assessment our results suggest that approximately 6% of targets with folds in the databases could be assigned correctly by randomly guessing, and as many as 32% could be recognised by trivial secondary structure comparison methods, given knowledge of their correct secondary structures.
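The simplest baseline mentioned in this abstract (absolute difference in sequence length) and the "true positives at 3% false positives" measure can be sketched as follows. This is only an illustration under stated assumptions: the function names, the input layout and the arbitrary handling of tied scores are hypothetical and are not taken from the paper.

```python
def length_difference_score(len_a, len_b):
    """Similarity of two domains from sequence length alone.

    A smaller length difference gives a higher (less negative) score.
    """
    return -abs(len_a - len_b)


def true_positive_rate_at_fpr(scored_pairs, max_fpr=0.03):
    """Fraction of same-fold pairs recovered at a capped false-positive rate.

    scored_pairs: list of (score, same_fold) tuples, where same_fold is True
    when the reference classification puts both domains in the same fold.
    Pairs are visited from the best score downwards; counting stops once the
    false-positive rate would exceed max_fpr. Ties are broken arbitrarily
    (an assumption of this sketch).
    """
    positives = sum(1 for _, same in scored_pairs if same)
    negatives = len(scored_pairs) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative pair")

    ranked = sorted(scored_pairs, key=lambda pair: pair[0], reverse=True)
    true_pos = false_pos = 0
    best_true_pos = 0
    for _, same in ranked:
        if same:
            true_pos += 1
        else:
            false_pos += 1
        if false_pos / negatives > max_fpr:
            break
        best_true_pos = true_pos
    return best_true_pos / positives
```

In use, every pair of domains would be scored with `length_difference_score` (or a richer method), labelled against a reference classification such as CATH, and passed to `true_positive_rate_at_fpr`.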

2.
Wood MJ  Hirst JD 《Proteins》2005,59(3):476-481
We present DESTRUCT, a new method of protein secondary structure prediction, which achieves a three-state accuracy (Q3) of 79.4% in a cross-validated trial on a nonredundant set of 513 proteins. An iterative set of cascade-correlation neural networks is used to predict both secondary structure and psi dihedral angles, with predicted values enhancing the subsequent iteration. Predictive accuracies of 80.7% and 81.7% are achieved on the CASP4 and CASP5 targets, respectively. Our approach is significantly more accurate than other contemporary methods, due to feedback and a novel combination of structural representations.
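For readers unfamiliar with the Q3 figure quoted here, it is the percentage of residues assigned the correct one of three states (helix, strand, coil). A minimal sketch, assuming both predictions and observations are given as equal-length strings over H, E and C (the function name is illustrative):

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("empty sequences")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)
```

For example, a ten-residue prediction with two wrong residues gives a Q3 of 80%.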

3.
A method is described to construct sets of decoy models that can be used to generate a background score distribution for protein structure comparison. The models are derived directly from the two proteins being compared and retain all the essential properties of the structures, including length, density, shape and secondary structure composition but have different folds. As each comparison involves a pair of proteins of the same length, no explicit normalisation is required to adjust for the length of the proteins being compared. This allows substructure (or domain) matches to score almost equally to the comparison of isolated domains. A normalised probability measure was derived that allows joint family/family comparison. The method was applied to some of the CASP6 models for targets with new folds.

4.
We present a knowledge‐based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials and a four‐body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge‐based potentials. These features were obtained for a large set of native and decoy structures and a back‐propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source code and executables from this work are freely available at http://mathmed.org/#Software and http://mamiris.com/ . Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc.

5.
1 Introduction The prediction of protein structure and function from amino acid sequences is one of the most important problems in molecular biology. This problem is becoming more pressing as the number of known protein sequences has exploded as a result of genome and other sequencing projects, and the protein sequence-structure gap is widening rapidly[1]. Therefore, computational tools to predict protein structures are needed to narrow the widening gap. Although the prediction of three dim…

6.
A novel method for predicting the secondary structures of proteins from amino acid sequence is presented. Protein secondary structure seqlets, which are analogous to words in natural language, are extracted. These seqlets capture the relationship between amino acid sequence and the secondary structures of proteins and together form the protein secondary structure dictionary. More specifically, the dictionary is organism-specific. Protein secondary structure prediction is formulated as an integrated word segmentation and part-of-speech tagging problem. A word lattice is used to represent the results of the word segmentation, and a maximum entropy model is used to calculate the probability of a seqlet being tagged as a certain secondary structure type. The method is Markovian in the seqlets, permitting efficient exact calculation of the posterior probability distribution over all possible word segmentations and their tags by the Viterbi algorithm. The optimal segmentations and their tags are computed as the results of protein secondary structure prediction. The method is applied to predict the secondary structures of proteins of four organisms respectively and compared with the PHD method. The results show that the performance of this method is higher than that of PHD by about 3.9% Q3 accuracy and 4.6% SOV accuracy. Combining the method with locally similar protein sequences obtained by BLAST can give better predictions. The method is also tested on the 50 CASP5 target proteins with Q3 accuracy 78.9% and SOV accuracy 77.1%. A web server for protein secondary structure prediction has been constructed which is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.

7.
We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple yet intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view as to why a prediction succeeds and fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and perhaps has unfairly given the general impression that secondary structure prediction has reached its limit since SOV hasn't improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have been more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably over each round. In terms of information, the secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective of accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of difficulty as well as final quality of a prediction.

8.
We present a protein fold recognition method, MANIFOLD, which uses the similarity between target and template proteins in predicted secondary structure, sequence and enzyme code to predict the fold of the target protein. We developed a non-linear ranking scheme in order to combine the scores of the three different similarity measures used. For a difficult test set of proteins with very little sequence similarity, the program predicts the fold class correctly in 34% of cases. This is an over twofold increase in accuracy compared with sequence-based methods such as PSI-BLAST or GenTHREADER, which score 13-14% correct first hits for the same test set. The functional similarity term increases the prediction accuracy by up to 3% compared with using the combination of secondary structure similarity and PSI-BLAST alone. We argue that using functional and secondary structure information can increase the fold recognition beyond sequence similarity.

9.
Pei J  Grishin NV 《Proteins》2004,56(4):782-794
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable results of secondary structure prediction tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. Mixing predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.

10.
MOTIVATION: Evaluating the accuracy of predicted models is critical for assessing structure prediction methods. Because this problem is not trivial, a large number of different assessment measures have been proposed by various authors, and it has already become an active subfield of research. The CASP (Moult et al., 1997, 1999) and CAFASP (Fischer et al., 1999) prediction experiments have demonstrated that it has been difficult to choose one single, 'best' method to be used in the evaluation. Consequently, the CASP3 evaluation was carried out using an extensive set of especially developed numerical measures, coupled with human-expert intervention. As part of our efforts towards a higher level of automation in the structure prediction field, here we investigate the suitability of a fully automated, simple, objective, quantitative and reproducible method that can be used in the automatic assessment of models in the upcoming CAFASP2 experiment. Such a method should (a) produce one single number that measures the quality of a predicted model and (b) perform similarly to human-expert evaluations. RESULTS: MaxSub is a new and independently developed method that further builds and extends some of the evaluation methods introduced at CASP3. MaxSub aims at identifying the largest subset of C(alpha) atoms of a model that superimpose 'well' over the experimental structure, and produces a single normalized score that represents the quality of the model. Because there exists no evaluation method for assessment measures of predicted models, it is not easy to evaluate how good our new measure is. Even though an exact comparison of MaxSub and the CASP3 assessment is not straightforward, here we use a test-bed extracted from the CASP3 fold-recognition models. A rough qualitative comparison of the performance of MaxSub vis-a-vis the human-expert assessment carried out at CASP3 shows that there is a good agreement for the more accurate models and for the better predicting groups.
As expected, some differences were observed among the medium to poor models and groups. Overall, the top six predicting groups ranked using the fully automated MaxSub are also the top six groups ranked at CASP3. We conclude that MaxSub is a suitable method for the automatic evaluation of models.
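The "single normalized score" idea can be illustrated with a small sketch. It assumes the hard part, finding and superimposing the best subset of C(alpha) atoms, has already been done, and that per-residue model-to-experiment distances are available. The distance weighting and the 3.5 Å threshold follow the commonly quoted form of the score and should be treated as assumptions here, not as a statement of the published implementation; the function name is hypothetical.

```python
def maxsub_like_score(distances, target_length, threshold=3.5):
    """Normalized quality score in the range 0 to 1.

    distances: C(alpha) distances (in Angstrom) between superimposed model
               and experimental residues.
    target_length: number of residues in the experimental structure, used
                   for normalization so that missing residues are penalized.
    threshold: distance below which a residue counts as superimposing
               'well' (assumed value).
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    well_fitted = [d for d in distances if d <= threshold]
    weighted = sum(1.0 / (1.0 + (d / threshold) ** 2) for d in well_fitted)
    return weighted / target_length
```

A perfect model scores 1.0, and both poorly fitting and missing residues pull the score towards 0, which is what allows models of different completeness to be ranked by one number.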

11.
An Y  Friesner RA 《Proteins》2002,48(2):352-366
In this work, we introduce a new method for fold recognition using composite secondary structures assembled from different secondary structure prediction servers for a given target sequence. An automatic, complete, and robust way of finding all possible combinations of predicted secondary structure segments (SSS) for the target sequence and clustering them into a few flexible clusters, each containing patterns with the same number of SSS, is developed. This program then takes two steps in choosing plausible homologues: (i) a SSS-based alignment excludes impossible templates whose SSS patterns are very different from any of those of the target; (ii) a residue-based alignment selects good structural templates based on sequence similarity and secondary structure similarity between the target and only those templates left in the first stage. The secondary structure of each residue in the target is selected from one of the predictions to find the best match with the template. Truncation is applied to a target where different predictions vary. In most cases, a target is also divided into N-terminal and C-terminal fragments, each of which is used as a separate subsequence. Our program was tested on the fold recognition targets from CASP3 with known PDB codes and some available targets from CASP4. The results are compared with a structural homologue list for each target produced by the CE program (Shindyalov and Bourne, Protein Eng 1998;11:739-747). The program successfully locates homologues with high Z-score and low root-mean-square deviation within the top 30-50 predictions in the overwhelming majority of cases.

12.
MOTIVATION: Order and Disorder prediction using Conditional Random Fields (OnD-CRF) is a new method for accurately predicting the transition between structured and mobile or disordered regions in proteins. OnD-CRF applies CRFs relying on features which are generated from the amino acid sequence and from secondary structure prediction. Benchmarking results based on CASP7 targets, and evaluation with respect to several CASP criteria, rank the OnD-CRF model highest among the fully automatic server group. AVAILABILITY: http://babel.ucmp.umu.se/ond-crf/

13.
Protein eight-state secondary structure prediction is challenging, but is necessary to determine protein structure and function. Here, we report the development of a novel approach, SPSSM8, to predict eight-state secondary structures of proteins accurately from sequences based on the structural position-specific scoring matrix (SPSSM). The SPSSM has been successfully utilized to predict three-state secondary structures. Now we employ an eight-state SPSSM as a feature that is obtained from sequence structure alignment against a large database of 9 million sequences with putative structural information. The SPSSM8 uses a low sequence identity dataset (9062 entries) as a training set and conditional random field for the classification algorithm. The SPSSM8 achieved an average eight-state secondary structure accuracy (Q8) of 71.7% (Q3, 81.6%) for an independent testing set (463 entries), which had an improved accuracy of 10.1% and 4.6% compared with SSPro8 and CNF, respectively, and significantly improved the accuracy of eight-state secondary structure prediction. For the CASP9 dataset (92 entries) the SPSSM8 achieved a Q8 accuracy of 80.1% (Q3, 83.0%). The SPSSM8 was confirmed as an outstanding predictor for eight-state secondary structures of proteins. SPSSM8 is freely available at http://cal.tongji.edu.cn/SPSSM8.
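The relationship between the Q8 and Q3 figures quoted here can be made concrete with a short sketch. It assumes DSSP-style eight-state letters and one widely used reduction to three states (H, G, I to helix; E, B to strand; everything else to coil). Other reductions exist, and the abstract does not say which one was used, so the mapping below is an assumption.

```python
# Assumed 8-to-3 state reduction; conventions differ between studies.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helical states
    "E": "E", "B": "E",             # strand and bridge states
    "T": "C", "S": "C", "C": "C",   # turn, bend and coil
}


def reduce_to_three_states(eight_state):
    """Map an eight-state string onto the three-state alphabet H/E/C."""
    return "".join(EIGHT_TO_THREE[state] for state in eight_state)


def percent_identical(predicted, observed):
    """Per-residue accuracy in percent (Q8 or Q3 depending on the alphabet)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)
```

Because several eight-state errors collapse into the same three-state class, Q3 computed this way is never lower than Q8 for the same prediction, consistent with the Q8 and Q3 pairs reported above.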

14.
MOTIVATION: The Monte Carlo fragment insertion method for protein tertiary structure prediction (ROSETTA) of Baker and others has been merged with the I-SITES library of sequence structure motifs and the HMMSTR model for local structure in proteins, to form a new public server for the ab initio prediction of protein structure. The server performs several tasks in addition to tertiary structure prediction, including a database search, amino acid profile generation, fragment structure prediction, and backbone angle and secondary structure prediction. Meeting reasonable service goals required improvements in the efficiency, in particular for the ROSETTA algorithm. RESULTS: The new server was used for blind predictions of 40 protein sequences as part of the CASP4 blind structure prediction experiment. The results for 31 of those predictions are presented here. 61% of the residues overall were found in topologically correct predictions, which are defined as fragments of 30 residues or more with a root-mean-square deviation in superimposed alpha carbons of less than 6 Å. HMMSTR 3-state secondary structure predictions were 73% correct overall. Tertiary structure predictions did not improve the accuracy of secondary structure prediction.

15.
Is it better to combine predictions?
We have compared the accuracy of the individual protein secondary structure prediction methods PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of the methods. A range of ways of combining predictions were tested: voting, biased voting, linear discrimination, neural networks and decision trees. The combined methods that involve 'learning' (the non-voting methods) were trained using a set of 496 non-homologous domains; this dataset was biased as some of the secondary structure prediction methods had used them for training. We used two independent test sets to compare predictions: the first consisted of 17 non-homologous domains from CASP3 (Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction); the second set consisted of 405 domains that were selected in the same way as the training set, and were non-homologous to each other and the training set. On both test datasets the most accurate individual method was NNSSP, then PHD, DSC and the least accurate was Predator; however, it was not possible to conclusively show a significant difference between the individual methods. Comparing the accuracy of the single methods with that obtained by combining predictions, it was found that it was better to use a combination of predictions. On both test datasets it was possible to obtain an approximately 3% improvement in accuracy by combining predictions. In most cases the combined methods were statistically significantly better (at P = 0.05 on the CASP3 test set, and P = 0.01 on the EBI test set). On the CASP3 test dataset there was no significant difference in accuracy between any of the combined methods of prediction; on the EBI test dataset, linear discrimination and neural networks significantly outperformed voting techniques. We conclude that it is better to combine predictions.
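The simplest of the combination schemes listed, per-residue voting, can be sketched as below. The tie-breaking rule (prefer the tied state proposed by the earliest method in the list, so that methods can be ordered by trust) is an assumption made for this illustration; the abstract does not describe how ties were resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-residue secondary structure predictions by voting.

    predictions: list of equal-length strings, one per prediction method,
    ordered from most to least trusted. For each residue the most frequent
    state wins; ties go to the tied state proposed by the earliest method.
    """
    if not predictions:
        raise ValueError("no predictions given")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")

    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        top = max(counts.values())
        winners = {state for state, n in counts.items() if n == top}
        # First state in method order that is among the tied winners.
        consensus.append(next(state for state in column if state in winners))
    return "".join(consensus)
```

With four methods, as in the study, two-against-two ties are possible, which is why an explicit tie-breaking rule is needed at all.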

16.
Kifer I  Nussinov R  Wolfson HJ 《Proteins》2008,73(2):380-394
How a one-dimensional protein sequence folds into a specific 3D structure remains a difficult challenge in structural biology. Many computational methods have been developed in an attempt to predict the tertiary structure of the protein; most of these employ approaches that are based on the accumulated knowledge of solved protein structures. Here we introduce a novel and fully automated approach for predicting the 3D structure of a protein that is based on the well accepted notion that protein folding is a hierarchical process. Our algorithm follows the hierarchical model by employing two stages: the first aims to find a match between the sequences of short independently-folding structural entities and parts of the target sequence and assigns the respective structures. The second assembles these local structural parts into a complete 3D structure, allowing for long-range interactions between them. We present the results of applying our method to a subset of the targets from CASP6 and CASP7. Our results indicate that for targets with a significant sequence similarity to known structures we are often able to provide predictions that are better than those achieved by two leading servers, and that the most significant improvements in comparison with these methods occur in regions of a gapped structural alignment between the native structure and the closest available structural template. We conclude that in addition to performing well for targets with known homologous structures, our method shows great promise for addressing the more general category of comparative modeling targets, which is our next goal.

17.
Jinbo Xu  Sheng Wang 《Proteins》2019,87(12):1069-1081
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
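The "top L/5, L/2, and L long-range contact precision" figures can be read as follows: keep only residue pairs far apart in sequence, rank them by predicted probability, take the top L/k pairs (L being the sequence length) and report the fraction that are real contacts. The sketch below assumes a minimum sequence separation of 24 residues for "long-range", which is the usual CASP convention but is not stated in the abstract; the function name and input layout are hypothetical.

```python
def top_contact_precision(predicted, true_contacts, seq_length,
                          divisor=5, min_separation=24):
    """Precision of the top L/divisor long-range contact predictions.

    predicted: dict mapping residue index pairs (i, j) to a predicted
               contact probability.
    true_contacts: set of (i, j) pairs in contact in the native structure.
    seq_length: number of residues L in the target.
    divisor: 5, 2 or 1 for top L/5, L/2 or L.
    min_separation: smallest |i - j| counted as long-range (assumed 24).
    """
    long_range = [
        (prob, pair) for pair, prob in predicted.items()
        if abs(pair[1] - pair[0]) >= min_separation
    ]
    if not long_range:
        raise ValueError("no long-range predictions to evaluate")
    long_range.sort(key=lambda item: item[0], reverse=True)

    n_top = max(1, seq_length // divisor)
    top = long_range[:n_top]
    hits = sum(1 for _, pair in top if pair in true_contacts)
    # Divides by the number of pairs actually evaluated, which may be
    # smaller than L/divisor when few predictions are supplied.
    return hits / len(top)
```

Calling the function with `divisor` set to 5, 2 and 1 gives the three precision values of the kind reported in the abstract.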

18.
Performance in the template-based modeling (TBM) category of CASP13 is assessed here, using a variety of metrics. Performance of the predictor groups that participated is ranked using the primary ranking score that was developed by the assessors for CASP12. This reveals that the best results are obtained by groups that include contact predictions or inter-residue distance predictions derived from deep multiple sequence alignments. In cases where there is a good homolog in the wwPDB (TBM-easy category), the best results are obtained by modifying a template. However, for cases with poorer homologs (TBM-hard), very good results can be obtained without using an explicit template, by deep learning algorithms trained on the wwPDB. Alternative metrics are introduced, to allow testing of aspects of structural models that are not addressed by traditional CASP metrics. These include comparisons to the main-chain and side-chain torsion angles of the target, and the utility of models for solving crystal structures by the molecular replacement method. The alternative metrics are poorly correlated with the traditional metrics, and it is proposed that modeling has reached a sufficient level of maturity that the best models should be expected to satisfy this wider range of criteria.

19.
Protein fold recognition using sequence-derived predictions.
In protein fold recognition, one assigns a probe amino acid sequence of unknown structure to one of a library of target 3D structures. Correct assignment depends on effective scoring of the probe sequence for its compatibility with each of the target structures. Here we show that, in addition to the amino acid sequence of the probe, sequence-derived properties of the probe sequence (such as the predicted secondary structure) are useful in fold assignment. The additional measure of compatibility between probe and target is the level of agreement between the predicted secondary structure of the probe and the known secondary structure of the target fold. That is, we recommend a sequence-structure compatibility function that combines previously developed compatibility functions (such as the 3D-1D scores of Bowie et al. [1991] or sequence-sequence replacement tables) with the predicted secondary structure of the probe sequence. The effect on fold assignment of adding predicted secondary structure is evaluated here by using a benchmark set of proteins (Fischer et al., 1996a). The 3D structures of the probe sequences of the benchmark are actually known, but are ignored by our method. The results show that the inclusion of the predicted secondary structure improves fold assignment by about 25%. The results also show that, if the true secondary structure of the probe were known, correct fold assignment would increase by an additional 8-32%. We conclude that incorporating sequence-derived predictions significantly improves assignment of sequences to known 3D folds. Finally, we apply the new method to assign folds to sequences in the SWISSPROT database; six fold assignments are given that are not detectable by standard sequence-sequence comparison methods; for two of these, the fold is known from X-ray crystallography and the fold assignment is correct.

20.

Background  

A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably.
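The "covariance signal" mentioned here refers to pairs of alignment columns that change together so that base pairing is preserved. One common way to quantify it is the mutual information between two columns; the abstract does not say which covariance score is used, so the following is a generic sketch under that assumption. It expects an alignment given as equal-length strings and treats gap characters like any other symbol.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    alignment: list of equal-length aligned sequences.
    High values indicate that the two columns co-vary, as expected for
    positions forming a conserved base pair; a misaligned column dilutes
    the joint counts and lowers the value.
    """
    if not alignment:
        raise ValueError("empty alignment")
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    freq_i = Counter(a for a, _ in pairs)
    freq_j = Counter(b for _, b in pairs)
    freq_ij = Counter(pairs)

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi
```

In a toy alignment where the first and last columns always form complementary pairs (G with C, C with G, A with U, U with A), those two columns reach the maximum of 2 bits, while two fully conserved columns score 0 because they carry no covariation.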
