Similar literature
20 similar articles found
1.
Cuff JA  Barton GJ 《Proteins》1999,34(4):508-519
A new dataset of 396 protein domains is developed and used to evaluate the performance of the protein secondary structure prediction algorithms DSC, PHD, NNSSP, and PREDATOR. The maximum theoretical Q3 accuracy for a combination of these methods is shown to be 78%. A simple consensus prediction on the 396 domains, with automatically generated multiple sequence alignments, gives an average Q3 prediction accuracy of 72.9%. This is a 1% improvement over PHD, which was the best single method evaluated. Segment Overlap Accuracy (SOV) is 75.4% for the consensus method on the 396-protein set. The secondary structure definition method DSSP defines 8 states, but these are reduced by most authors to 3 for prediction. Application of the different published 8- to 3-state reduction methods shows variation of over 3% on apparent prediction accuracy. This suggests that care should be taken to compare methods by the same reduction method. Two new sequence datasets (CB513 and CB251) are derived which are suitable for cross-validation of secondary structure prediction methods without artifacts due to internal homology. A fully automatic World Wide Web service that predicts protein secondary structure by a combination of methods is available via http://barton.ebi.ac.uk/.
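The dependence of apparent accuracy on the 8- to 3-state reduction can be illustrated with a minimal Python sketch. The two mappings below are commonly used conventions and are an assumption here; the abstract does not list which reductions were compared. The example strings are made up for illustration.

```python
# Two commonly used 8-to-3 state reductions (illustrative assumption;
# not necessarily the set compared in the paper).
REDUCE_BROAD = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
REDUCE_STRICT = {"H": "H", "E": "E"}


def reduce_states(dssp, mapping):
    """Map a DSSP 8-state string to 3 states; unmapped states become coil 'C'."""
    return "".join(mapping.get(state, "C") for state in dssp)


def q3(predicted, observed):
    """Percentage of residues whose 3-state prediction matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    dssp = "-HHHHGGGTT-EEEEB-S"   # made-up DSSP assignment
    pred = "CHHHHHHHCCCEEEECCC"   # made-up 3-state prediction
    for name, mapping in (("broad", REDUCE_BROAD), ("strict", REDUCE_STRICT)):
        observed = reduce_states(dssp, mapping)
        print(name, observed, round(q3(pred, observed), 1))
```

The same prediction scores differently against the two reductions, which is why a single reduction should be fixed before comparing methods.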

2.
Cuff JA  Barton GJ 《Proteins》2000,40(3):502-511
The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences is shown to provide a range of accuracy from 70.5% to 76.4%. The best accuracy of 76.4% (standard deviation 8.4%) is 3.1% (Q(3)) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 sequence non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater have an average Q(3) accuracy of 84%, and cover 68% of the residues. Relative solvent accessibility based on a two-state model, for 25, 5, and 0% accessibility, is predicted at 76.2, 79.8, and 86.6% accuracy respectively. The source of the improvements obtained from training with different representations of the same alignment data is described in detail. The new Jnet prediction method resulting from this study is available in the Jpred secondary structure prediction server, and as a stand-alone computer program from: http://barton.ebi.ac.uk/. Proteins 2000;40:502-511.
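The accuracy/coverage trade-off at a confidence cut-off can be sketched as follows. This is a minimal illustration, assuming each residue carries an integer confidence value; the function name and the example data are hypothetical and are not taken from Jnet.

```python
def accuracy_at_confidence(predicted, observed, confidence, threshold=5):
    """Return (accuracy %, coverage %) for residues with confidence >= threshold."""
    if not (len(predicted) == len(observed) == len(confidence)) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    # Keep only the residues predicted with sufficient confidence.
    kept = [(p, o) for p, o, c in zip(predicted, observed, confidence)
            if c >= threshold]
    coverage = 100.0 * len(kept) / len(predicted)
    if not kept:
        return float("nan"), coverage
    accuracy = 100.0 * sum(p == o for p, o in kept) / len(kept)
    return accuracy, coverage


if __name__ == "__main__":
    pred = "HHHHCCEEEC"                        # made-up prediction
    obs = "HHHCCCEEEE"                         # made-up observed states
    conf = [9, 8, 7, 6, 6, 5, 9, 9, 4, 1]      # made-up confidence values
    print(accuracy_at_confidence(pred, obs, conf))  # (87.5, 80.0)
```

Raising the threshold typically raises accuracy on the retained residues while lowering coverage.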

3.

Background  

Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio.

4.
Contact order and ab initio protein structure prediction   Total citations: 1, self-citations: 0, citations by others: 1
Although much of the motivation for experimental studies of protein folding is to obtain insights for improving protein structure prediction, there has been relatively little connection between experimental protein folding studies and computational structural prediction work in recent years. In the present study, we show that the relationship between protein folding rates and the contact order (CO) of the native structure has implications for ab initio protein structure prediction. Rosetta ab initio folding simulations produce a dearth of high CO structures and an excess of low CO structures, as expected if the computer simulations mimic to some extent the actual folding process. Consistent with this, the majority of failures in ab initio prediction in the CASP4 (critical assessment of structure prediction) experiment involved high CO structures likely to fold much more slowly than the lower CO structures for which reasonable predictions were made. This bias against high CO structures can be partially alleviated by performing large numbers of additional simulations, selecting out the higher CO structures, and eliminating the very low CO structures; this leads to a modest improvement in prediction quality. More significant improvements in predictions for proteins with complex topologies may be possible following significant increases in high-performance computing power, which will be required for thoroughly sampling high CO conformations (high CO proteins can take six orders of magnitude longer to fold than low CO proteins). Importantly for such a strategy, simulations performed for high CO structures converge much less strongly than those for low CO structures, and hence, lack of simulation convergence can indicate the need for improved sampling of high CO conformations. 
The parallels between Rosetta simulations and folding in vivo may extend to misfolding: The very low CO structures that accumulate in Rosetta simulations consist primarily of local up-down beta-sheets that may resemble precursors to amyloid formation.
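The abstract does not restate how contact order is computed. A commonly used definition of relative contact order is the mean sequence separation of contacting residues divided by the chain length, CO = (1 / (L * N)) * sum |i - j| over the N contacts. The sketch below assumes the contacts are already available as residue index pairs; how they are derived from coordinates (distance cut-off, atom selection) is left out, and the example contact lists are made up.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation of contacting residue pairs, divided by chain length.

    contacts: iterable of (i, j) residue index pairs that are in contact
    length:   number of residues in the chain
    """
    contacts = list(contacts)
    if length <= 0:
        raise ValueError("length must be positive")
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))


if __name__ == "__main__":
    local = [(0, 3), (1, 4), (2, 5)]       # helix-like, short-range contacts
    nonlocal_ = [(0, 9), (1, 8), (2, 7)]   # contacts between distant residues
    print(relative_contact_order(local, 10))      # low contact order
    print(relative_contact_order(nonlocal_, 10))  # high contact order
```

Structures dominated by local contacts give low values, which is the class the abstract reports as over-represented in the simulations.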

5.
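The accuracy of protein secondary structure prediction currently hovers around 75% and has proved difficult to improve further. In this paper, a redundant protein database is analyzed using statistical methods. The analysis shows that the real factor limiting further improvement in prediction accuracy is the systematic error of the protein database itself, which is approximately 25%. This error arises from objective limitations of the experimental conditions.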

6.
MOTIVATION: Predicting the secondary structure of a protein (alpha-helix, beta-sheet, coil) is an important step towards elucidating its three-dimensional structure, as well as its function. Presently, the best predictors are based on machine learning approaches, in particular neural network architectures with a fixed, and relatively short, input window of amino acids, centered at the prediction site. Although a fixed small window avoids overfitting problems, it does not permit capturing variable long-range information. RESULTS: We introduce a family of novel architectures which can learn to make predictions based on variable ranges of dependencies. These architectures extend recurrent neural networks, introducing non-causal bidirectional dynamics to capture both upstream and downstream information. The prediction algorithm is completed by the use of mixtures of estimators that leverage evolutionary information, expressed in terms of multiple alignments, both at the input and output levels. While our system currently achieves an overall performance close to 76% correct prediction, at least comparable to the best existing systems, the main emphasis here is on the development of new algorithmic ideas. AVAILABILITY: The executable program for predicting protein secondary structure is available from the authors free of charge. CONTACT: pfbaldi@ics.uci.edu, gpollast@ics.uci.edu, brunak@cbs.dtu.dk, paolo@dsi.unifi.it.
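The fixed-window input that this abstract contrasts with its recurrent architectures can be sketched in a few lines of Python. This shows only the conventional windowed encoding, not the bidirectional network itself; the half-width of 6 (a 13-residue window), the padding symbol, and the function names are assumptions made for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # symbol for positions beyond the chain termini (assumed convention)


def windows(sequence, half_width=6):
    """Return one fixed-size window per residue, centred on that residue."""
    padded = PAD * half_width + sequence + PAD * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def one_hot(window):
    """Flatten a window into a 0/1 vector with 21 entries per position."""
    alphabet = AMINO_ACIDS + PAD
    vector = []
    for residue in window:
        vector.extend(1 if residue == symbol else 0 for symbol in alphabet)
    return vector


if __name__ == "__main__":
    for w in windows("MKTAY", half_width=2):
        print(w, sum(one_hot(w)))
```

Because each window sees only `half_width` residues on either side, any dependency beyond that range is invisible to the predictor, which is the limitation the abstract addresses.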

7.
In this study we present an accurate secondary structure prediction procedure by using a query and related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence rather than utilization of a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states after jack-knife tests. The gain in prediction accuracy compared with the existing techniques, which are at best 72%, is achieved by secondary structure propensities based on both local and long-range effects, utilization of similar sequence information in the form of carefully selected pairwise alignment fragments, and reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable. Proteins 27:329–335, 1997. © 1997 Wiley-Liss, Inc.

8.
The nucleotide sequence of the alanine racemase (EC 5.1.1.1) gene from a thermophile, Bacillus stearothermophilus, was determined by the dideoxy chain termination method with universal and synthetic site-specific primers. The amino acid sequence of the enzyme predicted from the nucleotide sequence was confirmed by peptide sequence information derived from the N-terminal amino acid residues and several tryptic fragments. The alanine racemase gene consists of 1158 base pairs encoding a protein of 386 amino acid residues; the molecular weight of the apoenzyme is estimated as 43,341. The racemase gene of B. stearothermophilus has a closely similar size (1158 vs 1167 base pairs) to that of the gene of a mesophile, B. subtilis, but shows a higher preference for codons ending in G or C. A comparison of the amino acid sequence with those of Bacillus subtilis and Salmonella typhimurium dadB and alr enzymes revealed overall sequence homologies of 31-54%, including an identical octapeptide bearing the pyridoxal 5'-phosphate binding site. Although the residues common in the four racemases are not continuously arrayed, these constitute distinct domains and their hydropathy profiles are very similar. The secondary structure of B. stearothermophilus alanine racemase was predicted from the results obtained by theoretical analysis and circular dichroism measurement.
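A preference for codons ending in G or C is usually quantified as the fraction of codons whose third base is G or C. The sketch below is a minimal illustration of that measure; the function name and the short example sequences are hypothetical and are not taken from the racemase genes.

```python
def gc3_fraction(cds):
    """Fraction of codons in a coding sequence whose third base is G or C."""
    cds = cds.upper()
    if not cds or len(cds) % 3 != 0:
        raise ValueError("coding sequence must be a non-empty multiple of 3 bases")
    # Split the sequence into consecutive, non-overlapping codons.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


if __name__ == "__main__":
    print(gc3_fraction("ATGGCCAAAGTC"))  # 0.75
    print(gc3_fraction("ATGGCTAAAGTT"))  # 0.25
```

Both example sequences encode the same four amino acids (Met-Ala-Lys-Val), so the difference in the measure reflects synonymous codon choice only.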

9.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. Preference of a stretch of amino acids in a protein to form secondary structure and being placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the necessity to create a consensus prediction from possibly contradicting outputs of several predictors but bears the potential to predict conformational switches, i.e., sequence regions that have a high probability to change for example from a coil conformation in solution to an α‐helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three‐state secondary structure prediction, and 94.8% for three‐state transmembrane span prediction. These accuracies are comparable to state‐of‐the‐art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as web server and for download at www.meilerlab.org. Proteins 2013; 81:1127–1140. © 2013 Wiley Periodicals, Inc.
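One plausible way to read a per-residue 3 × 3 probability matrix is to sum over one axis to obtain a three-state call for the other. The sketch below assumes rows index secondary structure and columns index environment, and that three-state predictions are taken from the marginals; neither assumption is confirmed by the abstract, and the example matrix is made up.

```python
SS = ("helix", "strand", "coil")
ENV = ("membrane core", "interface", "solution")


def marginals(matrix):
    """Sum a 3x3 joint matrix over columns (-> structure) and rows (-> environment)."""
    ss = [sum(row) for row in matrix]
    env = [sum(matrix[r][c] for r in range(3)) for c in range(3)]
    return ss, env


def most_likely(matrix):
    """Return the most probable (secondary structure, environment) labels."""
    ss, env = marginals(matrix)
    return SS[ss.index(max(ss))], ENV[env.index(max(env))]


if __name__ == "__main__":
    residue = [            # rows: helix, strand, coil
        [0.50, 0.10, 0.05],
        [0.02, 0.03, 0.05],
        [0.03, 0.07, 0.15],
    ]                      # columns: membrane core, interface, solution
    print(most_likely(residue))  # ('helix', 'membrane core')
```

A residue whose probability mass is split between, say, coil in solution and helix in the membrane core would be a candidate for the conformational switches the abstract mentions.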

10.
In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various length and secondary structure type. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in β-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.

11.
12.
Machine learning approach for the prediction of protein secondary structure   Total citations: 8, self-citations: 0, citations by others: 8
PROMIS (protein machine induction system), a program for machine learning, was used to generalize rules that characterize the relationship between primary and secondary structure in globular proteins. These rules can be used to predict an unknown secondary structure from a known primary structure. The symbolic induction method used by PROMIS was specifically designed to produce rules that are meaningful in terms of chemical properties of the residues. The rules found were compared with existing knowledge of protein structure: some features of the rules were already recognized (e.g. amphipathic nature of alpha-helices). Other features are not understood, and are under investigation. The rules produced a prediction accuracy for three states (alpha-helix, beta-strand and coil) of 60% for all proteins, 73% for proteins of known alpha domain type, 62% for proteins of known beta domain type and 59% for proteins of known alpha/beta domain type. We conclude that machine learning is a useful tool in the examination of the large databases generated in molecular biology.

13.
Hybrid system for protein secondary structure prediction.   Total citations: 13, self-citations: 0, citations by others: 13
We have developed a hybrid system to predict the secondary structures (alpha-helix, beta-sheet and coil) of proteins and achieved 66.4% accuracy, with correlation coefficients of C(coil) = 0.429, C alpha = 0.470 and C beta = 0.387. This system contains three subsystems ("experts"): a neural network module, a statistical module and a memory-based reasoning module. First, the three experts independently learn the mapping between amino acid sequences and secondary structures from the known protein structures, then a Combiner learns to combine automatically the outputs of the experts to make final predictions. The hybrid system was tested with 107 protein structures through k-way cross-validation. Its performance was better than each expert and all previously reported methods with greater than 0.99 statistical significance. It was observed that for 20% of the residues, all three experts produced the same but wrong predictions. This may suggest an upper bound on the accuracy of secondary structure predictions based on local information from the currently available protein structures, and indicate places where non-local interactions may play a dominant role in conformation. For 64% of the residues, at least two experts were the same and correct, which shows that the Combiner performed better than majority vote. For 77% of the residues, at least one expert was correct, thus there may still be room for improvement in this hybrid approach. Rigorous evaluation procedures were used in testing the hybrid system, and statistical significance measures were developed in analyzing the differences among different methods. When measured in terms of the number of secondary structures (rather than the number of residues) that were predicted correctly, the prediction produced by the hybrid system was also better than those of individual experts.
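The agreement statistics quoted in this abstract (all experts identical but wrong, at least two correct, at least one correct) and the majority-vote baseline can be computed as in the sketch below. It does not reproduce the learned Combiner; the tie-breaking rule (fall back to the first expert) and the example strings are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(expert_predictions):
    """Per-residue majority vote; ties fall back to the earliest expert's state."""
    consensus = []
    for states in zip(*expert_predictions):
        counts = Counter(states)
        best = max(counts.values())
        consensus.append(next(s for s in states if counts[s] == best))
    return "".join(consensus)


def agreement_stats(expert_predictions, observed):
    """Percentages of residues in each agreement category."""
    n = len(observed)
    same_but_wrong = at_least_two = at_least_one = 0
    for states, obs in zip(zip(*expert_predictions), observed):
        n_correct = sum(s == obs for s in states)
        if n_correct == 0 and len(set(states)) == 1:
            same_but_wrong += 1
        if n_correct >= 2:
            at_least_two += 1
        if n_correct >= 1:
            at_least_one += 1
    return {
        "all_same_but_wrong": 100.0 * same_but_wrong / n,
        "at_least_two_correct": 100.0 * at_least_two / n,
        "at_least_one_correct": 100.0 * at_least_one / n,
    }


if __name__ == "__main__":
    experts = ["HHHCCEEC", "HHCCCEEC", "HHECCCEC"]  # made-up expert outputs
    observed = "HHHCCEEE"                           # made-up observed states
    print(majority_vote(experts))
    print(agreement_stats(experts, observed))
```

The "at least two correct" figure is the accuracy a plain majority vote of three experts can guarantee, and "at least one correct" is the ceiling for any combiner that only selects among the experts' outputs.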

14.
15.
A pentapeptide-based method for protein secondary structure prediction   Total citations: 7, self-citations: 0, citations by others: 7
We present a new method for protein secondary structure prediction, based on the recognition of well-defined pentapeptides, in a large databank. Using a databank of 635 protein chains, we obtained a success rate of 68.6%. We show that progress is achieved when the databank is enlarged, when the 20 amino acids are adequately grouped in 10 sets and when more pentapeptides are attributed one of the defined conformations, alpha-helices or beta-strands. The analysis of the model indicates that the essential variable is the number of pentapeptides of well-defined structure in the database. Our model is simple, does not rely on arbitrary parameters and allows the analysis in detail of the results of each chosen hypothesis.

16.
Pan XM 《Proteins》2001,43(3):256-259
In the present work, a novel method was proposed for prediction of secondary structure. Over a database of 396 proteins (CB396) with a three-state-defining secondary structure, this method with jackknife procedure achieved an accuracy of 68.8% and SOV score of 71.4% using single sequence and an accuracy of 73.7% and SOV score of 77.3% using multiple sequence alignments. Combination of this method with DSC, PHD, PREDATOR, and NNSSP gives Q3 = 76.2% and SOV = 79.8%.

17.
GOR V server for protein secondary structure prediction   Total citations: 3, self-citations: 0, citations by others: 3
SUMMARY: We have created the GOR V web server for protein secondary structure prediction. The GOR V algorithm combines information theory, Bayesian statistics and evolutionary information. In its fifth version, the GOR method reached (with the full jack-knife procedure) an accuracy of prediction Q3 of 73.5%. Although GOR V has been among the most successful methods, its online unavailability has been a deterrent to its popularity. Here, we remedy this situation by creating the GOR V server.

18.
This paper proposes an efficient ensemble system to tackle the protein secondary structure prediction problem with neural networks as base classifiers. The experimental results show that the multi-layer system can lead to better results. Deploying more accurate base classifiers yields higher accuracy for the ensemble system.

19.
We have modified and improved the GOR algorithm for the protein secondary structure prediction by using the evolutionary information provided by multiple sequence alignments, adding triplet statistics, and optimizing various parameters. We have expanded the database used to include the 513 non-redundant domains collected recently by Cuff and Barton (Proteins 1999;34:508-519; Proteins 2000;40:502-511). We have introduced a variable size window that allowed us to include sequences as short as 20-30 residues. A significant improvement over the previous versions of GOR algorithm was obtained by combining the PSI-BLAST multiple sequence alignments with the GOR method. The new algorithm will form the basis for the future GOR V release on an online prediction server. The average accuracy of the prediction of secondary structure with multiple sequence alignment and full jack-knife procedure was 73.5%. The accuracy of the prediction increases to 74.2% by limiting the prediction to 375 (of 513) sequences having at least 50 PSI-BLAST alignments. The average accuracy of the prediction of the new improved program without using multiple sequence alignments was 67.5%. This is approximately a 3% improvement over the preceding GOR IV algorithm (Garnier J, Gibrat JF, Robson B. Methods Enzymol 1996;266:540-553; Kloczkowski A, Ting K-L, Jernigan RL, Garnier J. Polymer 2002;43:441-449). We have discussed alternatives to the segment overlap (Sov) coefficient proposed by Zemla et al. (Proteins 1999;34:220-223).

20.
A molecular theory of protein secondary structure is presented that takes account of both local interactions inside each chain region and long-range interactions between different regions, incorporating all these interactions in a single Ising-like model. Local interactions are evaluated from the stereochemical theory describing the relative stabilities of α- and β-structures for different residues in synthetic polypeptides, while long-range effects are approximated by the interaction of each chain region with the averaged hydrophobic template. Based on this theory, an algorithm of protein secondary structure prediction is proposed and examples are given of “blind” predictions made before the x-ray structural data became available.
