Similar articles
Found 20 similar articles.
1.
Sadeghi M  Parto S  Arab S  Ranjbar B 《FEBS letters》2005,579(16):3397-3400
We have used a statistical approach to protein secondary structure prediction that is based on information theory and simultaneously takes into account pairwise residue types and conformational states. Since predicting residue secondary structure by sliding a one-residue window creates ambiguity in state prediction, we used a dynamic programming algorithm to find the path with the maximum score. A scoring system for residue pairs in particular conformations is derived for neighbors up to ten residues apart in sequence. The three-state overall per-residue accuracy, Q3, of this method in a jackknife test with a dataset created from PDBSELECT is more than 70%.
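
The maximum-score path search mentioned in this abstract can be illustrated with a small dynamic programming sketch. It is not the authors' implementation: the inputs `site_scores` and `pair_scores` are hypothetical, and only adjacent residues are coupled here, whereas the abstract describes pair scores for residues up to ten positions apart.

```python
from typing import Dict, List

STATES = "HEC"  # helix, strand, coil


def best_state_path(site_scores: List[Dict[str, float]],
                    pair_scores: Dict[str, float]) -> str:
    """Return the state string with the maximum total score.

    site_scores[i][s]  : score of state s at residue i (hypothetical input)
    pair_scores[a + b] : score of state a followed by state b (hypothetical input)
    """
    if not site_scores:
        return ""
    # best[s] is the best total score of any path ending in state s.
    best = {s: site_scores[0][s] for s in STATES}
    back: List[Dict[str, str]] = []
    for scores in site_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for s in STATES:
            # Pick the predecessor state that maximises the running score.
            prev = max(STATES, key=lambda p: best[p] + pair_scores[p + s])
            new_best[s] = best[prev] + pair_scores[prev + s] + scores[s]
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))
```

With a penalty on state changes, an isolated residue that weakly prefers strand inside a helix is assigned helix, which is the kind of ambiguity a per-residue window cannot resolve on its own.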

2.
The classical problem of secondary structure prediction is approached by a new joint algorithm (Q7-JASEP) that combines the best aspects of six different methods. The algorithm includes the statistical methods of Chou-Fasman, Nagano, and Burgess-Ponnuswamy-Scheraga, the homology method of Nishikawa, the information theory method of Garnier-Osguthorpe-Robson, and the artificial neural network approach of Qian-Sejnowski. Steps in the algorithm are (i) optimizing each individual method with respect to its correlation coefficient (Q7) for assigning a structural type from the predictive score of the method, (ii) weighting each method, (iii) combining the scores from different methods, and (iv) comparing the scores for alpha-helix, beta-strand, and coil conformational states to assign the secondary structure at each residue position. The present application to 45 globular proteins demonstrates good predictive power in cross-validation testing (with average correlation coefficients per test protein of Q7,alpha = 0.41, Q7,beta = 0.47, and Q7,c = 0.41 for alpha-helix, beta-strand, and coil conformations). By the criterion of correlation coefficient (Q7) for each type of secondary structure, Q7-JASEP performs better than any of the component methods. When all protein classes are included for training and testing (by cross-validation), the results here equal the best in the literature, by the Q7 criterion. More generally, the basic algorithm can be applied to any protein class and to any type of structure/sequence or function/sequence correlation for which multiple predictive methods exist.
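
Steps (ii) to (iv) of this abstract (weight, combine, compare) can be sketched as a weighted sum of per-state scores followed by an argmax at each residue. This is a minimal sketch under assumptions: the method names, weights, and score values are hypothetical, and the abstract does not say that Q7-JASEP combines scores by a plain weighted sum.

```python
from typing import Dict, List

STATES = ("alpha", "beta", "coil")


def combine(method_scores: Dict[str, List[Dict[str, float]]],
            weights: Dict[str, float]) -> List[str]:
    """Assign one state per residue from weighted per-method scores.

    method_scores[m][i][s] : score of state s at residue i from method m
    weights[m]             : weight of method m (hypothetical values)
    """
    lengths = {len(scores) for scores in method_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all methods must score the same number of residues")
    n_residues = lengths.pop()
    assignment = []
    for i in range(n_residues):
        # Weighted sum of every method's score for each state.
        totals = {
            s: sum(weights[m] * scores[i][s]
                   for m, scores in method_scores.items())
            for s in STATES
        }
        # The state with the highest combined score wins.
        assignment.append(max(STATES, key=totals.get))
    return assignment
```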

3.
The DSSP program automatically assigns the secondary structure for each residue from the three-dimensional co-ordinates of a protein structure to one of eight states. However, discrete assignments are incomplete in that they cannot capture the continuum of thermal fluctuations. Therefore, DSSPcont (http://cubic.bioc.columbia.edu/services/DSSPcont) introduces a continuous assignment of secondary structure that replaces 'static' by 'dynamic' states. Technically, the continuum results from calculating weighted averages over 10 discrete DSSP assignments with different hydrogen bond thresholds. A DSSPcont assignment for a particular residue is a percentage likelihood of eight secondary structure states, derived from a weighted average of the ten DSSP assignments. The continuous assignments have two important features: (i) they reflect the structural variations due to thermal fluctuations as detected by NMR spectroscopy; and (ii) they reproduce the structural variation between many NMR models from one single model. Therefore, functionally important variation can be extracted from a single X-ray structure using the continuous assignment procedure.
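
The weighted averaging described here can be illustrated as follows. The sketch assumes the discrete assignments are already available as equal-length strings (one per hydrogen bond threshold); the function name and the weights are hypothetical, since the abstract does not give the actual weights used by DSSPcont.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def continuous_assignment(assignments: Sequence[str],
                          weights: Sequence[float]) -> List[Dict[str, float]]:
    """Turn several discrete assignments into per-residue state percentages.

    assignments : one state string per threshold, all of equal length
    weights     : one weight per assignment (hypothetical values)
    """
    if not assignments or len(assignments) != len(weights):
        raise ValueError("need one weight per assignment")
    length = len(assignments[0])
    if any(len(a) != length for a in assignments):
        raise ValueError("assignments must have equal length")
    total = float(sum(weights))
    result = []
    for i in range(length):
        accumulated: Dict[str, float] = defaultdict(float)
        for assignment, weight in zip(assignments, weights):
            accumulated[assignment[i]] += weight
        # Normalise the accumulated weights to percentages.
        result.append({state: 100.0 * w / total
                       for state, w in accumulated.items()})
    return result
```

A residue that is helical under every threshold gets 100% helix, while a residue whose state flips as the threshold changes receives a split such as 75% H and 25% E.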

4.
The DSSP program assigns protein secondary structure to one of eight states. This discrete assignment cannot describe the continuum of thermal fluctuations. Hence, a continuous assignment is proposed. Technically, the continuum results from averaging over ten discrete DSSP assignments with different hydrogen bond thresholds. The final continuous assignment for a single NMR model successfully reflected the structural variations observed between all NMR models in the ensemble. The structural variations between NMR models were verified to correlate with thermal motion; these variations were captured by the continuous assignments. Because the continuous assignment reproduces the structural variation between many NMR models from one single model, functionally important variation can be extracted from a single X-ray structure. Thus, continuous assignments of secondary structure may affect future protein structure analysis, comparison, and prediction.

5.
We provide a unified overview of methods that are currently widely used to assess the accuracy of prediction algorithms, ranging from raw percentages, quadratic error measures and other distances, and correlation coefficients to information-theoretic measures such as relative entropy and mutual information. We briefly discuss the advantages and disadvantages of each approach. For classification tasks, we derive new learning algorithms for the design of prediction systems by directly optimising the correlation coefficient. We observe and prove several results relating sensitivity and specificity of optimal systems. While the principles are general, we illustrate the applicability on specific problems such as protein secondary structure and signal peptide prediction.
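
Two of the measures named in this abstract, the raw percentage of correct residues (Q3 in the three-state case) and the per-state correlation coefficient (the Matthews correlation coefficient), can be computed as in the sketch below. The function names are illustrative and the code uses the standard textbook definitions, not anything specific to this paper.

```python
import math


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def matthews(observed: str, predicted: str, state: str) -> float:
    """Matthews correlation coefficient for one state against all others."""
    tp = fp = tn = fn = 0
    for o, p in zip(observed, predicted):
        if p == state and o == state:
            tp += 1
        elif p == state:
            fp += 1
        elif o == state:
            fn += 1
        else:
            tn += 1
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0
```

For example, `q3("HHHHCCCC", "HHHCCCCC")` gives 87.5, while the helix correlation coefficient for the same pair is about 0.77, which shows how the two measures weigh the same single error differently.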

6.
Swanson R  Vannucci M  Tsai JW 《Proteins》2009,74(3):701-711
Protein structure prediction has a number of important ad hoc similarity measures for evaluating predictions, but would benefit from a measure that is able to provide a common framework for a broad range of comparisons. Here we show that a mutual information-like measure can provide a comprehensive framework for evaluating protein structure prediction of all types. We discuss the concept of information, its application to secondary structure, and the obstacle to applying it to 3D structure. On the basis of the insights from the secondary structure case, we present an approach to work around the 3D difficulties, and develop a method to measure the mutual information provided by a 3D structure prediction. We integrate the evaluation of all types of protein structure prediction into a single framework, and compare the amount of information provided by various prediction methods, including secondary structure prediction. Within this broadened framework, the idea that structure is better preserved than sequence during evolution is evaluated quantitatively for the globin family. A nearly perfect sequence match in the globin family corresponds to about 300 bits of information, whereas a nearly perfect structural match for the same two proteins corresponds to about 2500 bits of information, where bits of information describes the probability of obtaining a match of similar closeness by chance. Mutual information provides both a theoretical basis for evaluating structure similarity and an explanatory surround for existing similarity measures.

7.
We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple yet intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view as to why a prediction succeeds or fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and perhaps has unfairly given the general impression that secondary structure prediction has reached its limit, since SOV has not improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have been more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably over each round. In terms of information, the secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective on accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of difficulty as well as final quality of a prediction.

8.
Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template‐defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile‐based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa . Proteins 2015; 83:411–427. © 2014 Wiley Periodicals, Inc.

9.
MOTIVATION: Transmembrane beta-barrel (TMB) proteins are embedded in the outer membranes of mitochondria, Gram-negative bacteria and chloroplasts. These proteins perform critical functions, including active ion-transport and passive nutrient intake. Therefore, there is a need for accurate prediction of secondary and tertiary structure of TMB proteins. Traditional homology modeling methods, however, fail on most TMB proteins since very few non-homologous TMB structures have been determined. Yet, because TMB structures conform to specific construction rules that restrict the conformational space drastically, it should be possible for methods that do not depend on target-template homology to be applied successfully. RESULTS: We develop a suite (TMBpro) of specialized predictors for predicting secondary structure (TMBpro-SS), beta-contacts (TMBpro-CON) and tertiary structure (TMBpro-3D) of transmembrane beta-barrel proteins. We compare our results to the recent state-of-the-art predictors transFold and PRED-TMBB using their respective benchmark datasets, and leave-one-out cross-validation. Using the transFold dataset TMBpro predicts secondary structure with per-residue accuracy (Q(2)) of 77.8%, a correlation coefficient of 0.54, and TMBpro predicts beta-contacts with precision of 0.65 and recall of 0.67. Using the PRED-TMBB dataset, TMBpro predicts secondary structure with Q(2) of 88.3% and a correlation coefficient of 0.75. All of these performance results exceed previously published results by 4% or more. Working with the PRED-TMBB dataset, TMBpro predicts the tertiary structure of transmembrane segments with RMSD <6.0 Å for 9 of 14 proteins. For 6 of 14 predictions, the RMSD is <5.0 Å, with a GDT_TS score greater than 60.0. AVAILABILITY: http://www.igb.uci.edu/servers/psss.html.

10.
MOTIVATION: Protein secondary structure prediction is an important step towards understanding how proteins fold in three dimensions. Recent analysis by information theory indicates that the correlation between neighboring secondary structures is much stronger than that between neighboring amino acids. In this article, we focus on the combination problem for sequences, i.e. combining the scores or assignments from single or multiple prediction systems under the constraint of a whole sequence, as a target for improvement in protein secondary structure prediction. RESULTS: We apply several graphical chain models to solve the combination problem and show that they are consistently more effective than the traditional window-based methods. In particular, conditional random fields (CRFs) moderately improve the predictions for helices and, more importantly, for beta sheets, which are the major bottleneck for protein secondary structure prediction.

11.
The assumption that homologous segments in different proteins may share a similar conformation is applied to the prediction of secondary structures in proteins. Sequences homologous to a target protein are searched, without allowing any gap, and compared against a number of reference proteins of known three-dimensional structure, and then a conformational state (alpha, beta or coil) for each residue of the protein is predicted by looking at the secondary structure of corresponding homologous segments. This prediction is done in a statistical rather than 'deterministic' way, by assigning the most probable conformation state among homologous data to each residue site of a target protein. A test application for 22 sample proteins yields 60% correctness on the average, a better value in comparison with two other existing methods. Joint prediction combining three methods into one is shown to increase the reliability up to 70%, when only the regions identically predicted with the three methods are taken into account. Application of the present method to 10 proteins of unknown structure is demonstrated.

12.
Protein secondary structure (PSS) prediction is an important topic in bioinformatics. Our study on a large set of non-homologous proteins shows that long-range interactions commonly exist and negatively affect PSS prediction. We also reveal strong correlations between secondary structure (SS) elements. In order to take into account the long-range interactions and SS-SS correlations, we propose a novel prediction system based on a cascaded bidirectional recurrent neural network (BRNN). We compare the cascaded BRNN against two other BRNN architectures, namely the original BRNN architecture used for speech recognition and Pollastri's BRNN, which was proposed for PSS prediction. Our cascaded BRNN achieves an overall three-state accuracy Q3 of 74.38%, and reaches a high Segment OVerlap (SOV) of 66.0455. It outperforms the original BRNN and Pollastri's BRNN in both Q3 and SOV. Specifically, it improves the SOV score by 4-6%.

13.
Nucleic acids are elucidated in configuration space. An algorithm relating sequence to stability in A and B helical secondary structures is stated to incorporate NMR conformational and optical melting data. This made possible a classification of elementary sequences in terms of configuration forces driving between A and B states, a finding useful in prediction of structural behavior of different sequences of DNA, RNA and their hybrids.

14.
Nucleic acids are elucidated in configuration space. An algorithm relating sequence to stability in A and B helical secondary structures is stated to incorporate NMR conformational and optical melting data. This made possible a classification of elementary sequences in terms of configuration forces driving between A and B states, a finding useful in prediction of structural behavior of different sequences of DNA, RNA and their hybrids.

15.
Chemical shift frequencies represent a time-average of all the conformational states populated by a protein. Thus, chemical shift prediction programs based on sequence and database analysis yield higher accuracy for rigid rather than flexible protein segments. Here we show that the prediction accuracy can be significantly improved by averaging over an ensemble of structures, predicted solely from amino acid sequence with the Rosetta program. This approach to chemical shift and structure prediction has the potential to be useful for guiding resonance assignments, especially in solid-state NMR structural studies of membrane proteins in proteoliposomes.

16.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. The preferences of a stretch of amino acids in a protein to form secondary structure and to be placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the necessity to create a consensus prediction from possibly contradicting outputs of several predictors but also bears the potential to predict conformational switches, i.e., sequence regions that have a high probability to change, for example, from a coil conformation in solution to an α‐helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three‐state secondary structure prediction, and 94.8% for three‐state transmembrane span prediction. These accuracies are comparable to state‐of‐the‐art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as web server and for download at www.meilerlab.org . Proteins 2013; 81:1127–1140. © 2013 Wiley Periodicals, Inc.
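
One way to read a per-residue 3 × 3 probability matrix of this kind is sketched below: the nine-state call is the largest cell, and three-state calls follow from row and column sums. Whether the published method derives its three-state predictions by marginalising in this way is an assumption, and the matrix layout (rows for secondary structure, columns for environment) and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

SS = ("helix", "strand", "coil")                  # rows (assumed layout)
ENV = ("membrane core", "interface", "solution")  # columns (assumed layout)


def marginals(matrix: List[List[float]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Sum the joint matrix over columns and over rows."""
    ss = {name: sum(row) for name, row in zip(SS, matrix)}
    env = {name: sum(matrix[r][c] for r in range(3))
           for c, name in enumerate(ENV)}
    return ss, env


def most_likely(matrix: List[List[float]]) -> Tuple[str, str]:
    """Return the (secondary structure, environment) pair with the largest cell."""
    cells = ((r, c) for r in range(3) for c in range(3))
    r, c = max(cells, key=lambda rc: matrix[rc[0]][rc[1]])
    return SS[r], ENV[c]
```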

17.
A major challenge in the development of antibody biotherapeutics is their tendency to aggregate. One root cause for aggregation is exposure of hydrophobic surface regions to the solvent. Many current techniques predict the relative aggregation propensity of antibodies via precalculated scales for the hydrophobicity or aggregation propensity of single amino acids. However, those scales cannot describe the nonadditive effects of a residue’s surroundings on its hydrophobicity. Therefore, they are inherently limited in their ability to describe the impact of subtle differences in molecular structure on the overall hydrophobicity. Here, we introduce a physics-based approach to describe hydrophobicity in terms of the hydration free energy using grid inhomogeneous solvation theory (GIST). We apply this method to assess the effects of starting structures, conformational sampling, and protonation states on the hydrophobicity of antibodies. Our results reveal that high-quality starting structures, i.e., crystal structures, are crucial for the prediction of hydrophobicity and that conformational sampling can compensate for errors introduced by the starting structure. On the other hand, sampling of protonation states only leads to good results when combined with high-quality structures, whereas it can even be detrimental otherwise. We conclude by pointing out that a single static homology model may not be adequate for predicting hydrophobicity.

18.
Qin S  He Y  Pan XM 《Proteins》2005,61(3):473-480
We have improved the multiple linear regression (MLR) algorithm for protein secondary structure prediction by combining it with the evolutionary information provided by multiple sequence alignment of PSI-BLAST. On the CB513 dataset, the three-state average overall per-residue accuracy, Q(3), reached 76.4%, while segment overlap accuracy, SOV99, reached 73.2%, using a rigorous jackknife procedure and the strictest reduction of the eight-state DSSP definition to three states. This represents an improvement of approximately 5% on overall per-residue accuracy compared with previous work. The relative solvent accessibility prediction also benefited from this combination of methods. The system achieved 77.7% average jackknifed accuracy for two-state prediction based on a 25% relative solvent accessibility mode, with a Matthews correlation coefficient of 0.548. The improved MLR secondary structure and relative solvent accessibility prediction server is available at http://spg.biosci.tsinghua.edu.cn/.

19.
A study is presented of the conformational characteristics of NMR-derived protein structures in the Protein Data Bank compared to X-ray structures. Both ensemble and energy-minimized average structures are analyzed. We have addressed the problem using the methods developed for crystal structures by examining the distribution of φ, ψ, and χ angles as indicators of global conformational irregularity. All these features in NMR structures occur to varying degrees in multiple conformational states. Some measures of local geometry are very tightly constrained by the methods used to generate the structure, e.g., proline φ angles, α-helix φ, ψ angles, ω angles, and Cα chirality. The more lightly restrained torsion angles do show increased clustering as the number of overall experimental observations increases. φ, ψ, and χ1 angle conformational heterogeneity is strongly correlated with accessibility but shows additional differences which reflect the differing number of observations possible in NMR for the various side chains (e.g., many for Trp, few for Ser). In general, we find that the core is defined to a notional resolution of 2.0 to 2.3 Å. Of real interest is the behavior of surface residues and in particular the side chains where multiple rotameric states in different structures can vary from 10% to 88%. Later generation structures show a much tighter definition which correlates with increasing use of J-coupling information, stereospecific assignments, and heteronuclear techniques. A suite of programs is being developed to address the special needs of NMR-derived structures which will take into account the existence of increased mobility in solution. © 1993 Wiley-Liss, Inc.

20.
Kinjo AR  Horimoto K  Nishikawa K 《Proteins》2005,58(1):158-165
The contact number of an amino acid residue in a protein structure is defined by the number of C(beta) atoms around the C(beta) atom of the given residue, a quantity similar to, but different from, solvent accessible surface area. We present a method to predict the contact numbers of a protein from its amino acid sequence. The method is based on a simple linear regression scheme and predicts the absolute values of contact numbers. When single sequences are used for both parameter estimation and cross-validation, the present method predicts the contact numbers with a correlation coefficient of 0.555 on average. When multiple sequence alignments are used, the correlation increases to 0.627, which is a significant improvement over previous methods. In terms of discrete states prediction, the accuracies for 2-, 3-, and 10-state predictions are, respectively, 71.4%, 54.1%, and 18.9% with residue type-dependent unbiased thresholds, and 76.3%, 59.2%, and 21.8% with residue type-independent unbiased thresholds. The difference between accessible surface area and contact number from a prediction viewpoint and the application of contact number prediction to three-dimensional structure prediction are discussed.
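
The contact number definition in this abstract can be made concrete with a short sketch that counts, for each residue, the other C(beta) atoms lying within a sphere around its own C(beta). The abstract does not state the radius, so the `cutoff` default of 12.0 Å is an assumption, as is the function name; the sketch also ignores any exclusion of sequence neighbours that the published method may apply.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_numbers(cb_coords: Sequence[Coord],
                    cutoff: float = 12.0) -> List[int]:
    """Count, per residue, the other C(beta) atoms closer than `cutoff`.

    cb_coords : one (x, y, z) C(beta) position per residue, in angstroms
    cutoff    : sphere radius in angstroms (assumed value, not from the abstract)
    """
    counts = [0] * len(cb_coords)
    for i in range(len(cb_coords)):
        for j in range(i + 1, len(cb_coords)):
            # Each close pair contributes one contact to both residues.
            if math.dist(cb_coords[i], cb_coords[j]) < cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts
```

The observed values computed this way would be the regression targets; the predicted values come from sequence alone and are compared with them through the correlation coefficient quoted above.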
