Similar Articles
20 similar articles found (search time: 359 ms)
1.
Protein structure prediction   Times cited: 4 (self-citations: 0; citations by others: 4)
J Garnier 《Biochimie》1990,72(8):513-524
Current methods developed for predicting protein structure are reviewed. The most widely used algorithms for predicting secondary structure, those of Chou and Fasman and of Garnier et al., are compared with the most recent ones, including sequence similarity methods, neural networks, pattern recognition and joint prediction methods. With cross-validation, the best of these methods correctly predict 63-65% of the residues in the database for three conformations (helix, beta strand and coil), with a standard deviation of 6-8% per protein. However, when a homologous protein is already in the database, the accuracy of prediction by the similarity peptide method of Levin and Garnier reaches about 90%. Some conclusions can be drawn about the mechanism of protein folding. Because all the prediction methods use only the local sequence (±8 residues at most), one can infer that, on average, 65% of the conformation of a residue is dictated by the local sequence and the rest is brought about by folding. The best-predicted proteins or peptide segments are those whose conformation is least affected by folding. At present, prediction of tertiary structure is of practical use only when the structure of a homologous protein is already known. Aligning amino acids to define residues at equivalent spatial positions is critical for modelling the protein. We showed for serine proteases that secondary structure prediction can help to define a better alignment. For non-homologous segments of the polypeptide chain, such as loops, libraries of known loops and/or energy minimization with various force fields are used, but these do not yet give satisfactory solutions. An example of modelling by homology, aided by secondary structure prediction, is presented for two regulatory proteins, Fnr and FixK.
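The three-state per-residue accuracy quoted above (commonly called Q3) is the fraction of residues assigned the correct state. The minimal Python sketch below assumes one-letter codes H, E and C and equal-length, pre-aligned strings. The abstract does not say whether the per-protein standard deviation is a population or a sample statistic, so the use of `pstdev` is an assumption.

```python
from statistics import mean, pstdev


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


def per_protein_summary(pairs):
    """Mean and population standard deviation of Q3 over (predicted, observed) pairs."""
    scores = [q3(p, o) for p, o in pairs]
    return mean(scores), pstdev(scores)


proteins = [("HHHEECC", "HHHECCC"), ("CCEEEHH", "CCEEEHH")]
print(per_protein_summary(proteins))
```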

2.
Prediction of transmembrane spans and secondary structure from the protein sequence is generally the first step in the structural characterization of (membrane) proteins. The preference of a stretch of amino acids in a protein to form secondary structure and its preference to be placed in the membrane are correlated. Nevertheless, current methods predict either secondary structure or individual transmembrane states. We introduce a method that simultaneously predicts the secondary structure and transmembrane spans from the protein sequence. This approach not only eliminates the need to create a consensus prediction from possibly contradictory outputs of several predictors, but also has the potential to predict conformational switches, i.e., sequence regions that have a high probability of changing, for example, from a coil conformation in solution to an α‐helical transmembrane state. An artificial neural network was trained on databases of 177 membrane proteins and 6048 soluble proteins. The output is a 3 × 3 dimensional probability matrix for each residue in the sequence that combines three secondary structure types (helix, strand, coil) and three environment types (membrane core, interface, solution). The prediction accuracies are 70.3% for nine possible states, 73.2% for three‐state secondary structure prediction, and 94.8% for three‐state transmembrane span prediction. These accuracies are comparable to those of state‐of‐the‐art predictors of secondary structure (e.g., Psipred) or transmembrane placement (e.g., OCTOPUS). The method is available as a web server and for download at www.meilerlab.org. Proteins 2013; 81:1127–1140. © 2013 Wiley Periodicals, Inc.
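A per-residue 3 × 3 output can be read in two ways. One is to take the most probable of the nine cells. The other is to sum rows and columns to obtain the three-state secondary structure and environment calls. The sketch below assumes this marginalisation. The abstract does not state how the three-state accuracies were derived from the matrix, and the example values are hypothetical.

```python
SS_TYPES = ("helix", "strand", "coil")
ENV_TYPES = ("membrane core", "interface", "solution")


def decode(matrix):
    """matrix[i][j] = P(secondary structure i, environment j) for one residue."""
    # Nine-state call: the single most probable (SS, environment) cell.
    cells = [(i, j) for i in range(3) for j in range(3)]
    bi, bj = max(cells, key=lambda ij: matrix[ij[0]][ij[1]])
    nine_state = (SS_TYPES[bi], ENV_TYPES[bj])
    # Three-state SS call: marginalise over environments (sum of each row).
    ss_marginal = [sum(row) for row in matrix]
    # Three-state environment call: marginalise over SS types (sum of each column).
    env_marginal = [sum(matrix[i][j] for i in range(3)) for j in range(3)]
    ss = SS_TYPES[ss_marginal.index(max(ss_marginal))]
    env = ENV_TYPES[env_marginal.index(max(env_marginal))]
    return nine_state, ss, env


example = [
    [0.30, 0.05, 0.05],  # helix
    [0.00, 0.05, 0.20],  # strand
    [0.00, 0.10, 0.25],  # coil
]
print(decode(example))
```

In this made-up example the most probable single cell is helix in the membrane core, yet the environment marginal favours solution. The marginal calls and the nine-state call need not agree.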

3.
The ab initio folding problem can be divided into two sequential tasks of approximately equal computational complexity: the generation of native-like backbone folds and the positioning of side chains upon these backbones. The prediction of side-chain conformation in this context is challenging, because at best only the near-native global fold of the protein is known. To test the effect of displacements in the protein backbones on side-chain prediction for folds generated ab initio, sets of near-native backbones (≤ 4 Å Cα RMS error) for four small proteins were generated by two methods. The steric environment surrounding each residue was probed by placing the side chains in the native conformation on each of these decoys, followed by torsion-space optimization to remove steric clashes on a rigid backbone. We observe that on average 40% of the χ1 angles were displaced by 40° or more, effectively setting the limits in accuracy for side-chain modeling under these conditions. Three different algorithms were subsequently used for prediction of side-chain conformation. The average prediction accuracy for the three methods was remarkably similar: 49% to 51% of the χ1 angles were predicted correctly overall (33% to 36% of the χ1+2 angles). Interestingly, when the inter-side-chain interactions were disregarded, the mean accuracy increased. A consensus approach is described, in which side-chain conformations are defined based on the most frequently predicted χ angles for a given method upon each set of near-native backbones. We find that consensus modeling, which de facto includes backbone flexibility, improves side-chain prediction: χ1 accuracy improved to 51–54% (36–42% of χ1+2). Implications of a consensus method for ab initio protein structure prediction are discussed. Proteins 33:204–217, 1998. © 1998 Wiley-Liss, Inc.
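The χ1 accuracy and the consensus step can be sketched as follows. The sketch makes three assumptions. A prediction counts as correct when it lies within 40° of the native angle, mirroring the 40° displacement cutoff in the abstract. Angles are binned into three 120° rotamer wells labelled by their centres (60, 180, 300). The consensus is the most common bin across backbones. The paper's exact criteria may differ.

```python
from collections import Counter


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def chi1_accuracy(predicted, native, tolerance=40.0):
    """Fraction of chi1 angles predicted within `tolerance` degrees of native (assumed criterion)."""
    hits = sum(angular_difference(p, n) < tolerance for p, n in zip(predicted, native))
    return hits / len(native)


def rotamer_bin(chi: float) -> int:
    """Assign an angle to one of three 120-degree wells, labelled by centre (assumed bin edges)."""
    c = chi % 360.0
    if c < 120.0:
        return 60
    if c < 240.0:
        return 180
    return 300


def consensus_bin(predictions_per_backbone):
    """Most frequent rotamer bin for one residue over predictions on many near-native backbones."""
    counts = Counter(rotamer_bin(chi) for chi in predictions_per_backbone)
    return counts.most_common(1)[0][0]
```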

4.
M J Rooman  S J Wodak 《Biochemistry》1992,31(42):10239-10249
It is investigated whether protein segments predicted to have a well-defined conformational preference in the absence of tertiary interactions are conserved in families of homologous proteins. The prediction method follows the procedures of Rooman, M., Kocher, J.-P., and Wodak, S. (preceding paper in this issue). It uses a knowledge-based force field that incorporates only local interactions along the sequence and identifies segments whose lowest energy structure displays a sizable energy gap relative to other computed conformations. In 13 of the protein families and subfamilies considered that are sufficiently homologous to have similar 3D structures, at least one region is consistently predicted as having the same preferred conformation in virtually all family members. These regions are between 4 and 26 residues long. They are often located at chain ends and correspond primarily to segments of secondary structure heavily involved in interactions with the rest of the protein, suggesting that they could act as nuclei around which other parts of the structure would assemble. Experimental data on early folding intermediates or on protein fragments with appreciable structure in aqueous solution are available for more than half of the protein families. Comparison of our results with these data is quite favorable. They reveal that each of the experimentally identified early formed, or independently stable, substructures harbors at least one of the segments consistently predicted as having a preferred conformation by our procedure. The implications of our findings for the conservation of folding pathways in homologous proteins are discussed.
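The "sizable energy gap" criterion can be illustrated with a small filter. It keeps a segment's lowest-energy conformation only if that conformation beats the runner-up by a threshold. Measuring the gap against the next-lowest conformation, the threshold value and the energy values are all assumptions for illustration. The paper may define the gap relative to a different reference.

```python
def preferred_conformation(energies, gap_threshold):
    """Return the lowest-energy conformation if it beats the runner-up by >= gap_threshold, else None.

    energies: dict mapping conformation label -> computed energy (lower is more favourable).
    """
    if len(energies) < 2:
        raise ValueError("need at least two conformations to measure a gap")
    ranked = sorted(energies.items(), key=lambda kv: kv[1])
    (best, e_best), (_, e_second) = ranked[0], ranked[1]
    return best if e_second - e_best >= gap_threshold else None


segment = {"helix": -5.0, "strand": -2.0, "coil": -1.5}  # hypothetical energies
print(preferred_conformation(segment, 2.0))  # gap of 3.0 exceeds the threshold
```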

5.
Kuznetsov IB 《Proteins》2008,72(1):74-87
Ordered conformational changes are an important structural property of proteins and are involved in a variety of fundamental biological activities. Large-scale analyses of the implications of such changes for protein function and dysfunction require efficient methods for automated recognition of conformationally variable residue positions. The goal of this work was to study sequence and low-resolution structural properties of residue positions that change backbone conformation upon changes in protein environment and the utility of these properties for automated recognition of such conformationally variable positions. This study was performed using a large nonredundant set of experimentally characterized proteins that undergo ordered conformational transitions obtained from the Database of Macromolecular Movements. The results of this study show that ordered changes in backbone conformation are not limited to solvent accessible loop regions. A considerable fraction of conformationally variable positions is observed in helices and strands, and in buried positions. Conformationally variable positions are less conserved in evolution. Local patterns of (a) sequence neighbors, (b) evolutionary conservation, and (c) solvent accessibility can be used to predict conformationally variable positions with balanced sensitivity and specificity, albeit with large variance at the level of individual proteins. However, including a pattern of secondary structure in the prediction scheme results in a highly unbalanced performance, in which all conformationally variable positions located in regular secondary structure are misclassified. Application of the present methodology to the prion protein (PrP) shows that conformationally variable positions predicted in its ordered C-terminal domain are located within segments presumed to be involved in refolding of PrP.
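"Balanced sensitivity and specificity" refers to the two standard rates computed from a binary confusion matrix. The short sketch below uses the usual definitions, with True meaning a conformationally variable position. It shows why an unbalanced predictor can look good on one rate alone.

```python
def sensitivity_specificity(predicted, actual):
    """predicted/actual: sequences of booleans (True = conformationally variable position)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    fp = sum(p and (not a) for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


actual = [True, True, False, False, False]
# A predictor that never calls a position variable: perfect specificity, zero sensitivity.
print(sensitivity_specificity([False] * 5, actual))
```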

6.
A suite of FORTRAN programs, PREF, is described for calculating preference functions from the data base of known protein structures and for comparing smoothed profiles of sequence-dependent preferences in proteins of unknown structure. Amino acid preferences for a secondary structure are considered as functions of a sequence environment. The sequence environment of an amino acid residue in a protein is defined as an average over some physical, chemical, or statistical property of its primary structure neighbors. The frequency distribution of sequence environments in the data base of soluble protein structures is approximately normal for each amino acid type of known secondary conformation. An analytical expression for the dependence of preferences on sequence environment is obtained after each frequency distribution is replaced by the corresponding Gaussian function. The preference for the α-helical conformation increases for each amino acid type with the increase of sequence environment of buried solvent-accessible surface areas. We show that a set of preference functions based on buried surface area is useful for predicting folding motifs in α-class proteins and in integral membrane proteins. The prediction accuracy for helical residues is 79% for 5 integral membrane proteins and 74% for 11 α-class soluble proteins. Most residues found in transmembrane segments of membrane proteins with known α-helical structure are predicted to be indeed in the helical conformation because of very high middle helix preferences. Both extramembrane and transmembrane helices in the photosynthetic reaction center M and L subunits are correctly predicted. We point out in the discussion that our method of conformational preference functions can identify what physical properties of the amino acids are important in the formation of particular secondary structure elements. © 1993 John Wiley & Sons, Inc.
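The sketch below shows one way a Gaussian-based preference function could be written in Python; it is not the PREF code, which is FORTRAN. It makes three assumptions. The sequence environment is the mean property value of the neighbours within a window, excluding the residue itself. The window half-width of 4 is hypothetical. By Bayes' rule, the preference of amino acid type `aa` for conformation `c` at environment `x` is P(c | aa, x) / P(c | aa) = p(x | aa, c) / p(x | aa), with both densities replaced by fitted Gaussians.

```python
import math


def sequence_environment(profile, i, half_window=4):
    """Mean of a per-residue property over the neighbours of residue i (excluding i itself)."""
    lo, hi = max(0, i - half_window), min(len(profile), i + half_window + 1)
    neighbours = [profile[j] for j in range(lo, hi) if j != i]
    return sum(neighbours) / len(neighbours)


def gaussian(x, mu, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def preference(x, mu_conf, sigma_conf, mu_all, sigma_all):
    """Ratio of the Gaussian fit for residues in the conformation to the fit for all residues of that type."""
    return gaussian(x, mu_conf, sigma_conf) / gaussian(x, mu_all, sigma_all)
```

A ratio above 1 means that environment x is more typical of the conformation than of the amino acid type overall.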

7.
Accurate identification of strand residues aids prediction and analysis of numerous structural and functional aspects of proteins. We propose a sequence-based predictor, BETArPRED, which improves prediction of strand residues and β-strand segments. BETArPRED uses a novel design that accepts strand residues predicted by SSpro and predicts the remaining positions utilizing a logistic regression classifier with nine custom-designed features. These are derived from the primary sequence, the secondary structure (SS) predicted by SSpro, PSIPRED and SPINE, and residue depth as predicted by RDpred. Our features utilize certain local (window-based) patterns in the predicted SS and combine information about the predicted SS and residue depth. BETArPRED is evaluated on 432 sequences that share low identity with the training chains, and on the CASP8 dataset. We compare BETArPRED with seven modern SS predictors, and the top-performing automated structure predictor in CASP8, the ZHANG-server. BETArPRED provides statistically significant improvements over each of the SS predictors; it improves prediction of strand residues and β-strands, and it finds β-strands that were missed by the other methods. When compared with the ZHANG-server, we improve predictions of strand segments and predict more actual strand residues, while the ZHANG-server, which under-predicts strand residues, achieves a higher rate of correct strand residue predictions.
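The two-stage design described above can be sketched as follows. SSpro strand calls ('E') are kept unchanged, and every other position is scored with a logistic regression model. The two-element feature vectors, the weights and the 0.5 threshold are hypothetical placeholders, not the nine BETArPRED features or their fitted coefficients.

```python
import math


def strand_probability(features, weights, bias):
    """Logistic regression: P(strand) = 1 / (1 + exp(-(w . x + b)))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_strands(sspro_calls, feature_rows, weights, bias, threshold=0.5):
    """Keep SSpro strand calls ('E'); classify the remaining positions with the logistic model."""
    result = []
    for call, x in zip(sspro_calls, feature_rows):
        if call == "E":
            result.append(True)
        else:
            result.append(strand_probability(x, weights, bias) >= threshold)
    return result


# Hypothetical two-feature example.
print(predict_strands("ECC", [[0, 0], [2, 0], [0, 0]], [1.0, -1.0], -1.0))
```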

8.
The most stringent test for predictive methods of protein secondary structure is whether identical short sequences that are known to adopt different conformations in different proteins known at atomic resolution can be correctly discriminated. In this study, we show that the prediction efficiency for this type of segment in unrelated proteins reaches an average per-residue accuracy of about 72 to 75% (depending on the alignment method used to generate the input sequence profile) only when third-generation methods are used. A comparison of different methods based on segment statistics (second-generation methods) and/or also including evolutionary information (third-generation methods) indicates that the discrimination of the different conformations of identical segments depends on the method used for the prediction. Accuracy is similar when methods that perform similarly on secondary structure prediction are tested. When evolutionary information is taken into account, as compared with single-sequence input, the number of correctly discriminated pairs increases twofold. The results also highlight the predictive capability of neural networks for identical segments whose conformation differs in different proteins.

9.
For naturally occurring proteins, similar sequence implies similar structure. Consequently, multiple sequence alignments (MSAs) often are used in template‐based modeling of protein structure and have been incorporated into fragment‐based assembly methods. Our previous homology‐free structure prediction study introduced an algorithm that mimics the folding pathway by coupling the formation of secondary and tertiary structure. Moves in the Monte Carlo procedure involve only a change in a single pair of φ,ψ backbone dihedral angles that are obtained from a Protein Data Bank‐based distribution appropriate for each amino acid, conditional on the type and conformation of the flanking residues. We improve this method by using MSAs to enrich the sampling distribution, but in a manner that does not require structural knowledge of any protein sequence (i.e., not homologous fragment insertion). In combination with other tools, including clustering and refinement, the accuracies of the predicted secondary and tertiary structures are substantially improved and a global and position‐resolved measure of confidence is introduced for the accuracy of the predictions. Performance of the method in the Critical Assessment of Structure Prediction (CASP8) is discussed.
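A single Monte Carlo step of the kind described changes one residue's (φ, ψ) pair and then accepts or rejects the move. The sketch below uses the standard Metropolis criterion, which is an assumption because the abstract does not name the acceptance rule. The `sampler` callable is a hypothetical stand-in for the PDB-based, neighbour-conditioned (and MSA-enriched) distribution. The energy function is left to the caller.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-delta_e / T)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def propose_move(phi_psi, sampler, rng):
    """Replace the (phi, psi) pair of one randomly chosen residue with a draw from `sampler`.

    sampler(i, rng) -> (phi, psi) is a hypothetical per-residue sampling distribution.
    Returns the new list of angle pairs and the index of the changed residue.
    """
    i = rng.randrange(len(phi_psi))
    new = list(phi_psi)
    new[i] = sampler(i, rng)
    return new, i
```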

10.
Nanda V  DeGrado WF 《Proteins》2005,59(3):454-466
In the absence of experimental structural determination, numerous methods are available to indirectly predict or probe the structure of a target molecule. Genetic modification of a protein sequence is a powerful tool for identifying key residues involved in binding reactions or protein stability. Mutagenesis data is usually incorporated into the modeling process either through manual inspection of model compatibility with empirical data, or through the generation of geometric constraints linking sensitive residues to a binding interface. We present an approach derived from statistical studies of lattice models for introducing mutation information directly into the fitness score. The approach takes into account the phenotype of mutation (neutral or disruptive) and calculates the energy for a given structure over an ensemble of sequences. The structure prediction procedure searches for the optimal conformation where neutral sequences either have no impact or improve stability and disruptive sequences reduce stability relative to wild type. We examine three types of sequence ensembles: information from saturation mutagenesis, scanning mutagenesis, and homologous proteins. Incorporating multiple sequences into a statistical ensemble serves to energetically separate the native state and misfolded structures. As a result, the prediction of structure with a poor force field is sufficiently enhanced by mutational information to improve accuracy. Furthermore, by separating misfolded conformations from the target score, the ensemble energy serves to speed up conformational search algorithms such as Monte Carlo-based methods.
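The search criterion described above can be turned into a simple score. A candidate structure is penalised whenever a neutral mutant appears destabilising relative to wild type, or a disruptive mutant appears stabilising. This hinge-penalty form is an illustrative assumption and is not the lattice-model ensemble energy derived in the paper. `energy` is a hypothetical scoring function in which lower values mean more stable.

```python
def ensemble_penalty(energy, structure, wild_type, neutral, disruptive):
    """Penalty for a structure given mutant phenotypes (0 = fully consistent).

    energy(sequence, structure) -> float, lower = more stable (hypothetical scoring function).
    """
    e_wt = energy(wild_type, structure)
    penalty = 0.0
    for seq in neutral:
        # Neutral mutants should not raise the energy relative to wild type.
        penalty += max(0.0, energy(seq, structure) - e_wt)
    for seq in disruptive:
        # Disruptive mutants should raise the energy relative to wild type.
        penalty += max(0.0, e_wt - energy(seq, structure))
    return penalty


def toy_energy(seq, weights):
    """Hypothetical energy: sum of per-position weights wherever the sequence has a glycine."""
    return sum(w for w, c in zip(weights, seq) if c == "G")
```

Under this score, a structure whose energetics agree with every observed phenotype receives a penalty of zero.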

11.
  • (1) Co-operation between a laboratory interested in developing the theory for protein secondary structure prediction methods and a laboratory interested in applying and comparing such methods has led to the development of a simple predictive algorithm.
  • (2) Four-state predictions, in which each residue is unambiguously assigned one conformational state of α-helix, extended chain, reverse turn or coil, predict 49% of residue states correctly (in a sample of 26 proteins) when the overall helix and extended-chain content is not taken into account.
  • (3) When the relative abundances of helix, extended chain, reverse turn and coil observed by X-ray crystallography are taken into account, a single constant for each protein and type of conformation can be used to bias the prediction. When predictions are optimized in this way, 63% of all residue states are unambiguously and correctly assigned.
  • (4) By analysing the nature of the bias required, proteins can be classified into helix-rich types, pleated-sheet-rich types, and so on. It is shown that, if the type of protein can be determined even approximately by circular dichroism, 57% of residue states can be correctly predicted without taking into account the X-ray structure. Further, comparable predictions can be obtained if, instead of circular dichroism, preliminary predictions are made to assess the protein type.
  • (5) It is emphasized that the numbers quoted here depend on the method used to assess accuracy, and the algorithm is shown to be at least as good as, and usually superior to, the reported prediction methods assessed in the same way.
  • (6) Ways of further enhancing predictions by the use of additional information from hydrophobic triplets and homologous sequences are also explored. Hydrophobic triplet information does not significantly improve predictive power, and it is concluded that this information is used by proteins in the next stage of folding. On the other hand, the use of homologous sequences appears to be very promising.
  • (7) The implication of these results for protein folding is discussed.
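Point (3) above describes biasing the prediction with one constant per protein and conformational type. A minimal sketch of that idea follows, assuming the constant is added to each state's per-residue score before the highest-scoring state is chosen. The state letters, the additive form and the scores are assumptions for illustration.

```python
STATES = ("H", "E", "T", "C")  # helix, extended chain, reverse turn, coil


def predict_with_bias(scores, bias):
    """Pick the best state per residue after adding a per-protein constant to each state's score.

    scores: list of dicts, one per residue, mapping state -> score.
    bias: dict mapping state -> constant for this protein (missing states get 0).
    """
    return "".join(max(STATES, key=lambda s: res[s] + bias.get(s, 0.0)) for res in scores)


scores = [
    {"H": 1.0, "E": 0.8, "T": 0.1, "C": 0.5},
    {"H": 0.2, "E": 0.3, "T": 0.1, "C": 0.4},
]
print(predict_with_bias(scores, {}))          # unbiased
print(predict_with_bias(scores, {"E": 0.5}))  # biased towards extended chain
```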

12.
We have re-evaluated the information used in the Garnier-Osguthorpe-Robson (GOR) method of secondary structure prediction with the currently available database. The framework of information theory provides a means to formulate the influence of local sequence upon the conformation of a given residue in a rigorous manner. However, the existing database does not allow the evaluation of parameters required for an exact treatment of the problem. The validity of the approximations drawn from the theory is examined. It is shown that the first-level approximation, involving single-residue parameters, is only marginally improved by an increase in the database. The second-level approximation, involving pairs of residues, provides a better model. However, in this case the database is not big enough and this method might lead to parameters with deficiencies. Attention is therefore given to overcoming this lack of data. We have determined the significant pairs and the number of dummy observations necessary to obtain the best result for the prediction. This new version of the GOR method increases the accuracy of prediction by 7%, bringing the proportion of residues correctly predicted to 63% for three states on 68 proteins, where each protein to be predicted was removed from the database and the parameters were derived from the remaining proteins. If the protein to be predicted is kept in the database, the accuracy goes up to 69.7%.
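The first-level (single-residue) GOR approximation scores each state by summing, over a window of ±8 residues, the information that each neighbouring amino acid carries about the central residue's state. The highest-scoring state is then predicted. The sketch below follows that idea with a tiny made-up information table. The pair parameters and dummy observations of the new version are not shown.

```python
WINDOW = range(-8, 9)  # offsets j-8 .. j+8


def gor_predict(sequence, info, states=("H", "E", "C")):
    """First-level GOR-style prediction.

    info[state][(offset, amino_acid)] = information that `amino_acid` at position
    j+offset carries about the state of residue j. Missing entries count as zero.
    """
    prediction = []
    for j in range(len(sequence)):
        def total(state):
            return sum(
                info[state].get((m, sequence[j + m]), 0.0)
                for m in WINDOW
                if 0 <= j + m < len(sequence)
            )
        prediction.append(max(states, key=total))
    return "".join(prediction)


toy_info = {  # hypothetical values for illustration only
    "H": {(0, "A"): 1.0, (1, "A"): 0.5},
    "E": {(0, "V"): 1.0},
    "C": {(0, "G"): 0.8},
}
print(gor_predict("AAVG", toy_info))
```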

13.
Prediction of protein structure from its amino acid sequence is still a challenging problem. A complete physicochemical understanding of protein folding is essential for accurate structure prediction. Knowledge of residue solvent accessibility gives useful insights into protein structure prediction and function prediction. In this work, we propose a random forest method, RSARF, to predict residue accessible surface area from protein sequence information. The training and testing were performed using 120 proteins containing 22006 residues. For each residue, the buried or exposed state was computed using five thresholds (0%, 5%, 10%, 25%, and 50%). The prediction accuracies for the 0%, 5%, 10%, 25%, and 50% thresholds are 72.9%, 78.25%, 78.12%, 77.57% and 72.07%, respectively. Further, comparison of RSARF with other methods using a benchmark dataset containing 20 proteins shows that our approach is useful for prediction of residue solvent accessibility from protein sequence without using structural information. The RSARF program, datasets and supplementary data are available at http://caps.ncbs.res.in/download/pugal/RSARF/.
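Converting accessibility into two-state labels at each threshold is a simple mapping. The sketch assumes relative accessibility is computed as 100 × ASA / maximum ASA for the residue type (reference maxima vary between scales). It also assumes a residue is "buried" when its relative accessibility is at or below the threshold. With that convention, only fully inaccessible residues are buried at 0%.

```python
def relative_accessibility(asa, max_asa):
    """Relative solvent accessibility in percent (max_asa depends on the chosen reference scale)."""
    return 100.0 * asa / max_asa


def burial_states(rsa_values, thresholds=(0.0, 5.0, 10.0, 25.0, 50.0)):
    """Map relative accessibility (%) to 'B' (buried) / 'e' (exposed) labels for each threshold.

    A residue is called buried when its RSA is <= the threshold (assumed convention).
    """
    return {t: "".join("B" if r <= t else "e" for r in rsa_values) for t in thresholds}


print(burial_states([0.0, 3.0, 30.0]))
```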

14.
To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprising 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.
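Per-state accuracies and a leave-one-family-out (jack-knife) estimate can be sketched as follows. The sketch assumes per-state accuracy means the fraction of residues observed in a state that were predicted in that state. `train` and `evaluate` are hypothetical callables standing in for building the exchange-weight matrices and scoring the held-out family.

```python
def per_state_accuracy(predicted, observed, states="HEC"):
    """Fraction of residues observed in each state that were also predicted in that state."""
    result = {}
    for s in states:
        idx = [i for i, o in enumerate(observed) if o == s]
        result[s] = sum(predicted[i] == s for i in idx) / len(idx) if idx else float("nan")
    return result


def jackknife(families, train, evaluate):
    """Leave one family out, train on the rest, score the held-out family; return the mean score."""
    scores = []
    for k, held_out in enumerate(families):
        model = train(families[:k] + families[k + 1:])
        scores.append(evaluate(model, held_out))
    return sum(scores) / len(scores)


print(per_state_accuracy("HHEC", "HEEC"))
```

Holding out whole families, rather than single sequences, keeps homologues of the test sequences out of the training data.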

15.
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
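Scoring helix start positions separately from Q3 requires locating where each run of 'H' begins and checking whether the prediction places a start there. The sketch below uses a `tolerance` parameter because the abstract does not say whether an exact match is required. The default of 0, meaning an exact match, is an assumption.

```python
def helix_starts(ss):
    """Indices at which an alpha-helix (a run of 'H') begins in a secondary-structure string."""
    return {i for i, c in enumerate(ss) if c == "H" and (i == 0 or ss[i - 1] != "H")}


def fraction_starts_correct(predicted, observed, tolerance=0):
    """Fraction of observed helix starts matched by a predicted start within `tolerance` residues."""
    obs = helix_starts(observed)
    pred = helix_starts(predicted)
    if not obs:
        return float("nan")
    hits = sum(any(abs(o - p) <= tolerance for p in pred) for o in obs)
    return hits / len(obs)


print(fraction_starts_correct("CCHHCCHH", "CHHHCCHH"))
```

In this example both strings share most helical residues, so Q3 is high, yet only one of the two observed helix starts is predicted exactly.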

16.
The primary and secondary structure of human plasma apolipoprotein A-I and apolipoprotein E-3 have been analyzed to further our understanding of the secondary and tertiary conformation of these proteins and the structure and function of plasma lipoprotein particles. The methods used to analyze the primary sequence of these proteins used computer programs: (a) to identify repeated patterns within these proteins on the basis of conservative substitutions and similarities within the physicochemical properties of each residue; (b) for local averaging, hydrophobic moment, and Fourier analysis of the physicochemical properties; and (c) for secondary structure prediction of each protein carried out using homology, statistical, and information theory based methods. Circular dichroism was used to study purified lipid-protein complexes of each protein and quantitate the secondary structure in a lipid environment. The data from these analyses were integrated into a single secondary structure prediction to derive a model of each protein. The sequence homology within apolipoproteins A-I, E-3, and A-IV is used to derive a consensus sequence for two 11 amino acid repeating sequences in this family of proteins.
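The hydrophobic moment in (b) is commonly defined, following Eisenberg, as the magnitude of the vector sum of residue hydrophobicities placed at successive angles δ around the helix axis: μH = |Σ h_n e^{i n δ}|, with δ = 100° for an ideal α-helix. The sketch below implements that standard definition. The hydrophobicity scale and window used for apolipoproteins A-I and E-3 are not given in the abstract. Evaluating the same sum over a range of δ values gives the kind of Fourier spectrum mentioned in (b).

```python
import cmath
import math


def hydrophobic_moment(hydrophobicities, angle_deg=100.0):
    """Magnitude of sum_n h_n * exp(i * n * delta) for a segment of residue hydrophobicities.

    angle_deg = 100 corresponds to an ideal alpha-helix (3.6 residues per turn).
    """
    delta = math.radians(angle_deg)
    total = sum(h * cmath.exp(1j * n * delta) for n, h in enumerate(hydrophobicities))
    return abs(total)
```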

17.
Current methods of prediction of protein conformation are reviewed and the algorithms on which they rely are presented. For non-homologous proteins and after cross-validation, the reported methods exhibit a probability index, i.e. the percentage of predicted residues that are correctly predicted, of 63–65% with a standard deviation of the order of 7% for three conformational states: helix, β-strand and coil. This present limitation in the accuracy of predictions that use only the information of the local sequence can be related essentially to the effect of long-range interactions specific for each protein family. The methods based on sequence similarity can improve the accuracy of prediction by expressing explicitly the homology of the protein to be predicted with proteins in the database. In these circumstances the probability index can reach 87% with a standard deviation of 6.6%. This property can be used for modeling homologous proteins by aiding in amino acid sequence alignments. The prediction of the tertiary structure of a protein is still limited to the case of modeling a structure based on the known three-dimensional structure of a homologous protein.

18.
Review: protein secondary structure prediction continues to rise   Times cited: 15 (self-citations: 0; citations by others: 15)
Methods predicting protein secondary structure improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height of around 76% of all residues predicted correctly in one of the three states, helix, strand, and other. The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simple combining of existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. An example is a method automatically identifying structural switches and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the workhorse for numerous methods aimed at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Because the recent improvement yields a better prediction of segments, and in particular of beta strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.
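The review mentions improvements from "simple combining of existing methods". One of the simplest combiners is a per-residue majority vote over several three-state predictions. The sketch below uses that rule, which is an assumption: published combiners often average per-state probabilities instead. Ties go to the earliest method in the list.

```python
from collections import Counter


def combine_by_majority(predictions):
    """Per-residue majority vote over equal-length H/E/C strings; ties favour the earliest method."""
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    combined = []
    for i in range(length):
        votes = Counter(p[i] for p in predictions)
        top = max(votes.values())
        # Break ties in favour of the first method whose state has the top vote count.
        combined.append(next(p[i] for p in predictions if votes[p[i]] == top))
    return "".join(combined)


print(combine_by_majority(["HHEC", "HEEC", "CHEC"]))
```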

19.
The retrovirus integrase (IN) protein is essential for integration of viral DNA into host DNA. The secondary structure of the purified IN protein from avian myeloblastosis virus was investigated by both circular dichroism (CD) spectroscopy and five empirical prediction methods. The secondary structures determined by resolving the CD spectra with a least-squares curve-fitting procedure were compared with those predicted by four statistical methods, i.e., the Chou-Fasman, Garnier-Osguthorpe-Robson, Nishikawa-Ooi, and a JOINT scheme that combined all three of these methods, plus a purely a priori one, the Ptitsyn-Finkelstein method. Among all of the methods used, the Nishikawa-Ooi prediction gave the closest match to the CD result in secondary structure composition, although each of the other methods correctly predicted one or more secondary structural groups. Most of the alpha-helix and beta-sheet states predicted by the Ptitsyn-Finkelstein method were in accord with the Nishikawa-Ooi method. Secondary structure predictions by the Nishikawa-Ooi method were further extended to IN proteins from four phylogenetically distinct retroviruses. The structural relationships between the four most conserved amino acid blocks of these IN proteins were compared using sequence homology and secondary structure predictions.
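Comparing a predicted structure with CD data is a comparison of compositions: the fraction of residues in each state. The sketch below measures closeness as the sum of absolute differences between predicted and CD-derived fractions. That distance is an assumption, since the abstract does not say how "closest match" was judged. The CD fractions shown are hypothetical.

```python
def composition(ss, states="HEC"):
    """Fraction of residues in each state of a predicted secondary-structure string."""
    return {s: ss.count(s) / len(ss) for s in states}


def composition_distance(predicted_ss, cd_fractions):
    """Sum of absolute differences between predicted and CD-derived state fractions."""
    comp = composition(predicted_ss, states=tuple(cd_fractions))
    return sum(abs(comp[s] - f) for s, f in cd_fractions.items())


cd = {"H": 0.5, "E": 0.25, "C": 0.25}  # hypothetical CD-derived fractions
print(composition_distance("HHEC", cd), composition_distance("HHHH", cd))
```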

20.
A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequence, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 residues and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with the PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
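Auto-correlation of a per-residue property captures periodic patterns. For example, an alternating hydrophobic/polar pattern, typical of amphipathic β-strands, gives a negative value at lag 1 and a positive value at lag 2. The sketch below uses a generic normalised autocorrelation. The paper's exact definition and the properties it was applied to may differ.

```python
def autocorrelation(values, lag):
    """Normalised autocorrelation of a per-residue property profile at the given lag."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(values)")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    if var == 0:
        return 0.0
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)) / (n - lag)
    return cov / var


alternating = [1, -1, 1, -1, 1, -1]  # idealised hydrophobic/polar alternation
print(autocorrelation(alternating, 1), autocorrelation(alternating, 2))
```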
