Similar Articles
20 similar articles found.
1.
Gupta N  Mangal N  Biswas S 《Proteins》2005,59(2):196-204
Predicting a protein's fold from its amino acid sequence has been an active area of research in recent years, but the limited accuracy of existing techniques underscores the need for new approaches to this task. In this study, we use contact map prediction as an intermediate step in fold prediction from sequence. A contact map is a reduced graph-theoretic representation of a protein that models the local and global inter-residue contacts in its structure. We start with a population of random contact maps for the protein sequence and "evolve" the population to a "high-feasibility" configuration using a genetic algorithm. A neural network is employed to assess the feasibility of contact maps based on four physically relevant properties. We also introduce five parameters, based on algebraic graph theory and physical considerations, that can be used to judge the structural similarity between proteins through their contact maps. To predict the fold of a given amino acid sequence, we predict a contact map that sufficiently approximates the structure of the corresponding protein. We then assess the similarity of this contact map to the representative contact map of each fold; the fold with the closest match is our predicted fold for the input sequence. We found that our feasibility measure is able to differentiate between feasible and infeasible contact maps. Further, this novel approach predicts folds from sequences significantly better than a random predictor.
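
The abstract treats a contact map as a graph over residues but does not state the contact criterion. A minimal sketch, assuming the common convention that two residues at least two positions apart are in contact when their Cα atoms lie within 8 Å, builds such a map from coordinates and computes one algebraic-graph-theory descriptor, the largest eigenvalue of the contact adjacency matrix. The cutoff, the separation rule, and the choice of descriptor are illustrative assumptions, not the authors' five parameters.

```python
import numpy as np


def contact_map(coords, cutoff=8.0, min_sep=2):
    """Binary contact map from C-alpha coordinates (N x 3).

    Residues i and j are in contact when |i - j| >= min_sep and their
    C-alpha distance is <= cutoff angstroms (both values are assumptions).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(coords))
    separated = np.abs(idx[:, None] - idx[None, :]) >= min_sep
    return ((dist <= cutoff) & separated).astype(int)


def largest_eigenvalue(cmap):
    """Largest eigenvalue of the symmetric contact adjacency matrix."""
    return float(np.linalg.eigvalsh(cmap.astype(float))[-1])


# Toy straight chain with 3.8 angstrom spacing between consecutive C-alphas.
chain = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0)]
cmap = contact_map(chain)
print(cmap)
print(largest_eigenvalue(cmap))
```

On this toy chain only residue pairs (0, 2) and (1, 3) fall within the cutoff, so the contact graph is two disjoint edges.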

2.
We examine how effectively previously developed simple potential functions can identify compatibilities between protein sequences and structures in database searches. The potential function consists of pairwise contact energies, repulsive packing potentials of residues for overly dense arrangements, and short-range potentials for secondary structures, all of which were estimated from statistical preferences observed in known protein structures. Each potential energy term was modified to represent compatibilities between sequences and structures for globular proteins. Pairwise contact interactions in a sequence-structure alignment are evaluated in a mean field approximation on the basis of the probabilities of site pairs being aligned. Gap penalties are assumed to be proportional to the number of contacts at each residue position; as a result, gaps are placed more frequently on protein surfaces than in cores. In addition to minimum energy alignments, we use probability alignments made by successively aligning site pairs in order of their pairwise alignment probabilities. The results show that the present energy function and alignment method detect well both the folds compatible with a given sequence and, inversely, the sequences compatible with a given fold, and yield mostly similar alignments for these two types of sequence-structure pairs. Probability alignments consisting only of the most reliable site pairs can yield extremely small root mean square deviations, and including less reliable pairs increases the deviations. It is also observed that secondary structure potentials are usefully complementary, yielding improved alignments with this method. Remarkably, this method detects some individual sequence-structure pairs that have only 5-20% sequence identity.
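
The rule that gap penalties scale with the number of contacts at a site can be sketched as a dynamic-programming alignment. This simplified version is not the paper's mean-field method: it assumes a precomputed per-site placement energy (`energy[i][j]`, hypothetical) in place of the pairwise contact evaluation, and charges a gap `w` times the contact count of the structure site involved, so gaps are cheap at low-contact surface sites and expensive in the core.

```python
def align_score(energy, contacts, w=1.0):
    """Minimum-energy global alignment of a sequence onto structure sites.

    energy[i][j]: cost of placing sequence residue i on structure site j
                  (lower is better; a hypothetical precomputed table).
    contacts[j]:  number of contacts at structure site j.
    Gap costs are w * contacts, so low-contact (surface) sites are cheap to
    gap and high-contact (core) sites are expensive.
    """
    n, m = len(energy), len(contacts)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:   # residue i-1 placed on site j-1
                D[i][j] = min(D[i][j], D[i - 1][j - 1] + energy[i - 1][j - 1])
            if j > 0:             # site j-1 left empty
                D[i][j] = min(D[i][j], D[i][j - 1] + w * contacts[j - 1])
            if i > 0:             # residue i-1 inserted after site j-1 (free before site 0)
                flank = contacts[j - 1] if j > 0 else 0
                D[i][j] = min(D[i][j], D[i - 1][j] + w * flank)
    return D[n][m]


energy = [[-1, 0, 0],
          [0, 0, -1]]
print(align_score(energy, contacts=[5, 0, 5]))  # -> -2.0
```

With `contacts = [5, 0, 5]`, leaving the middle (zero-contact) site empty costs nothing, so the optimal alignment places its gap there. A traceback could be added to recover the alignment itself.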

3.
S Miyazawa  R L Jernigan 《Proteins》1999,36(3):357-369
We consider modifications of an empirical energy potential for fold and sequence recognition to represent approximately the stabilities of proteins in various environments. The potential used here includes a secondary structure potential representing short-range interactions for secondary structures of proteins, and a tertiary structure potential consisting of a long-range, pairwise contact potential and a repulsive packing potential. This potential is designed to evaluate the total conformational energy of a protein at the coarse-grained residue level. It was previously estimated from the observed frequencies of secondary structures, from contact frequencies between residues, and from the distributions of the number of residues in contact in known protein structures, by regarding those distributions as equilibrium distributions with the Boltzmann factor of these interaction energies. The stability of native structures is assumed to be a primary requirement for proteins to fold into their native structures. A collapse energy is subtracted from the contact energies to remove the dependence on protein size and to represent protein stabilities for monomeric and multimeric states. The free energy of the whole ensemble of protein conformations, which is subtracted from the conformational energy to represent protein stability, is approximated as the average energy expected for a typical native structure with the same amino acid composition. This term may be constant in fold recognition but essentially varies in sequence recognition. A simple test of threading sequences into structures without gaps is employed to demonstrate the importance of the present modifications, which permit the same potential to be used for both fold and sequence recognition. Proteins 1999;36:357-369. Published 1999 Wiley-Liss, Inc.
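
Gapless threading, the simple test mentioned above, places a sequence onto consecutive structure sites at every possible offset and scores each placement. The sketch below assumes a toy two-letter (H/P) contact-energy table rather than the actual Miyazawa-Jernigan parameters, and it omits the secondary structure, packing, and collapse-energy terms.

```python
def contact_key(a, b):
    """Order-independent key for a residue-type pair."""
    return tuple(sorted((a, b)))


def placement_energy(seq, contacts, offset, eps):
    """Contact energy when seq[k] occupies structure site offset + k."""
    n = len(seq)
    total = 0.0
    for a, b in contacts:
        i, j = a - offset, b - offset
        if 0 <= i < n and 0 <= j < n:   # both sites occupied by the sequence
            total += eps[contact_key(seq[i], seq[j])]
    return total


def gapless_threading(seq, struct_len, contacts, eps):
    """Score every ungapped placement of seq onto a structure of struct_len sites."""
    return {off: placement_energy(seq, contacts, off, eps)
            for off in range(struct_len - len(seq) + 1)}


# Toy HP energy table (hypothetical values, not Miyazawa-Jernigan energies).
EPS = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
scores = gapless_threading("HPH", struct_len=4, contacts=[(0, 2), (1, 2)], eps=EPS)
print(scores)                        # -> {0: -1.0, 1: 0.0}
print(min(scores, key=scores.get))   # -> 0
```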

4.
Template-based modeling is considered one of the most successful approaches to protein structure prediction. However, threading, the template-based modeling technique of selecting optimal templates from a library of known protein structures with folds similar to the target protein and correctly aligning the target sequence to those template structures, remains challenging to perform reliably and accurately, particularly for non-homologous or distantly homologous targets. With recent advances in residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of including residue-residue contact information on the accuracy and reliability of protein threading. We develop a new threading algorithm that incorporates various sequential and structural features, and then integrate residue-residue contact information as an additional scoring term for template selection. We show that including contact information yields statistically significantly better threading performance than a baseline threading algorithm that does not use contact information, with everything else held the same. Experimental results demonstrate that our contact-based threading approach outperforms the popular threading method MUSTER, the contact-assisted ab initio folding method CONFOLD2, and the recent state-of-the-art contact-assisted threading methods EigenTHREADER and map_align on several benchmarks. Our study shows that incorporating contact maps is a promising avenue in protein threading for ultimately improving the accuracy of protein structure prediction.
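
One plausible form of a contact scoring term, offered only as an illustration and not necessarily the one the authors use, is the fraction of predicted target contacts that the alignment maps onto contacts in the template. The additive combination with a baseline score, the `weight` parameter, and the convention that higher scores are better are all assumptions.

```python
def contact_agreement(pred_contacts, alignment, template_contacts):
    """Fraction of predicted target contacts (i, j) whose aligned template
    residues (alignment[i], alignment[j]) are also in contact.

    alignment maps target residue index -> template residue index; unaligned
    target residues are simply absent from the dict.
    """
    if not pred_contacts:
        return 0.0
    tmpl = {tuple(sorted(c)) for c in template_contacts}
    hits = 0
    for i, j in pred_contacts:
        a, b = alignment.get(i), alignment.get(j)
        if a is not None and b is not None and tuple(sorted((a, b))) in tmpl:
            hits += 1
    return hits / len(pred_contacts)


def template_score(base_score, pred_contacts, alignment, template_contacts, weight=1.0):
    """Baseline threading score plus a weighted contact term (hypothetical weight)."""
    return base_score + weight * contact_agreement(pred_contacts, alignment, template_contacts)


pred = [(0, 5), (1, 6), (2, 8)]
aln = {0: 10, 1: 11, 2: 12, 5: 15, 6: 20}   # target residue 8 is unaligned
tmpl = [(10, 15), (11, 16)]
print(contact_agreement(pred, aln, tmpl))    # one of three predicted contacts agrees
```

In a template-selection loop, each candidate template would be aligned first and then ranked by `template_score`.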

5.
We develop a simple model for computing the rates and routes of folding of two-state proteins from the contact maps of their native structures. The model is based on the graph-theoretical concept of effective contact order (ECO). The model predicts that proteins fold by "zipping up" in a sequence of small-loop-closure events, depending on the native chain fold. Using a simple equation, with a few physical rate parameters, we obtain a good correlation with the folding rates of 24 two-state folding proteins. The model rationalizes data from Phi-value analysis that have been interpreted in terms of delocalized or polarized transition states. This model indicates how much of protein folding may take place in parallel, not along a single reaction coordinate or with a single transition state.
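
Effective contact order can be read as a shortest-path length in a graph whose edges are the chain bonds plus the contacts that have already formed. The breadth-first search below is a minimal sketch of that graph-theoretical idea only; it does not include the model's rate equation or its physical parameters.

```python
from collections import deque


def effective_contact_order(n, formed, i, j):
    """Shortest path (in bonds) between residues i and j of an n-residue chain,
    where edges are chain bonds (k, k+1) plus already-formed contacts."""
    adj = {k: set() for k in range(n)}
    for k in range(n - 1):               # covalent chain bonds
        adj[k].add(k + 1)
        adj[k + 1].add(k)
    for a, b in formed:                  # contacts that have already formed
        adj[a].add(b)
        adj[b].add(a)
    dist = {i: 0}
    queue = deque([i])
    while queue:                         # breadth-first search from i
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist[j]                       # the chain is connected, so j is reached


print(effective_contact_order(20, [], 0, 19))         # -> 19
print(effective_contact_order(20, [(2, 17)], 0, 19))  # -> 5
print(effective_contact_order(20, [(2, 17)], 3, 16))  # -> 3
```

On a 20-residue chain, residues 0 and 19 are 19 bonds apart, but once contact (2, 17) has formed the effective loop shrinks to 5, and the neighboring contact (3, 16) needs a loop of only 3. This is the "zipping up" behavior the abstract describes.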

6.
We have developed a fully automated protein design strategy that works on the entire sequence of the protein and uses a full-atom representation. At each step of the procedure, an all-atom model of the protein is built using the template protein structure and the current designed sequence. The energy of the model is used to drive a Monte Carlo optimization in sequence space: random moves are either accepted or rejected based on the Metropolis criterion. We rely on the physical forces that stabilize native protein structures to choose the optimum sequence. Our energy function includes van der Waals interactions, electrostatics, and an environment free energy. Successful protein design should be specific, generating a sequence compatible with the template fold and incompatible with competing folds. We impose specificity by keeping the amino acid composition constant, based on the random energy model. The specificity of the optimized sequence is tested by fold recognition techniques. Successful sequence designs for the B1 domain of protein G, the lambda repressor, and sperm whale myoglobin are presented. We show that each additional term of the energy function improves the performance of our design procedure: the van der Waals term ensures correct packing, the electrostatics term increases the specificity for the correct native fold, and the environment solvation term ensures a correct pattern of buried hydrophobic and exposed hydrophilic residues. For the globin family, we show that we can design a protein sequence that is stable in the myoglobin fold, yet incompatible with the very similar hemoglobin fold.
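
A Metropolis search with a fixed amino acid composition can be sketched with swap moves, since exchanging two residues never changes the composition. The energy function here is a hypothetical hydrophobic-burial score (`toy_energy`, `BURIAL`) standing in for the van der Waals, electrostatic, and solvation terms, and no all-atom model is built; whether the authors use swap moves specifically is an assumption.

```python
import math
import random

# Hypothetical per-site burial (1 = core, 0 = surface) for a 6-residue template.
BURIAL = [1, 0, 1, 0, 1, 0]


def toy_energy(seq):
    """Toy stand-in for the physical energy: each hydrophobic (H) residue at a
    buried site lowers the energy by 1."""
    return -sum(b for aa, b in zip(seq, BURIAL) if aa == "H")


def design(start, energy, steps=2000, temperature=0.1, seed=0):
    """Metropolis Monte Carlo in sequence space using swap moves, which keep
    the amino acid composition of `start` exactly constant."""
    rng = random.Random(seed)
    seq = list(start)
    e = energy(seq)
    best, best_e = seq[:], e
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        if seq[i] == seq[j]:
            continue                         # swapping identical residues is a no-op
        seq[i], seq[j] = seq[j], seq[i]
        e_new = energy(seq)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            e = e_new                        # Metropolis criterion: accept
            if e < best_e:
                best, best_e = seq[:], e
        else:
            seq[i], seq[j] = seq[j], seq[i]  # reject: undo the swap
    return "".join(best), best_e


print(design("PPPHHH", toy_energy))  # -> ('HPHPHP', -3)
```

Keeping the best sequence seen, rather than the final one, is a common convenience in stochastic searches; at low temperature the two usually coincide.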

7.
We present an analysis of the protein fold recognition experiment using PROSPECT in the Third Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP3). PROSPECT is a computer program we have recently developed for finding an optimal alignment between a protein sequence and a protein structural fold. Two unique features of PROSPECT are (a) that it is guaranteed to find the globally optimal sequence-structure alignment, and to do so efficiently, when the alignment-scoring function consists of three additive terms: (i) a singleton fitness term, (ii) a pairwise contact preference term between residues that are spatially close (…

8.
Zhou H  Zhou Y 《Proteins》2004,55(4):1005-1013
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting family, superfamily, and fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server has been established for academic users at http://theory.med.buffalo.edu.

9.
Using a benchmark set of structurally similar proteins, we conduct a series of threading experiments intended to identify a scoring function with an optimal combination of contact-potential and sequence-profile terms. The benchmark set is selected to include many medium-difficulty fold recognition targets, where sequence similarity is undetectable by BLAST but structural similarity is extensive. The contact potential is based on the log-odds of non-local contacts involving different amino acid pairs in native, as opposed to randomly compacted, structures. The sequence profile term is that used in PSI-BLAST. We find that combining these terms significantly improves the success rate of fold recognition over using either term alone, with respect to both recognition sensitivity and the accuracy of threading models. Improvement is greatest for targets with between 10% and 20% sequence identity and 60% to 80% superimposable residues, where the number of models crossing critical accuracy and significance thresholds more than doubles. We suggest that these improvements account for the successful performance of the combined scoring function at CASP3. We discuss possible explanations as to why sequence-profile and contact-potential terms appear complementary.
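
A log-odds contact potential of the kind described can be computed from pair counts in native structures and in randomly compacted ones. The sketch below expresses it as an energy (the negative log-odds, so native-enriched pairs score negative); the pseudocount, the sign convention, and the toy counts are assumptions, and the PSI-BLAST profile term is not shown.

```python
import math


def log_odds_energy(native_counts, random_counts, pseudocount=1.0):
    """Contact energy per residue-pair type, E = -ln(f_native / f_random).

    native_counts / random_counts map a pair key such as ("L", "L") to the
    number of non-local contacts observed in native or randomly compacted
    structures; each set is normalized to frequencies after adding a
    pseudocount. Negative E means the pair is enriched in native structures.
    """
    pairs = set(native_counts) | set(random_counts)
    nat_total = sum(native_counts.get(p, 0) + pseudocount for p in pairs)
    rnd_total = sum(random_counts.get(p, 0) + pseudocount for p in pairs)
    energy = {}
    for p in pairs:
        f_nat = (native_counts.get(p, 0) + pseudocount) / nat_total
        f_rnd = (random_counts.get(p, 0) + pseudocount) / rnd_total
        energy[p] = -math.log(f_nat / f_rnd)
    return energy


# Hypothetical counts for two pair types.
E = log_odds_energy({("L", "L"): 29, ("E", "K"): 9},
                    {("L", "L"): 9, ("E", "K"): 29})
print(round(E[("L", "L")], 3))  # -> -1.099
```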

10.
To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue–residue interactions can be approximated by extracting statistical potentials from known high-resolution structures. Several methods based on statistical potentials extracted from unrelated proteins have been found to make better predictions of the probability of point mutations. We postulate that statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool for examining the probability of point mutations. With this in mind, we have derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in the yeast triosephosphate isomerase enzyme, for which experimental results have already been reported. We have also performed molecular dynamics simulations on a subset of point mutants for comparison. The difference in pairwise residue and atomic contact energy between the wild type and the various point mutants reveals the probability of mutation at a particular position. Interestingly, we found that our computational prediction agrees with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092–3097) and performs better than iMutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests that deriving pairwise contact energy potentials and running molecular dynamics simulations for functionally important folds could help predict the probability of point mutations, which may ultimately reduce the time and cost of mutation experiments. Proteins 2016; 85:54–64. © 2016 Wiley Periodicals, Inc.
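
The wild-type versus mutant comparison reduces to a difference of summed pairwise contact energies on a fixed structure. This sketch uses a hypothetical two-letter energy table in place of the family-specific residue and atomic potentials the authors derived, and it ignores the molecular dynamics step.

```python
def pair_key(a, b):
    """Order-independent key for a residue-type pair."""
    return tuple(sorted((a, b)))


def contact_energy(seq, contacts, eps):
    """Total pairwise contact energy of seq on a fixed list of contacts."""
    return sum(eps[pair_key(seq[i], seq[j])] for i, j in contacts)


def mutation_delta(seq, contacts, eps, pos, new_aa):
    """Energy change E(mutant) - E(wild type) for the substitution seq[pos] -> new_aa.
    Positive values suggest the substitution is destabilizing."""
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return contact_energy(mutant, contacts, eps) - contact_energy(seq, contacts, eps)


# Hypothetical two-letter energy table (H = hydrophobic, P = polar).
EPS = {("H", "H"): -1.0, ("H", "P"): -0.2, ("P", "P"): 0.0}
delta = mutation_delta("HHPH", [(0, 3), (1, 3), (0, 2)], EPS, pos=3, new_aa="P")
print(round(delta, 3))  # -> 1.6
```

Only contacts involving the mutated position change, so in practice the sum can be restricted to those contacts.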

11.
Bastolla U  Bruscolini P  Velasco JL 《Proteins》2012,80(9):2287-2304
In contrast to the intense investigation of the structural determinants of protein folding rates, the sequence features favoring fast folding have received little attention. Here, we investigate this subject using simple models of protein folding and a statistical analysis of the Protein Data Bank (PDB). The mean-field model by Plotkin and coworkers predicts that the folding rate is accelerated by stronger-than-average interactions at short distance along the sequence. We confirmed this prediction using the Finkelstein model of protein folding, which accounts for realistic features of polymer entropy. We then tested this prediction on the PDB. We found that native interactions are strongest at contact range l = 8. However, since short-range contacts tend to be exposed and are frequently formed in misfolded structures, selection for folding stability tends to make them less attractive; that is, stability and kinetics may have conflicting requirements. Using a recently proposed model, we predicted the relationship between contact range and contact energy based on buriedness and contact frequency. Deviations from this prediction induce a positive correlation between contact range and contact energy, that is, short-range contacts are stronger than expected, for 2/3 of the proteins. This correlation increases with the absolute contact order (ACO), as expected if proteins that tend to fold slowly due to large ACO are subject to stronger selection for sequence features favoring fast folding. Our results suggest that the selective pressure for fast folding is detectable only for one third of the proteins in the PDB, in particular those with large contact order.
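
Contact range and absolute contact order (ACO) are simple to compute from a native contact list: the range of a contact is its sequence separation |i - j|, ACO is the mean range over all contacts, and dividing ACO by chain length gives the relative contact order (RCO). A minimal sketch, assuming contacts are given as residue index pairs:

```python
def contact_orders(contacts, length):
    """Return (ACO, RCO) for a list of native contacts given as (i, j) pairs.

    ACO: mean sequence separation |i - j| over all contacts.
    RCO: ACO divided by the chain length.
    """
    ranges = [abs(i - j) for i, j in contacts]
    aco = sum(ranges) / len(ranges)
    return aco, aco / length


aco, rco = contact_orders([(0, 4), (2, 10), (5, 8)], length=50)
print(aco, rco)  # -> 5.0 0.1
```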

12.
J Hargbo  A Elofsson 《Proteins》1999,36(1):68-76
There are many proteins that share the same fold but have no clear sequence similarity. To predict the structure of these proteins, so-called "protein fold recognition methods" have been developed. During the last few years, protein fold recognition methods have been improved through the use of predicted secondary structures (Rice and Eisenberg, J Mol Biol 1997;267:1026-1038), as well as by using multiple sequence alignments in the form of hidden Markov models (HMM) (Karplus et al., Proteins Suppl 1997;1:134-139). To test the performance of different fold recognition methods, we have developed a rigorous benchmark in which representatives of all proteins of known structure are matched against each other. Using this benchmark, we have compared the performance of automatically created hidden Markov models with standard sequence search methods. Further, we combine predicted secondary structures and multiple sequence alignments into a combined method that performs better than methods that do not use this combination of information. Using only single sequences, the correct fold of a protein was detected for 10% of the test cases in our benchmark. Including multiple sequence information increased this number to 16%, and when predicted secondary structure information was included as well, the fold was correctly identified in 20% of the cases. Moreover, if the correct secondary structure was used, 27% of the proteins could be correctly matched to a fold. For comparison, blast2, fasta, and ssearch identify the fold correctly in 13-17% of the cases. Thus, standard pairwise sequence search methods perform almost as well as hidden Markov models in our benchmark. This is probably because the automatically created multiple sequence alignments used in this study do not contain enough diversity and because the current generation of hidden Markov models does not perform very well when built from only a few sequences.

13.
Vicatos S  Kaznessis YN 《Proteins》2008,70(2):539-552
We present a method that significantly improves the accuracy of predicted proximal residue pairs in protein molecules. Computational methods for predicting pairs of amino acids that are distant in the protein sequence but close in the 3D structure can benefit attempts to recognize the fold of a protein molecule in silico. Unfortunately, currently available methods suffer from low predictive accuracy. In this work, we use Monte Carlo simulations to fold protein molecules with proximal pair predictions used as additional energy constraints. To test our methods, we study molecules with known tertiary structures. With Monte Carlo, we generate ensembles of structures for each set of residue constraints. The distribution of the root mean square deviation of the folded structures from the known native structure reveals clear information about the accuracy of the constraint set used. With recursive substitution of constraints, false positive predictions are identified and filtered out, and significant improvements in accuracy are observed.

14.
We recently proposed an optimization method to derive energy parameters for simplified models of protein folding. The method is based on maximizing the thermodynamic average of the overlap between protein native structures and a Boltzmann ensemble of alternative structures. Such a condition enforces protein models whose ground states are most similar to the corresponding native states. We present here an extensive test of the method for a simple residue-residue contact energy function and for alternative structures generated by threading. The optimized energy function guarantees high stability and a well-correlated energy landscape for most representative structures in the PDB database. Failures in recognizing the native structure can be attributed to the neglect of interactions between different chains in oligomeric proteins or with cofactors. When these are taken into account, only very few X-ray structures are not recognized. Most of them are short inhibitors or fragments, and one is a structure that presents serious inconsistencies. Finally, we discuss the reasons that make NMR structures more difficult to recognize. Copyright 2001 Wiley-Liss, Inc.
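
One natural reading of the quantity being maximized is the Boltzmann average <q> = Σ_k q_k exp(-E_k/T) / Σ_k exp(-E_k/T), where E_k is the energy of structure k (the native included) and q_k its overlap with the native. The sketch below only evaluates this average for given energies and overlaps; the parameter optimization itself, and how E_k and q_k are obtained from threaded structures, are not shown, and the temperature is an assumed free parameter.

```python
import math


def average_overlap(energies, overlaps, temperature=1.0):
    """Boltzmann-weighted average overlap <q> with the native structure.

    energies[k] is the energy E_k of structure k (native included) and
    overlaps[k] its overlap q_k with the native (1.0 for the native itself).
    """
    e_min = min(energies)  # shift energies so the exponentials cannot overflow
    weights = [math.exp(-(e - e_min) / temperature) for e in energies]
    z = sum(weights)       # partition function (up to the constant shift)
    return sum(w * q for w, q in zip(weights, overlaps)) / z


print(average_overlap([0.0, 0.0], [1.0, 0.0]))            # -> 0.5
print(average_overlap([-10.0, 0.0], [1.0, 0.2]) > 0.999)  # native dominates -> True
```

An energy function that places the native well below its competitors drives <q> toward 1, which is the condition the method enforces.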

15.
In this article, we describe a novel methodology that uses linear algebra to extract semantic characteristics from protein structures and compose structural signature vectors, which can be used efficiently to compare and classify protein structures into fold families. These signatures are built from the pattern of hydrophobic intrachain interactions using Singular Value Decomposition (SVD) and Latent Semantic Indexing (LSI) techniques. Considering proteins as documents and contacts as terms, we have built a retrieval system that is able to find conserved contacts in samples of the myoglobin fold family and to retrieve these proteins from among proteins of varied folds with a precision of up to 80%. The classifier is a web tool available on our laboratory website. Users can search for chains similar to a specific PDB entry, view and compare their contact maps, and browse their structures using a Jmol plug-in.
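
The proteins-as-documents, contacts-as-terms analogy maps directly onto a term-document matrix. A minimal LSI sketch, assuming raw contact counts with no term weighting and NumPy's `np.linalg.svd`, projects proteins into a rank-k latent space and compares them by cosine similarity; the authors' exact weighting scheme and rank are not given in the abstract.

```python
import numpy as np


def lsi_doc_vectors(term_doc, k):
    """Rank-k LSI coordinates for each document (column) of a term-document
    matrix, via truncated SVD: rows of (S_k V_k^T)^T."""
    _, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T   # shape: (n_documents, k)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Rows: hypothetical contact types (terms); columns: three proteins (documents).
term_doc = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 1],
            [0, 0, 1]]
docs = lsi_doc_vectors(term_doc, k=2)
print(cosine(docs[0], docs[1]), cosine(docs[0], docs[2]))
```

Proteins 0 and 1 share the same contacts and end up with identical latent vectors (cosine 1), while protein 2 is orthogonal to them (cosine 0).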

16.
Protein sequences have evolved to fold into functional structures, resulting in families of diverse protein sequences that all share the same overall fold. One can harness protein family sequence data to infer likely contacts between pairs of residues. In the current study, we combine this kind of inference from coevolutionary information with a coarse‐grained protein force field ordinarily used with single sequence input, the Associative memory, Water mediated, Structure and Energy Model (AWSEM), to achieve improved structure prediction. The resulting Associative memory, Water mediated, Structure and Energy Model with Evolutionary Restraints (AWSEM‐ER) yields a significant improvement in the quality of protein structure prediction over the single sequence prediction from AWSEM when a sufficiently large number of homologous sequences are available. Free energy landscape analysis shows that the addition of the evolutionary term shifts the free energy minimum to more native‐like structures, which explains the improvement in the quality of structures when performing predictions using simulated annealing. Simulations using AWSEM without coevolutionary information have proved useful in elucidating not only protein folding behavior, but also mechanisms of protein function. The success of AWSEM‐ER in de novo structure prediction suggests that the enhanced model opens the door to functional studies of proteins even when no experimentally solved structures are available.

17.
One of the main barriers to accurate computational protein structure prediction is searching the vast space of protein conformations. Distance restraints or inter‐residue contacts have been used to reduce this search space, easing the discovery of the correct folded state. It has been suggested that about 1 contact for every 12 residues may be sufficient to predict structure at fold-level accuracy. Here, we use coarse‐grained structure‐based models in conjunction with molecular dynamics simulations to examine this empirical prediction. We generate sparse contact maps for 15 proteins of varying sequence lengths and topologies and find that, given perfect secondary‐structural information, a small fraction of the native contact map (5%‐10%) suffices to fold proteins to their correct native states. We also find that different sparse maps are not equivalent, and we make several observations about the types of maps that are successful at such structure prediction. Long-range contacts are found to encode more information than shorter-range ones, especially for α and αβ‐proteins; however, this distinction is less pronounced for β‐proteins. Choosing contacts that are a consensus from successful maps gives predictive sparse maps, as does choosing contacts that are well spread out over the protein structure. The folding of proteins can also be used to choose predictive sparse maps. Overall, we conclude that structure‐based models can be used to understand the efficacy of structure-prediction restraints and could, in future, be tuned to include specific force-field interactions, secondary structure errors, and noise in the sparse maps.

18.
Weitao Sun  Jing He 《Proteins》2009,77(1):159-173
Secondary structure topology in this article refers to the order and the direction of the secondary structures, such as helices and strands, with respect to the protein sequence. Even when the locations of the secondary structure Cα atoms are known, there are still $(N! \cdot 2^N)(M! \cdot 2^M)$ different possible topologies for a protein with N helices and M strands. This work explored whether the native topology is likely to be identified among a large set of all possible geometrically constrained topologies through an evaluation of the residue contact energy formed by the secondary structures, instead of the entire chain. We developed a contact-pair-specific and distance-specific multiwell function based on the statistical characterization of the side chain distances of 413 proteins in the Protein Data Bank. The multiwell function has specific parameters for each of the 210 pairs of residue contacts. We illustrate a general mathematical method to extend a single-well function to a multiwell function to represent the statistical data. We performed a mutation analysis using 50 proteins to generate all the possible geometrically constrained topologies of the secondary structures. The results show that the native topology is within the top 25% of the list ranked by the effective contact energies of the secondary structures for all 50 proteins, and within the top 5% for 34 proteins. As an application, the method was used to derive the structure of the skeletons from a low-resolution density map that can be obtained through electron cryomicroscopy. Proteins 2009. © 2009 Wiley‐Liss, Inc.
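
The topology count follows from choosing an order (N! ways for the helices) and a direction (2 per element) for every secondary structure element, taken separately for helices and strands; the "2N" and "2M" in the count are read here as 2^N and 2^M. A short check of the arithmetic:

```python
from math import factorial


def n_topologies(n_helices, n_strands):
    """Count topologies when helices and strands can each take any order
    and either direction: (N! * 2**N) * (M! * 2**M)."""
    helix_part = factorial(n_helices) * 2 ** n_helices
    strand_part = factorial(n_strands) * 2 ** n_strands
    return helix_part * strand_part


print(n_topologies(2, 1))  # -> 16
print(n_topologies(3, 4))  # -> 18432
```

Even 3 helices and 4 strands already give 48 × 384 = 18,432 candidate topologies, which is why the abstract ranks them by an energy rather than enumerating by hand.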

19.
A new potential energy function representing the conformational preferences of sequentially local regions of a protein backbone is presented. This potential is derived from secondary structure probabilities such as those produced by neural network-based prediction methods. The potential is applied to the problem of remote homolog identification, in combination with a distance-dependent inter-residue potential and position-based scoring matrices. This fold recognition jury is implemented in a Java application called JThread. These methods are benchmarked on several test sets, including one released entirely after development and parameterization of JThread. In benchmark tests to identify known folds structurally similar to (but not identical with) the native structure of a sequence, JThread performs significantly better than PSI-BLAST, with 10% more structures identified correctly as the most likely structural match in a fold library, and 20% more structures correctly narrowed down to a set of five possible candidates. JThread also improves the average sequence alignment accuracy significantly, from 53% to 62% of residues aligned correctly. Reliable fold assignments and alignments are identified, making the method useful for genome annotation. JThread is applied to predicted open reading frames (ORFs) from the genomes of Mycoplasma genitalium and Drosophila melanogaster, identifying 20 new structural annotations in the former and 801 in the latter.

20.
Folding rates of small single-domain proteins that fold through simple two-state kinetics can be estimated from details of the three-dimensional protein structure. Previously, predictions of secondary structure had been exploited to predict folding rates from sequence. Here, we estimate two-state folding rates from predictions of internal residue-residue contacts in proteins of unknown structure. Our estimate is based on the correlation between the folding rate and the number of predicted long-range contacts normalized by the square of the protein length. It is well known that long-range order derived from known structures correlates with folding rates. The surprise was that estimates based on very noisy contact predictions were almost as accurate as the estimates based on known contacts. On average, our estimates were similar to those previously published from secondary structure predictions. The combination of these methods that exploit different sources of information improved performance. It appeared that the combined method reliably distinguished fast from slow two-state folders.
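
The structural descriptor behind the estimate can be sketched as the number of long-range contacts divided by the squared chain length. The separation cutoff (`min_sep = 12`, counting only contacts with |i - j| > 12) is an assumption, since the abstract does not define "long-range"; fitting the descriptor to measured folding rates is not shown.

```python
def long_range_order(contacts, length, min_sep=12):
    """Count contacts with |i - j| > min_sep and divide by length**2.

    min_sep = 12 is an assumed definition of "long-range"; contacts may be
    predicted or taken from a known structure.
    """
    n_long = sum(1 for i, j in contacts if abs(i - j) > min_sep)
    return n_long / length ** 2


print(long_range_order([(0, 20), (5, 10), (3, 30), (1, 13)], length=40))  # -> 0.00125
```

Here only (0, 20) and (3, 30) count as long-range, giving 2 / 40² = 0.00125.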
