Similar articles
 20 similar articles found (search time: 15 ms)
1.
The Z-score of a protein is defined as the energy separation between the native fold and the average of an ensemble of misfolds, in units of the standard deviation of the ensemble. The Z-score is often used to test the ability of knowledge-based potentials to distinguish the native fold from the alternatives. However, it is not known what range of values the Z-scores should have if one had a correct potential. Here, we offer an estimate of Z-scores extracted from calorimetric measurements of proteins. The energies obtained from these experimental data are compared with those from computer simulations of a lattice model protein. It is suggested that the Z-scores calculated from different knowledge-based potentials are generally too small in comparison with the experimental values.
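The definition in the first sentence of this abstract can be written out as a short calculation. This is a minimal sketch, not the authors' procedure: the function name `z_score`, the use of the population standard deviation, and the sign convention (native minus decoy mean, so a well-separated native fold gives a negative value) are all assumptions made for illustration.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Energy gap between the native fold and the decoy mean,
    expressed in units of the decoy standard deviation.

    Sign convention (an assumption): native minus mean, so a native
    fold far below the decoys yields a large negative Z-score.
    """
    mean = statistics.fmean(decoy_energies)
    # Population standard deviation of the misfold ensemble (assumed choice).
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread; Z-score is undefined")
    return (native_energy - mean) / spread


# Two decoys at energies 1.0 and 3.0: mean 2.0, standard deviation 1.0.
print(z_score(-3.0, [1.0, 3.0]))
```

With other sign conventions the same quantity is reported as a positive number, so values from different papers should be compared by magnitude.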

2.
Knowledge-based potentials are energy functions derived from the analysis of databases of protein structures and sequences. They can be divided into two classes. Potentials from the first class are based on a direct conversion of the distributions of some geometric properties observed in native protein structures into energy values, while potentials from the second class are trained to mimic quantitatively the geometric differences between incorrectly folded models and native structures. In this paper, we focus on the relationship between energy and geometry when training the second class of knowledge-based potentials. We assume that the difference in energy between a decoy structure and the corresponding native structure is linearly related to the distance between the two structures. We trained two distance-based knowledge-based potentials accordingly, one based on all inter-residue distances (PPD), while the other had the set of all distances filtered to reflect consistency in an ensemble of decoys (PPE). We tested four types of metric to characterize the distance between the decoy and the native structure, two based on extrinsic geometry (RMSD and GDT-TS*), and two based on intrinsic geometry (Q* and MT). The corresponding eight potentials were tested on a large collection of decoy sets. We found that it is usually better to train a potential using an intrinsic distance measure. We also found that PPE outperforms PPD, emphasizing the benefits of capturing consistent information in an ensemble. The relevance of these results for the design of knowledge-based potentials is discussed.

3.
We have developed a new combined approach for ab initio protein structure prediction. The protein conformation is described as a lattice chain connecting Cα atoms, with attached Cβ atoms and side-chain centers of mass. The model force field includes various short-range and long-range knowledge-based potentials derived from a statistical analysis of the regularities of protein structures. The combination of these energy terms is optimized through the maximization of correlation for 30 × 60,000 decoys between the root mean square deviation (RMSD) to native and energies, as well as the energy gap between native and the decoy ensemble. To accelerate the conformational search, a newly developed parallel hyperbolic sampling algorithm with a composite movement set is used in the Monte Carlo simulation processes. We exploit this strategy to successfully fold 41/100 small proteins (36 to 120 residues) with predicted structures having an RMSD from native below 6.5 Å in the top five cluster centroids. To fold larger proteins as well as to improve the folding yield of small proteins, we incorporate into the basic force field side-chain contact predictions from our threading program PROSPECTOR, where homologous proteins were excluded from the database. With these threading-based restraints, the program can fold 83/125 test proteins (36 to 174 residues) with structures having an RMSD to native below 6.5 Å in the top five cluster centroids. This shows the significant improvement of folding by using predicted tertiary restraints, especially when the accuracy of side-chain contact prediction is >20%. For native fold selection, we introduce quantities dependent on the cluster density and the combination of energy and free energy, which show a higher discriminative power to select the native structure than the previously used cluster energy or cluster size, and which can be used in native structure identification in blind simulations. These procedures are readily automated and are being implemented on a genomic scale.

4.
Betancourt MR 《Proteins》2003,53(4):889-907
A protein model that is simple enough to be used in protein-folding simulations but accurate enough to identify a protein native fold is described. Its geometry consists of describing the residues by one, two, or three pseudoatoms, depending on the residue size. Its energy is given by a pairwise, knowledge-based potential obtained for all the pseudoatoms as a function of their relative distance. The pseudoatomic potential is also a function of the primary chain separation and residue order. The model is tested by gapless threading on a large, representative set of known protein and decoy structures obtained from the "Decoys 'R' Us" database. It is also tested by threading on gapped decoys generated for proteins with many homologs. The gapless threading tests show near 98% native-structure recognition as the lowest energy structure and almost 100% as one of the three lowest energy structures for over 2200 test proteins. In decoy threading tests, the model recognized the majority of the native structures. It is also able to recognize native structures among gapped decoys, in spite of close structural similarities. The results indicate that the pseudoatomic model has native recognition ability similar to comparable atomic-based models but much better than equivalent residue-based models.

5.
Protein structure prediction encompasses two major challenges: 1) the generation of a large ensemble of high resolution structures for a given amino-acid sequence; and 2) the identification of the structure closest to the native structure for a blind prediction. In this article, we address the second challenge by proposing what is, to our knowledge, a novel iterative traveling-salesman problem-based clustering method to identify the structures of a protein, in a given ensemble, which are closest to the native structure. The method consists of an iterative procedure that, at each iteration, eliminates clusters of structures which are unlikely to be of similar fold to the native, based on a statistical analysis of cluster density and average spherical radius. The method, denoted as ICON, has been tested on four data sets: 1) 1400 proteins with high resolution decoys; 2) medium-to-low resolution decoys from Decoys ‘R’ Us; 3) medium-to-low resolution decoys from the first-principles approach, ASTRO-FOLD; and 4) selected targets from CASP8. The extensive tests demonstrate that ICON can identify high-quality structures in each ensemble, regardless of the resolution of conformers. In a total of 1454 proteins, with an average of 1051 conformers per protein, the conformers selected by ICON are, on average, in the top 3.5% of the conformers in the ensemble.

6.
M J Sippl  S Weitckus 《Proteins》1992,13(3):258-271
We present an approach which can be used to identify native-like folds in a database of protein conformations in the absence of any sequence homology to proteins in the database. The method is based on a knowledge-based force field derived from a set of known protein conformations. A given sequence is mounted on all conformations in the database and the associated energies are calculated. Using several conformations and sequences from the globin family, we show that the native conformation is identified correctly. In fact, the resolution of the force field is high enough to discriminate between a native fold and several closely related conformations. We then apply the procedure to several globins of known sequence but unknown three-dimensional structure. The homology of these sequences to globins of known structures in the database ranges from 49 to 17%. With one exception, we find that for all globin sequences one of the known globin folds is identified as the most favorable conformation. These results are obtained using a force field derived from a database devoid of globins of known structure. We briefly discuss useful applications in protein structural research and future development of our approach.

7.
Recent experimental results suggest that the native fold, or topology, plays a primary role in determining the structure of the transition state ensemble, at least for small, fast-folding proteins. To investigate the extent of the topological control of the folding process, we studied the folding of simplified models of five small globular proteins constructed using a Go-like potential to retain the information about the native structures but drastically reduce the energetic frustration and energetic heterogeneity among residue-residue native interactions. By comparing the structure of the transition state ensemble (experimentally determined by Phi-values) and of the intermediates with those obtained using our models, we show that these energetically unfrustrated models can reproduce the global experimentally known features of the transition state ensembles and "en-route" intermediates, at least for the analyzed proteins. This result clearly indicates that, as long as the protein sequence is sufficiently minimally frustrated, topology plays a central role in determining the folding mechanism.

8.
Many cellular functions rely on interactions between protein pairs and higher oligomers. We have recently shown that binding mechanisms are robust and, owing to the minimal frustration principle, just as for protein folding, are governed primarily by the protein's native topology, which is characterized by the network of non-covalent residue-residue interactions. The detailed binding mechanisms of nine dimers, a trimer, and a tetramer, each involving different degrees of flexibility and plasticity during assembly, are surveyed here using a model that is based solely on the protein topology, having a perfectly funneled energy landscape. The importance of flexibility in binding reactions is manifested by the fly-casting effect, which is diminished in magnitude when protein flexibility is removed. Many of the grosser and finer structural aspects of the various binding mechanisms (including binding of pre-folded monomers, binding of intrinsically unfolded monomers, and binding by domain-swapping) predicted by the native topology based landscape model are consistent with the mechanisms found in the laboratory. An asymmetric binding mechanism is often observed for the formation of the symmetric homodimers, where one monomer is more structured at the binding transition state and serves as a template for the folding of the other monomer. Phi values were calculated to show how the structure of the binding transition state ensemble would be manifested in protein engineering studies. For most systems, the simulated Phi values are reasonably correlated with the available experimental values. This agreement suggests that the overall binding mechanism and the nature of the binding transition state ensemble can be understood from the network of interactions that stabilize the native fold. The Phi values for the formation of an antibody-antigen complex indicate a possible role for solvation of the interface in biomolecular association of large rigid proteins.

9.
S Miyazawa  R L Jernigan 《Proteins》1999,36(3):357-369
We consider modifications of an empirical energy potential for fold and sequence recognition to represent approximately the stabilities of proteins in various environments. A potential used here includes a secondary structure potential representing short-range interactions for secondary structures of proteins, and a tertiary structure potential consisting of a long-range, pairwise contact potential and a repulsive packing potential. This potential is devised to evaluate together the total conformational energy of a protein at the coarse grained residue level. It was previously estimated from the observed frequencies of secondary structures, from contact frequencies between residues, and from the distributions of the number of residues in contact in known protein structures by regarding those distributions as the equilibrium distributions with the Boltzmann factor of these interaction energies. The stability of native structures is assumed as a primary requirement for proteins to fold into their native structures. A collapse energy is subtracted from the contact energies to remove the protein size dependence and to represent protein stabilities for monomeric and multimeric states. The free energy of the whole ensemble of protein conformations that is subtracted from the conformational energy to represent protein stability is approximated as the average energy expected for a typical native structure with the same amino acid composition. This term may be constant in fold recognition but essentially varies in sequence recognition. A simple test of threading sequences into structures without gaps is employed to demonstrate the importance of the present modifications that permit the same potential to be utilized for both fold and sequence recognition. Proteins 1999;36:357-369. Published 1999 Wiley-Liss, Inc.
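The idea of treating observed contact frequencies as equilibrium distributions weighted by a Boltzmann factor can be illustrated with a simple inversion. The sketch below is a generic textbook form, not the estimation scheme of this paper: the function name `contact_energy`, the choice of an "expected" reference frequency, and the use of kT units are assumptions made for illustration.

```python
import math


def contact_energy(observed, expected):
    """Contact energy in units of kT from a Boltzmann inversion.

    If observed / expected = exp(-E / kT), then E / kT = -ln(observed / expected).
    `expected` stands for the frequency in some reference state; how that
    reference is defined is the hard part and is not modeled here.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("frequencies must be positive")
    return -math.log(observed / expected)


# A contact seen twice as often as expected is favorable (negative energy).
print(contact_energy(2.0, 1.0))
```

Contacts that occur less often than the reference predicts come out with positive (unfavorable) energies under the same formula.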

10.
With the help of the crystal structure of rhodopsin, an ab initio method has been developed to calculate the three-dimensional structure of the loops that connect the transmembrane helices (TMHs). The goal of this procedure is to calculate the loop structures in other G-protein coupled receptors (GPCRs) for which only model coordinates of the TMHs are available. To mimic this situation, a construct of rhodopsin was used that only includes the experimental coordinates of the TMHs, while the rest of the structure, including the terminal domains, has been removed. To calculate the structure of the loops, a method was designed based on Monte Carlo (MC) simulations which use a temperature annealing protocol, and a scaled collective variables (SCV) technique with proper structural constraints. Because only part of the protein is used in the calculations, the usual approach of modeling loops, which consists of finding a single, lowest energy conformation of the system, is abandoned because such a single structure may not be a representative member of the native ensemble. Instead, the method was designed to generate structural ensembles from which the single lowest free energy ensemble is identified as representative of the native folding of the loop. To find the native ensemble, a successive series of SCV-MC simulations are carried out to allow the loops to undergo structural changes in a controlled manner. To increase the chances of finding the native funnel for the loop, some of the SCV-MC simulations are carried out at elevated temperatures. The native ensemble can be identified by an MC search starting from any conformation already in the native funnel. The hypothesis is that native structures are trapped in the conformational space because of the high-energy barriers that surround the native funnel. The existence of such ensembles is demonstrated by generating multiple copies of the loops from their crystal structures in rhodopsin and carrying out an extended SCV-MC search. For the extracellular loops e1 and e3, and the intracellular loop i1 that were used in this work, the procedure resulted in dense clusters of structures with a Cα-RMSD of approximately 0.5 Å. To test the predictive power of the method, the crystal structure of each loop was replaced by its extended conformations. For e1 and i1 the procedure identifies native clusters with a Cα-RMSD of approximately 0.5 Å and good structural overlap of the side chains; for e3, two clusters were found with a Cα-RMSD of approximately 1.1 Å each, but with poor overlap of the side chains. Further searching led to a single cluster with lower Cα-RMSD but higher energy than the two previous clusters. This discrepancy was found to be due to the missing elements in the constructs available from experiment for use in the calculations. Because this problem will likely appear whenever parts of the structural information are missing, possible solutions are discussed.

11.
It is well established that protein structures are more conserved than protein sequences. One-third of all known protein structures can be classified into ten protein folds, which themselves are composed mainly of alpha-helical hairpin, beta hairpin, and beta-alpha-beta supersecondary structural elements. In this study, we explore the ability of a recent Monte Carlo-based procedure to generate the 3D structures of eight polypeptides that correspond to units of supersecondary structure and three-stranded antiparallel beta sheet. Starting from extended or misfolded compact conformations, all Monte Carlo simulations show significant success in predicting the native topology using a simplified chain representation and an energy model optimized on other structures. Preliminary results on model peptides from nucleotide binding proteins suggest that this simple protein folding model can help clarify the relation between sequence and topology.

12.
For successful ab initio protein structure prediction, a method is needed to identify native-like structures from a set containing both native and non-native protein-like conformations. In this regard, the use of distance geometry has shown promise when accurate inter-residue distances are available. We describe a method by which distance geometry restraints are culled from sets of 500 protein-like conformations for four small helical proteins generated by the method of Simons et al. (1997). A consensus-based approach was applied in which every inter-Cα distance was measured, and the most frequently occurring distances were used as input restraints for distance geometry. For each protein, a structure with lower coordinate root-mean-square (RMS) error than the mean of the original set was constructed; in three cases the topology of the fold resembled that of the native protein. When the fold sets were filtered for the best scoring conformations with respect to an all-atom knowledge-based scoring function, the remaining subset of 50 structures yielded restraints of higher accuracy. A second round of distance geometry using these restraints resulted in an average coordinate RMS error of 4.38 Å.
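The consensus step described in this abstract, measuring every inter-Cα distance across the conformations and keeping the most frequent value, can be sketched as a histogram mode per residue pair. This is an illustrative reading only: the function name `consensus_distances`, the 1 Å bin width, and the use of the bin center as the restraint are assumptions, and the abstract does not say how the authors binned or selected distances.

```python
import math
from collections import Counter
from itertools import combinations


def consensus_distances(conformations, bin_width=1.0):
    """Most frequently occurring distance for each residue pair.

    conformations: list of conformations, each a list of (x, y, z)
    C-alpha coordinates with the same residue ordering.
    Returns {(i, j): center of the most populated distance bin}.
    """
    n_residues = len(conformations[0])
    restraints = {}
    for i, j in combinations(range(n_residues), 2):
        # Histogram the i-j distance over all conformations.
        bins = Counter(
            int(math.dist(conf[i], conf[j]) // bin_width)
            for conf in conformations
        )
        modal_bin, _count = bins.most_common(1)[0]
        restraints[(i, j)] = (modal_bin + 0.5) * bin_width
    return restraints


# Three toy two-residue conformations with distances 3.2, 3.7 and 9.1.
toy = [
    [(0.0, 0.0, 0.0), (3.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 3.7, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.0, 9.1)],
]
print(consensus_distances(toy))
```

The resulting dictionary would then be handed to a distance geometry program as target distances; that step is outside the scope of this sketch.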

13.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction.

14.
《Proteins》2018,86(5):501-514
The structural variations of multidomain proteins with flexible parts mediate many biological processes, and a structure ensemble can be determined by selecting a weighted combination of representative structures from a simulated structure pool, producing the best fit to experimental constraints such as interatomic distance. In this study, a hybrid structure-based and physics-based atomistic force field with an efficient sampling strategy is adopted to simulate a model di-domain protein against experimental paramagnetic relaxation enhancement (PRE) data that correspond to distance constraints. The molecular dynamics simulations produce a wide range of conformations depicted on a protein energy landscape. Subsequently, a conformational ensemble recovered with low-energy structures and the minimum-size restraint is identified in good agreement with experimental PRE rates, and the result is also supported by chemical shift perturbations and small-angle X-ray scattering data. It is illustrated that the regularizations of energy and ensemble size prevent an arbitrary interpretation of protein conformations. Moreover, energy is found to serve as a critical control to refine the structure pool and prevent data overfitting, because the absence of energy regularization exposes ensemble construction to the noise from high-energy structures and causes a more ambiguous representation of protein conformations. Finally, we perform structure-ensemble optimizations with a topology-based structure pool, to enhance the understanding of the ensemble results from different sources of pool candidates.

15.
The mechanism by which proteins fold to their native states has been the focus of intense research in recent years. The rate-limiting event in the folding reaction is the formation of a conformation in a set known as the transition-state ensemble. The structural features present within such ensembles have now been analysed for a series of proteins using data from a combination of biochemical and biophysical experiments together with computer-simulation methods. These studies show that the topology of the transition state is determined by a set of interactions involving a small number of key residues and, in addition, that the topology of the transition state is closer to that of the native state than to that of any other fold in the protein universe. Here, we review the evidence for these conclusions and suggest a molecular mechanism that rationalizes these findings by presenting a view of protein folds that is based on the topological features of the polypeptide backbone, rather than the conventional view that depends on the arrangement of different types of secondary-structure elements. By linking the folding process to the organization of the protein structure universe, we propose an explanation for the overwhelming importance of topology in the transition states for protein folding.

16.
Cooperative unfolding penalties are calculated by statistically evaluating an ensemble of denatured states derived from native structures. The ensemble of denatured states is determined by dividing the native protein into short contiguous segments and defining all possible combinations of native, i.e., interacting, and non-native, i.e., non-interacting, segments. We use a novel knowledge-based scoring function, derived from a set of non-homologous proteins in the Protein Data Bank, to describe the interactions among residues. This procedure is used for the structural identification of cooperative folding cores for four globular proteins: bovine pancreatic trypsin inhibitor, horse heart cytochrome c, French bean plastocyanin, and staphylococcal nuclease. The theoretical folding units are shown to correspond to regions that exhibit enhanced stability against denaturation as determined from experimental hydrogen exchange protection factors. Using a sequence similarity score for related sequences, we show that, in addition to residues necessary for enzymatic function, those amino acids comprising structurally important folding cores are also preferentially conserved during evolution. This implies that the identified folding cores may be part of an array of fundamental structural folding units.

17.
According to the thermodynamic hypothesis, the native state of proteins is that in which the free energy of the system is at its lowest, so that at normal temperature and pressure, proteins evolve to that state. We selected four proteins, one representative of each of the four classes, and for each protein ran four simulations, one starting from the native structure and the other three starting from the structure obtained by threading the sequence of one protein onto the native backbone fold of the other three proteins. Because of their large conformational distances with respect to the native structure, the three alternative initial structures cannot be considered as local minima within the native ensemble of the corresponding protein. As expected, the initial native states are preserved in the 0.5 μs simulations performed here, which validates the simulations. On the other hand, when the initial state is not native, an analysis of the trajectories does not reveal any evolution towards the native state during that time. These results indicate that the distribution of protein conformations is multipeak shaped, so that apart from the peak corresponding to the native state, there are other peaks associated with average structures that are very different from the native and that can last as long as the native state.

18.
19.
Weitao Sun  Jing He 《Proteins》2009,77(1):159-173
Secondary structure topology in this article refers to the order and the direction of the secondary structures, such as helices and strands, with respect to the protein sequence. Even when the locations of the secondary structure Cα atoms are known, there are still $(N!\,2^N)(M!\,2^M)$ different possible topologies for a protein with N helices and M strands. This work explored the question of whether the native topology is likely to be identified among a large set of all possible geometrically constrained topologies through an evaluation of the residue contact energy formed by the secondary structures, instead of the entire chain. We developed a contact pair specific and distance specific multiwell function based on the statistical characterization of the side chain distances of 413 proteins in the Protein Data Bank. The multiwell function has specific parameters for each of the 210 pairs of residue contacts. We illustrated a general mathematical method to extend a single well function to a multiwell function to represent the statistical data. We have performed a mutation analysis using 50 proteins to generate all the possible geometrically constrained topologies of the secondary structures. The result shows that the native topology is within the top 25% of the list ranked by the effective contact energies of the secondary structures for all the 50 proteins, and is within the top 5% for 34 proteins. As an application, the method was used to derive the structure of the skeletons from a low resolution density map that can be obtained through electron cryomicroscopy. Proteins 2009. © 2009 Wiley-Liss, Inc.
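The topology count quoted in this abstract follows from ordering N helices (N! ways) with two possible directions each (2 to the power N), and likewise for M strands. The short sketch below just evaluates that product to show how quickly it grows; the function name `topology_count` is made up for illustration and is not from the paper.

```python
import math


def topology_count(n_helices, n_strands):
    """Number of possible topologies: (N! * 2**N) * (M! * 2**M).

    Each set of segments can be assigned to sequence positions in
    factorial-many orders, and each segment can run in two directions.
    """
    helix_part = math.factorial(n_helices) * 2 ** n_helices
    strand_part = math.factorial(n_strands) * 2 ** n_strands
    return helix_part * strand_part


# Three helices and two strands: (6 * 8) * (2 * 4) = 384 topologies.
print(topology_count(3, 2))
```

Even a modest protein with five helices and five strands gives (120 × 32)² = 14,745,600 candidates, which is why the abstract restricts the search to geometrically constrained topologies.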

20.
Scoring functions are widely used in the final step of model selection in protein structure prediction. This is of interest both for comparative modeling targets, where it is important to select the best model among a set of many good, "correct" ones, as well as for other (fold recognition or novel fold) targets, where the set may contain many incorrect models. A novel combination of four knowledge-based potentials recognizing different features of native protein structures is introduced and tested. The pairwise, solvation, hydrogen bond, and torsion angle potentials contain largely orthogonal information. Of these, the torsion angle potential is found to show the strongest correlation with model quality. Combining these features with a linear weighting function, it was possible to construct a robust energy function capable of discriminating native-like structures on several benchmarking sets. In a recent blind test (CAFASP-4 MQAP), the scoring function ranked consistently well and was able to reliably distinguish the correct template from an ensemble of high quality decoys in 52 of 70 cases (33 of 34 for comparative modeling). An executable version of the Victor/FRST function for Linux PCs is available for download from the URL http://protein.cribi.unipd.it/frst/.
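A linear weighting of the four potential terms named in this abstract can be sketched as a weighted sum followed by a ranking. This is not the Victor/FRST implementation: the function names, the term keys, and every weight and energy value below are hypothetical placeholders chosen only to make the example run.

```python
def combined_energy(terms, weights):
    """Weighted linear sum of per-term energies for one model."""
    return sum(weights[name] * value for name, value in terms.items())


def rank_models(models, weights):
    """Model names sorted from lowest (best) to highest combined energy."""
    return sorted(models, key=lambda name: combined_energy(models[name], weights))


# Hypothetical weights; the published function uses its own fitted values.
weights = {"pairwise": 1.0, "solvation": 1.0, "hydrogen_bond": 1.0, "torsion": 2.0}

# Hypothetical per-term energies for two candidate models.
models = {
    "model_a": {"pairwise": -1.0, "solvation": -2.0, "hydrogen_bond": 0.5, "torsion": -3.0},
    "model_b": {"pairwise": -1.0, "solvation": -1.0, "hydrogen_bond": 0.0, "torsion": -1.0},
}

print(rank_models(models, weights))
```

Because the terms are said to carry largely orthogonal information, a plain weighted sum is a reasonable first form; in practice the weights would be fitted against model quality on a benchmark set.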
