Similar literature
 20 similar articles found (search time: 31 ms)
1.
Qiu J  Sheffler W  Baker D  Noble WS 《Proteins》2008,71(3):1175-1182
Protein structure prediction is an important problem of both intellectual and practical interest. Most protein structure prediction approaches generate multiple candidate models first, and then use a scoring function to select the best model among these candidates. In this work, we develop a scoring function using support vector regression (SVR). Both consensus-based features and features from individual structures are extracted from a training data set containing native protein structures and predicted structural models submitted to CASP5 and CASP6. The SVR learns a scoring function that is a linear combination of these features. We test this scoring function on two data sets. First, when used to rank server models submitted to CASP7, the SVR score selects predictions that are comparable to the best performing server in CASP7, Zhang-Server, and significantly better than all the other servers. Even if the SVR score is not allowed to select Zhang-Server models, the SVR score still selects predictions that are significantly better than all the other servers. In addition, the SVR is able to select significantly better models and yield significantly better Pearson correlation coefficients than the two best Quality Assessment groups in CASP7, QA556 (LEE), and QA634 (Pcons). Second, this work aims to improve the ability of the Robetta server to select the best models, and hence we evaluate the performance of the SVR score on ranking the Robetta server template-based models for the CASP7 targets. The SVR selects significantly better models than the Robetta K*Sync consensus alignment score.
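As a rough illustration of the statement that the learned scoring function is a linear combination of features, the sketch below scores each candidate model as a weighted sum and picks the highest. The feature values, weights, and names (`linear_score`, `select_best`) are hypothetical and are not taken from the paper; training the weights by support vector regression is not shown.

```python
from typing import Dict, Sequence


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Score one candidate model as bias + sum_k weights[k] * features[k]."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


def select_best(candidates: Dict[str, Sequence[float]],
                weights: Sequence[float],
                bias: float = 0.0) -> str:
    """Return the name of the candidate with the highest linear score."""
    return max(candidates, key=lambda name: linear_score(candidates[name], weights, bias))


# Hypothetical example: two features per model (say, one consensus-based
# feature and one single-structure feature), with made-up weights.
example_candidates = {"model_a": [0.8, 0.1], "model_b": [0.6, 0.9]}
example_weights = [1.0, 0.5]
best_model = select_best(example_candidates, example_weights)
```

In practice the weights would come from fitting a regression against a quality measure of the training models; only the ranking step is sketched here.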

2.
We report here a multiprotein blind test of a computer method to predict native protein structures based solely on an all-atom physics-based force field. We use the AMBER 96 potential function with an implicit (GB/SA) model of solvation, combined with replica-exchange molecular-dynamics simulations. Coarse conformational sampling is performed using the zipping and assembly method (ZAM), an approach that is designed to mimic the putative physical routes of protein folding. ZAM was applied to the folding of six proteins, from 76 to 112 monomers in length, in CASP7, a community-wide blind test of protein structure prediction. Because these predictions have about the same level of accuracy as typical bioinformatics methods, and do not utilize information from databases of known native structures, this work opens up the possibility of predicting the structures of membrane proteins, synthetic peptides, or other foldable polymers, for which there is little prior knowledge of native structures. This approach may also be useful for predicting physical protein folding routes, non-native conformations, and other physical properties from amino acid sequences.
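The abstract mentions replica-exchange molecular dynamics without giving details. As a hedged sketch, the standard Metropolis criterion for swapping configurations between two temperature replicas can be written as below. The function name and the example energies and temperatures are assumptions for illustration; this is the textbook criterion, not code from the ZAM work.

```python
import math

# Boltzmann constant in kcal/(mol*K), the unit system commonly used with AMBER.
K_B = 0.0019872041


def swap_probability(e_i: float, e_j: float, t_i: float, t_j: float) -> float:
    """Metropolis acceptance probability for exchanging two replicas.

    Replica i runs at temperature t_i and currently has potential energy e_i;
    replica j runs at t_j with energy e_j. The swap is accepted with
    probability min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


# If the colder replica holds the higher-energy configuration, the swap is
# always accepted; in the opposite case it is accepted only sometimes.
p_downhill = swap_probability(-90.0, -100.0, 300.0, 350.0)
p_uphill = swap_probability(-100.0, -90.0, 300.0, 350.0)
```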

3.
Soto CS  Fasnacht M  Zhu J  Forrest L  Honig B 《Proteins》2008,70(3):834-843
We describe a fast and accurate protocol, LoopBuilder, for the prediction of loop conformations in proteins. The procedure includes extensive sampling of backbone conformations, side chain addition, the use of a statistical potential to select a subset of these conformations, and, finally, an energy minimization and ranking with an all-atom force field. We find that the Direct Tweak algorithm used in the previously developed LOOPY program is successful in generating an ensemble of conformations that on average are closer to the native conformation than those generated by other methods. An important feature of Direct Tweak is that it checks for interactions between the loop and the rest of the protein during the loop closure process. DFIRE is found to be a particularly effective statistical potential that can bias conformation space toward conformations that are close to the native structure. Its application as a filter prior to a full molecular mechanics energy minimization both improves prediction accuracy and offers significant savings in computer time. Final scoring is based on the OPLS/SGB-NP force field implemented in the PLOP program. The approach is also shown to be quite successful in predicting loop conformations for cases where the native side chain conformations are assumed to be unknown, suggesting that it will prove effective in real homology modeling applications.

4.
A 3D model of RNA structure can provide information about its function and regulation that is not possible with just the sequence or secondary structure. Current models suffer from low accuracy and long running times and either neglect or presume knowledge of the long-range interactions which stabilize the tertiary structure. Our coarse-grained, helix-based, tertiary structure model operates with only a few degrees of freedom compared with all-atom models while preserving the ability to sample tertiary structures given a secondary structure. It strikes a balance between the precision of an all-atom tertiary structure model and the simplicity and effectiveness of a secondary structure representation. It provides a simplified tool for exploring global arrangements of helices and loops within RNA structures. We provide an example of a novel energy function relying only on the positions of stems and loops. We show that coupling our model to this energy function produces predictions as good as or better than the current state-of-the-art tools. We propose that, given the wide range of conformational space that needs to be explored, a coarse-grained approach can explore more conformations in fewer iterations than an all-atom model coupled to a fine-grained energy function. Finally, we emphasize the overarching theme of providing an ensemble of predicted structures, something which our tool excels at, rather than providing a handful of the lowest energy structures.

5.
Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a ‘stepwise ansatz’, recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth ‘RNA-puzzle’ competition. 
These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic accuracy.

6.
Rohl CA  Strauss CE  Chivian D  Baker D 《Proteins》2004,55(3):656-677
A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential in combination with modified Newton's method for gradient descent minimization is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4, 8, and 12 residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure.
This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.

7.
8.
In protein tertiary structure prediction, a crucial step is to select near-native structures from a large number of predicted structural models. Over the years, extensive research has been conducted for the protein structure selection problem with most approaches focusing on developing more accurate energy or scoring functions. Despite significant advances in this area, the discerning power of current approaches is still unsatisfactory. In this paper, we propose a novel consensus-based algorithm for the selection of predicted protein structures. Given a set of predicted models, our method first removes redundant structures to derive a subset of reference models. Then, a structure is ranked based on its average pairwise similarity to the reference models. Using the CASP8 data set containing a large collection of predicted models for 122 targets, we compared our method with the best CASP8 quality assessment (QA) servers, which are all consensus based, and showed that our QA scores correlate better with the GDT-TSs than those of the CASP8 QA servers. We also compared our method with the state-of-the-art scoring functions and showed its improved performance for near-native model selection. The GDT-TSs of the top models picked by our method are on average more than 8 percent better than the ones selected by the best performing scoring function.
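The two steps described in this abstract (remove redundant structures to obtain reference models, then rank each structure by its average pairwise similarity to the references) can be sketched as follows. This is a minimal illustration under stated assumptions: the greedy redundancy filter, the cutoff value, the function names, and the toy one-dimensional "models" with a made-up similarity are all hypothetical; the paper's actual similarity measure and filtering rule are not given in the abstract.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def select_reference_indices(models: Sequence[Model],
                             similarity: Callable[[Model, Model], float],
                             redundancy_cutoff: float) -> List[int]:
    """Greedy filter: keep a model only if it is less similar than the
    cutoff to every reference model kept so far."""
    kept: List[int] = []
    for i, model in enumerate(models):
        if all(similarity(model, models[j]) < redundancy_cutoff for j in kept):
            kept.append(i)
    return kept


def consensus_rank(models: Sequence[Model],
                   similarity: Callable[[Model, Model], float],
                   redundancy_cutoff: float = 0.95) -> Tuple[List[int], List[float]]:
    """Return (model indices ordered best first, per-model consensus scores)."""
    refs = select_reference_indices(models, similarity, redundancy_cutoff)
    scores: List[float] = []
    for i, model in enumerate(models):
        others = [j for j in refs if j != i]  # do not compare a model with itself
        mean = sum(similarity(model, models[j]) for j in others) / len(others) if others else 0.0
        scores.append(mean)
    order = sorted(range(len(models)), key=lambda i: (-scores[i], i))
    return order, scores


def toy_similarity(a: float, b: float) -> float:
    """Stand-in for a structural similarity score in (0, 1]."""
    return 1.0 / (1.0 + abs(a - b))


toy_models = [0.0, 0.1, 0.2, 5.0]
toy_order, toy_scores = consensus_rank(toy_models, toy_similarity)
```

With real data, `similarity` would be a structure comparison such as a GDT-TS-like score between two predicted models; the outlier in the toy set ends up last because it agrees with none of the others.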

9.
We present our assessment of tertiary structure predictions for hard targets in Critical Assessment of Structure Prediction round 13 (CASP13). The analysis includes (a) assignment and discussion of best models through scores-aided visual inspection of models for each evaluation unit (EU); (b) ranking of predictors resulting from this evaluation and from global scores; and (c) evaluation of progress, state of the art, and current limitations of protein structure prediction. We witness a sizable improvement in tertiary structure prediction building on the progress observed from CASP11 to CASP12, with (a) top models reaching backbone RMSD <3 Å for several EUs of size <150 residues, contributed by many groups; (b) at least one model that roughly captures global topology for all EUs, probably unprecedented in this track of CASP; and (c) even quite good models for full, unsplit targets. Better structure predictions are brought about mainly by improved residue-residue contact predictions, and since this CASP also by distance predictions, achieved through state-of-the-art machine learning methods which also progressed to work with slightly shallower alignments compared to CASP12. As we reach a new realm of tertiary structure prediction quality, new directions are proposed and explored for future CASPs: (a) dropping splitting into EUs, (b) rethinking difficulty metrics probably in terms of contact and distance predictions, (c) assessing also side chains for models of high backbone accuracy, and (d) assessing residue-wise and possibly residue-residue quality estimates.

10.
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.

11.
We develop a procedure for exploring the free energy landscape of protein-peptide binding at atomic detail and apply it to PDZ domain-peptide interactions. The procedure involves soft constraints on receptor proteins providing limited chain flexibility, including backbone motions. Peptide chains are left fully flexible and kept in spatial proximity of the protein through periodic boundary conditions. By extensive Monte Carlo simulations, full representative conformational ensembles at temperatures where bound and unbound states coexist are obtained. To make this approach computationally feasible, we develop an effective all-atom energy function centering on hydrophobicity, hydrogen bonding, and electrostatic interactions. Our initial focus is a set of 11 PDZ domain-peptide pairs with experimentally determined complex structures. Minimum-energy conformations are found to be highly similar to the respective native structures in eight of the cases (all-atom peptide RMSDs < 6 Å). Having achieved that, we turn to a more complete characterization of the bound peptide state through a clustering scheme applied to the full ensembles of peptide structures. We find a significant diversity among bound peptide conformations for several PDZ domains, in particular involving the N-terminal side of the peptide chains. Our computational model is then tested further on a set of nine PDZ domain-peptide pairs where the peptides are not originally present in the experimentally determined structures. We find a similar success rate in terms of the nativeness of minimum-energy conformations. Finally, we investigate the ability of our approach to capture variations in binding affinities for different peptide sequences. This is done in particular for a set of related sequences binding to the third PDZ domain of PSD-95 with encouraging results.

12.
RNA molecules play integral roles in gene regulation, and understanding their structures gives us important insights into their biological functions. Despite recent developments in template-based and parameterized energy functions, the structure of RNA, in particular the nonhelical regions, is still difficult to predict. Knowledge-based potentials have proven efficient in protein structure prediction. In this work, we describe two differentiable knowledge-based potentials derived from a curated data set of RNA structures, with all-atom or coarse-grained representation, respectively. We focus on one aspect of the prediction problem: the identification of native-like RNA conformations from a set of near-native models. Using a variety of near-native RNA models generated from three independent methods, we show that our potential is able to distinguish the native structure and identify native-like conformations, even at the coarse-grained level. The all-atom version of our knowledge-based potential performs better and appears to be more effective at discriminating near-native RNA conformations than one of the most highly regarded parameterized potentials. The fully differentiable form of our potentials will additionally likely be useful for structure refinement and/or molecular dynamics simulations.

13.
Gao C  Stern HA 《Proteins》2007,68(1):67-75
We perform a systematic examination of the ability of several different high-resolution, atomic-detail scoring functions to discriminate native conformations of loops in membrane proteins from non-native but physically reasonable, or "decoy," conformations. Decoys constructed from changing a loop conformation while keeping the remainder of the protein fixed are a challenging test of energy function accuracy. Nevertheless, the best of the energy functions we examined recognized the native structure as lowest in energy around half the time, and consistently chose it as a low-energy structure. This suggests that the best of present energy functions, even without a representation of the lipid bilayer, are of sufficient accuracy to give reasonable confidence in predictions of membrane protein structure. We also constructed homology models for each structure, using other known structures in the same protein family as templates. Homology models were constructed using several scoring functions and modeling programs, but with a comparable sampling effort for each procedure. Our results indicate that the quality of sequence alignment is probably the most important factor in model accuracy for sequence identity from 20-40%; one can expect a reasonably accurate model for membrane proteins when sequence identity is greater than 30%, in agreement with previous studies. Most errors are localized in loop regions, which tend to be found outside the lipid bilayer. For the most discriminative energy functions, it appears that errors are most likely due to lack of sufficient sampling, although it should be stressed that present energy functions are still far from perfectly reliable.

14.
Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological function that is very sensitive to structural details. Specifically, critical assessment of techniques for protein structure prediction (CASP) shows that the accuracy of predictions in the comparative modeling category is often worse than that of the template on which the homology model is based. Here we describe a refinement protocol that is able to consistently refine submitted predictions for all categories at CASP7. The protocol uses direct energy minimization of the knowledge-based potential of mean force that is based on the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177-3182). Our protocol is thus computationally very efficient; it only takes a few minutes of CPU time to run typical protein models (300 residues). We observe an average structural improvement of 1% in GDT_TS, for predictions that have low and medium homology to known PDB structures (Global Distance Test score or GDT_TS between 50 and 80%). We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies amongst the various participants at CASP, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry prove the usefulness of our computationally inexpensive, powerful and automatic refinement protocol. Proteins 2010. © 2010 Wiley-Liss, Inc.

15.
16.
Biophysical forcefields have contributed less than originally anticipated to recent progress in protein structure prediction. Here, we have investigated the selectivity of a recently developed all-atom free-energy forcefield for protein structure prediction and quality assessment (QA). Using a heuristic method, but excluding homology, we generated decoy-sets for all targets of the CASP7 protein structure prediction assessment with <150 amino acids. The decoys in each set were then ranked by energy in short relaxation simulations and the best low-energy cluster was submitted as a prediction. For four of nine template-free targets, this approach generated high-ranking predictions within the top 10 models submitted in CASP7 for the respective targets. For these targets, our de-novo predictions had an average GDT_S score of 42.81, significantly above the average of all groups. The refinement protocol has difficulty with oligomeric targets and when no near-native decoys are generated in the decoy library. For targets with high-quality decoy sets the refinement approach was highly selective. Motivated by this observation, we rescored all server submissions up to 200 amino acids using a similar refinement protocol, but using no clustering, in a QA exercise. We found an excellent correlation between the best server models and those with the lowest energy in the forcefield. The free-energy refinement protocol may thus be an efficient tool for relative QA and protein structure prediction. Proteins 2009. © 2009 Wiley-Liss, Inc.

17.
Small angle X-ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS-assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography-SAXS (SEC-SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS-assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.

18.
We investigate the possibility that atomic burials, as measured by their distances from the structural geometrical center, contain sufficient information to determine the tertiary structure of globular proteins. We report Monte Carlo simulated annealing results of all-atom hard-sphere models in continuous space for four small proteins: the all-beta WW-domain 1E0L, the alpha/beta protein-G 1IGD, the all-alpha engrailed homeo-domain 1ENH, and the alpha + beta engineered monomeric form of the Cro protein 1ORC. We used as energy function the sum over all atoms, labeled by $i$, of $|R_i - R_i^*|$, where $R_i$ is the atomic distance from the center of coordinates, or central distance, and $R_i^*$ is the "ideal" central distance obtained from the native structure. Hydrogen bonds were taken into consideration by the assignment of two ideal distances for backbone atoms forming hydrogen bonds in the native structure depending on the formation of a geometrically defined bond, independently of bond partner. Lowest energy final conformations turned out to be very similar to the native structure for the four proteins under investigation and a strong correlation was observed between energy and distance root mean square deviation (DRMS) from the native in the case of all-beta 1E0L and alpha/beta 1IGD. For all-alpha 1ENH and alpha + beta 1ORC the overall correlation between energy and DRMS among final conformations was not as high because some trajectories resulted in high DRMS but low energy final conformations in which alpha-helices adopted a non-native mutual orientation. Comparison between central distances and actual accessible surface areas corroborated the implicit assumption of correlation between these two quantities.
The Z-score obtained with this native-centric potential in the discrimination of native 1ORC from a set of random compact structures confirmed that it contains a much smaller amount of native information when compared to a traditional contact Go potential but indicated that simple sequence-dependent burial potentials still need some improvement in order to attain a similar discriminability. Taken together, our results suggest that central distances, in conjunction with physically motivated hydrogen bond constraints, contain sufficient information to determine the native conformation of these small proteins and that a solution to the folding problem for globular proteins could arise from sufficiently accurate burial predictions from sequence followed by minimization of a burial-dependent energy function.
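The energy function described in this abstract, a sum over atoms of the absolute difference between each atom's central distance and its ideal value from the native structure, can be sketched in a few lines. This is only an illustration under assumptions: the function names and the four-point toy "structure" are hypothetical, the geometrical center is taken as the unweighted mean of the coordinates, and the hydrogen-bond-dependent choice between two ideal distances is not modeled.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def centroid(coords: Sequence[Point]) -> Point:
    """Unweighted geometrical center of a set of atomic coordinates."""
    n = len(coords)
    return (sum(p[0] for p in coords) / n,
            sum(p[1] for p in coords) / n,
            sum(p[2] for p in coords) / n)


def central_distances(coords: Sequence[Point]) -> List[float]:
    """Distance of every atom from the geometrical center."""
    center = centroid(coords)
    return [math.dist(p, center) for p in coords]


def burial_energy(coords: Sequence[Point], ideal: Sequence[float]) -> float:
    """Sum over atoms of |central distance - ideal central distance|."""
    if len(coords) != len(ideal):
        raise ValueError("one ideal distance is required per atom")
    return sum(abs(r - r_star) for r, r_star in zip(central_distances(coords), ideal))


# Toy example: ideal distances are taken from a made-up "native" arrangement,
# then the energy of a uniformly expanded copy is evaluated.
native = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0)]
ideal_distances = central_distances(native)
expanded = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in native]
native_energy = burial_energy(native, ideal_distances)
expanded_energy = burial_energy(expanded, ideal_distances)
```

By construction the native arrangement has zero energy, and any conformation whose atoms sit at the wrong depth is penalized in proportion to the mismatch.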

19.
MOTIVATION: Predicting protein interactions is one of the most challenging problems in functional genomics. Given two proteins known to interact, current docking methods evaluate billions of docked conformations by simple scoring functions, and in addition to near-native structures yield many false positives, i.e. structures with good surface complementarity but far from the native. RESULTS: We have developed a fast algorithm for filtering docked conformations with good surface complementarity, and ranking them based on their clustering properties. The free energy filters select complexes with lowest desolvation and electrostatic energies. Clustering is then used to smooth the local minima and to select the ones with the broadest energy wells, a property associated with the free energy at the binding site. The robustness of the method was tested on sets of 2000 docked conformations generated for 48 pairs of interacting proteins. In 31 of these cases, the top 10 predictions include at least one near-native complex, with an average RMSD of 5 Å from the native structure. The docking and discrimination method also provides good results for a number of complexes that were used as targets in the Critical Assessment of PRedictions of Interactions experiment. AVAILABILITY: The fully automated docking and discrimination server ClusPro can be found at http://structure.bu.edu
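The abstract says that filtered conformations are ranked by their clustering properties but does not spell out the procedure. One common greedy scheme, shown here purely as an assumption, repeatedly picks the conformation with the most neighbors within a distance radius, reports it as a cluster center, and removes its neighbors. The function name, the radius, and the one-dimensional toy positions standing in for pairwise RMSDs are hypothetical.

```python
from typing import Callable, List, Tuple


def greedy_clusters(n: int,
                    distance: Callable[[int, int], float],
                    radius: float) -> List[Tuple[int, List[int]]]:
    """Greedy neighbor-count clustering over items 0..n-1.

    Returns (center, members) pairs in the order they were found, so the
    first cluster is the most populated one. Ties go to the lowest index.
    """
    remaining = set(range(n))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining:
        candidates = sorted(remaining)
        # The center is the remaining item with the most remaining neighbors.
        center = max(candidates,
                     key=lambda i: sum(1 for j in remaining if distance(i, j) <= radius))
        members = [j for j in candidates if distance(center, j) <= radius]
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


# Toy data: positions on a line play the role of docked conformations and the
# absolute difference plays the role of a pairwise RMSD.
toy_positions = [0.0, 0.5, 1.0, 10.0, 10.4]
toy_clusters = greedy_clusters(
    len(toy_positions),
    lambda i, j: abs(toy_positions[i] - toy_positions[j]),
    radius=0.6,
)
```

Under this reading, a large cluster corresponds to a broad energy well, which is why cluster size rather than the single lowest energy is used to order the predictions.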

20.
Lee J  Kim SY  Joo K  Kim I  Lee J 《Proteins》2004,56(4):704-714
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us with the best model-one structure for the target T0161.
