共查询到20条相似文献,搜索用时 0 毫秒
1.
We perform a systematic examination of the ability of several different high-resolution, atomic-detail scoring functions to discriminate native conformations of loops in membrane proteins from non-native but physically reasonable, or "decoy," conformations. Decoys constructed from changing a loop conformation while keeping the remainder of the protein fixed are a challenging test of energy function accuracy. Nevertheless, the best of the energy functions we examined recognized the native structure as lowest in energy around half the time, and consistently chose it as a low-energy structure. This suggests that the best of present energy functions, even without a representation of the lipid bilayer, are of sufficient accuracy to give reasonable confidence in predictions of membrane protein structure. We also constructed homology models for each structure, using other known structures in the same protein family as templates. Homology models were constructed using several scoring functions and modeling programs, but with a comparable sampling effort for each procedure. Our results indicate that the quality of sequence alignment is probably the most important factor in model accuracy for sequence identity from 20-40%; one can expect a reasonably accurate model for membrane proteins when sequence identity is greater than 30%, in agreement with previous studies. Most errors are localized in loop regions, which tend to be found outside the lipid bilayer. For the most discriminative energy functions, it appears that errors are most likely due to lack of sufficient sampling, although it should be stressed that present energy functions are still far from perfectly reliable. 相似文献
2.
Bonneau R Ruczinski I Tsai J Baker D 《Protein science : a publication of the Protein Society》2002,11(8):1937-1944
Although much of the motivation for experimental studies of protein folding is to obtain insights for improving protein structure prediction, there has been relatively little connection between experimental protein folding studies and computational structural prediction work in recent years. In the present study, we show that the relationship between protein folding rates and the contact order (CO) of the native structure has implications for ab initio protein structure prediction. Rosetta ab initio folding simulations produce a dearth of high CO structures and an excess of low CO structures, as expected if the computer simulations mimic to some extent the actual folding process. Consistent with this, the majority of failures in ab initio prediction in the CASP4 (critical assessment of structure prediction) experiment involved high CO structures likely to fold much more slowly than the lower CO structures for which reasonable predictions were made. This bias against high CO structures can be partially alleviated by performing large numbers of additional simulations, selecting out the higher CO structures, and eliminating the very low CO structures; this leads to a modest improvement in prediction quality. More significant improvements in predictions for proteins with complex topologies may be possible following significant increases in high-performance computing power, which will be required for thoroughly sampling high CO conformations (high CO proteins can take six orders of magnitude longer to fold than low CO proteins). Importantly for such a strategy, simulations performed for high CO structures converge much less strongly than those for low CO structures, and hence, lack of simulation convergence can indicate the need for improved sampling of high CO conformations. The parallels between Rosetta simulations and folding in vivo may extend to misfolding: The very low CO structures that accumulate in Rosetta simulations consist primarily of local up-down beta-sheets that may resemble precursors to amyloid formation. 相似文献
3.
4.
Protein residues that are critical for structure and function are expected to be conserved throughout evolution. Here, we investigate the extent to which these conserved residues are clustered in three-dimensional protein structures. In 92% of the proteins in a data set of 79 proteins, the most conserved positions in multiple sequence alignments are significantly more clustered than randomly selected sets of positions. The comparison to random subsets is not necessarily appropriate, however, because the signal could be the result of differences in the amino acid composition of sets of conserved residues compared to random subsets (hydrophobic residues tend to be close together in the protein core), or differences in sequence separation of the residues in the different sets. In order to overcome these limits, we compare the degree of clustering of the conserved positions on the native structure and on alternative conformations generated by the de novo structure prediction method Rosetta. For 65% of the 79 proteins, the conserved residues are significantly more clustered in the native structure than in the alternative conformations, indicating that the clustering of conserved residues in protein structures goes beyond that expected purely from sequence locality and composition effects. The differences in the spatial distribution of conserved residues can be utilized in de novo protein structure prediction: We find that for 79% of the proteins, selection of the Rosetta generated conformations with the greatest clustering of the conserved residues significantly enriches the fraction of close-to-native structures. 相似文献
5.
Systematic evaluation of CS‐Rosetta for membrane protein structure prediction with sparse NOE restraints 下载免费PDF全文
Katrin Reichel Olivier Fisette Tatjana Braun Oliver F. Lange Gerhard Hummer Lars V. Schäfer 《Proteins》2017,85(5):812-826
We critically test and validate the CS‐Rosetta methodology for de novo structure prediction of ‐helical membrane proteins (MPs) from NMR data, such as chemical shifts and NOE distance restraints. By systematically reducing the number and types of NOE restraints, we focus on determining the regime in which MP structures can be reliably predicted and pinpoint the boundaries of the approach. Five MPs of known structure were used as test systems, phototaxis sensory rhodopsin II (pSRII), a subdomain of pSRII, disulfide binding protein B (DsbB), microsomal prostaglandin E2 synthase‐1 (mPGES‐1), and translocator protein (TSPO). For pSRII and DsbB, where NMR and X‐ray structures are available, resolution‐adapted structural recombination (RASREC) CS‐Rosetta yields structures that are as close to the X‐ray structure as the published NMR structures if all available NMR data are used to guide structure prediction. For mPGES‐1 and Bacillus cereus TSPO, where only X‐ray crystal structures are available, highly accurate structures are obtained using simulated NMR data. One main advantage of RASREC CS‐Rosetta is its robustness with respect to even a drastic reduction of the number of NOEs. Close‐to‐native structures were obtained with one randomly picked long‐range NOEs for every 14, 31, 38, and 8 residues for full‐length pSRII, the pSRII subdomain, TSPO, and DsbB, respectively, in addition to using chemical shifts. For mPGES‐1, atomically accurate structures could be predicted even from chemical shifts alone. Our results show that atomic level accuracy for helical membrane proteins is achievable with CS‐Rosetta using very sparse NOE restraint sets to guide structure prediction. Proteins 2017; 85:812–826. © 2016 Wiley Periodicals, Inc. 相似文献
6.
Background
Since experimental techniques are time and cost consuming, in silico protein structure prediction is essential to produce conformations of protein targets. When homologous structures are not available, fragment-based protein structure prediction has become the approach of choice. However, it still has many issues including poor performance when targets’ lengths are above 100 residues, excessive running times and sub-optimal energy functions. Taking advantage of the reliable performance of structural class prediction software, we propose to address some of the limitations of fragment-based methods by integrating structural constraints in their fragment selection process.Results
Using Rosetta, a state-of-the-art fragment-based protein structure prediction package, we evaluated our proposed pipeline on 70 former CASP targets containing up to 150 amino acids. Using either CATH or SCOP-based structural class annotations, enhancement of structure prediction performance is highly significant in terms of both GDT_TS (at least +2.6, p-values < 0.0005) and RMSD (−0.4, p-values < 0.005). Although CATH and SCOP classifications are different, they perform similarly. Moreover, proteins from all structural classes benefit from the proposed methodology. Further analysis also shows that methods relying on class-based fragments produce conformations which are more relevant to user and converge quicker towards the best model as estimated by GDT_TS (up to 10% in average). This substantiates our hypothesis that usage of structurally relevant templates conducts to not only reducing the size of the conformation space to be explored, but also focusing on a more relevant area.Conclusions
Since our methodology produces models the quality of which is up to 7% higher in average than those generated by a standard fragment-based predictor, we believe it should be considered before conducting any fragment-based protein structure prediction. Despite such progress, ab initio prediction remains a challenging task, especially for proteins of average and large sizes. Apart from improving search strategies and energy functions, integration of additional constraints seems a promising route, especially if they can be accurately predicted from sequence alone.Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0576-2) contains supplementary material, which is available to authorized users. 相似文献7.
We present a knowledge‐based function to score protein decoys based on their similarity to native structure. A set of features is constructed to describe the structure and sequence of the entire protein chain. Furthermore, a qualitative relationship is established between the calculated features and the underlying electromagnetic interaction that dominates this scale. The features we use are associated with residue–residue distances, residue–solvent distances, pairwise knowledge‐based potentials and a four‐body potential. In addition, we introduce a new target to be predicted, the fitness score, which measures the similarity of a model to the native structure. This new approach enables us to obtain information both from decoys and from native structures. It is also devoid of previous problems associated with knowledge‐based potentials. These features were obtained for a large set of native and decoy structures and a back‐propagating neural network was trained to predict the fitness score. Overall this new scoring potential proved to be superior to the knowledge‐based scoring functions used as its inputs. In particular, in the latest CASP (CASP10) experiment our method was ranked third for all targets, and second for freely modeled hard targets among about 200 groups for top model prediction. Ours was the only method ranked in the top three for all targets and for hard targets. This shows that initial results from the novel approach are able to capture details that were missed by a broad spectrum of protein structure prediction approaches. Source codes and executable from this work are freely available at http://mathmed.org /#Software and http://mamiris.com/ . Proteins 2014; 82:752–759. © 2013 Wiley Periodicals, Inc. 相似文献
8.
We recently developed the Rosetta algorithm for ab initio protein structure prediction, which generates protein structures from fragment libraries using simulated annealing. The scoring function in this algorithm favors the assembly of strands into sheets. However, it does not discriminate between different sheet motifs. After generating many structures using Rosetta, we found that the folding algorithm predominantly generates very local structures. We surveyed the distribution of beta-sheet motifs with two edge strands (open sheets) in a large set of non-homologous proteins. We investigated how much of that distribution can be accounted for by rules previously published in the literature, and developed a filter and a scoring method that enables us to improve protein structure prediction for beta-sheet proteins. Proteins 2002;48:85-97. 相似文献
9.
Predicting protein tertiary structures by in silico folding is still very difficult for proteins that have new folds. Here, we developed a coarse-grained energy function, SimFold, for de novo structure prediction, performed a benchmark test of prediction with fragment assembly simulations for 38 test proteins, and proposed consensus prediction with Rosetta. The SimFold energy consists of many terms that take into account solvent-induced effects on the basis of physicochemical consideration. In the benchmark test, SimFold succeeded in predicting native structures within 6.5 A for 12 of 38 proteins; this success rate was the same as that by the publicly available version of Rosetta (ab initio version 1.2) run with default parameters. We investigated which energy terms in SimFold contribute to structure prediction performance, finding that the hydrophobic interaction is the most crucial for the prediction, whereas other sequence-specific terms have weak but positive roles. In the benchmark, well-predicted proteins by SimFold and by Rosetta were not the same for 5 of 12 proteins, which led us to introduce consensus prediction. With combined decoys, we succeeded in prediction for 16 proteins, four more than SimFold or Rosetta separately. For each of 38 proteins, structural ensembles generated by SimFold and by Rosetta were qualitatively compared by mapping sampled structural space onto two dimensions. For proteins of which one of the two methods succeeded and the other failed in prediction, the former had a less scattered ensemble located around the native. For proteins of which both methods succeeded in prediction, often two ensembles were mixed up. 相似文献
10.
This work presents a novel C(alpha)--C(alpha) distance dependent force field which is successful in selecting native structures from an ensemble of high resolution near-native conformers. An enhanced and diverse protein set, along with an improved decoy generation technique, contributes to the effectiveness of this potential. High quality decoys were generated for 1489 nonhomologous proteins and used to train an optimization based linear programming formulation. The goal in developing a set of high resolution decoys was to develop a simple, distance-dependent force field that yields the native structure as the lowest energy structure and assigns higher energies to decoy structures that are quite similar as well as those that are less similar. The model also includes a set of physical constraints that were based on experimentally observed physical behavior of the amino acids. The force field was tested on two sets of test decoys not in the training set and was found to excel on all the metrics that are widely used to measure the effectiveness of a force field. The high resolution force field was successful in correctly identifying 113 native structures out of 150 test cases and the average rank obtained for this test was 1.87. All the high resolution structures (training and testing) used for this work are available online and can be downloaded from http://titan.princeton.edu/HRDecoys. 相似文献
11.
A new force field for pairwise residue interactions as a function of C(alpha) to C(alpha) distances is presented. The force field was developed through the solution of a linear programming formulation with large sets of constraints. The constraints are based on the construction of >80,000 low-energy decoys for a set of proteins and requiring the decoy energies for each protein system to be higher than the native conformation of that particular protein. The generation of a robust force field was facilitated by the use of a novel decoy generation process, which involved the rational selection of proteins to add to the training set and included a significant energy minimization of the decoys. The force field was tested on a large set of decoys for various proteins not included in the training set and shown to perform well compared with a leading force field in identifying the native conformation for these proteins. 相似文献
12.
We describe the derivation and testing of a knowledge-based atomic environment potential for the modeling of protein structural energetics. An analysis of the probabilities of atomic interactions in a dataset of high-resolution protein structures shows that the probabilities of non-bonded inter-atomic contacts are not statistically independent events, and that the multi-body contact frequencies are poorly predicted from pairwise contact potentials. A pseudo-energy function is defined that measures the preferences for protein atoms to be in a given microenvironment defined by the number of contacting atoms in the environment and its atomic composition. This functional form is tested for its ability to recognize native protein structures amongst an ensemble of decoy structures and a detailed relative performance comparison is made with a number of common functions used in protein structure prediction. 相似文献
13.
14.
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints‐based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics‐based residue‐specific all‐atom probability discriminatory function (RAPDF) to discriminate native‐like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native‐like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement. Proteins 2009. © 2009 Wiley‐Liss, Inc. 相似文献
15.
Jingfen Zhang Qingguo Wang Bogdan Barz Zhiquan He Ioan Kosztin Yi Shang Dong Xu 《Proteins》2010,78(5):1137-1152
There have been steady improvements in protein structure prediction during the past 2 decades. However, current methods are still far from consistently predicting structural models accurately with computing power accessible to common users. Toward achieving more accurate and efficient structure prediction, we developed a number of novel methods and integrated them into a software package, MUFOLD. First, a systematic protocol was developed to identify useful templates and fragments from Protein Data Bank for a given target protein. Then, an efficient process was applied for iterative coarse‐grain model generation and evaluation at the Cα or backbone level. In this process, we construct models using interresidue spatial restraints derived from alignments by multidimensional scaling, evaluate and select models through clustering and static scoring functions, and iteratively improve the selected models by integrating spatial restraints and previous models. Finally, the full‐atom models were evaluated using molecular dynamics simulations based on structural changes under simulated heating. We have continuously improved the performance of MUFOLD by using a benchmark of 200 proteins from the Astral database, where no template with >25% sequence identity to any target protein is included. The average root‐mean‐square deviation of the best models from the native structures is 4.28 Å, which shows significant and systematic improvement over our previous methods. The computing time of MUFOLD is much shorter than many other tools, such as Rosetta. MUFOLD demonstrated some success in the 2008 community‐wide experiment for protein structure prediction CASP8. Proteins 2010. © 2009 Wiley‐Liss, Inc. 相似文献
16.
A novel method for protein secondary structure prediction using dual-layer SVM and profiles 总被引:2,自引:0,他引:2
A high-performance method was developed for protein secondary structure prediction based on the dual-layer support vector machine (SVM) and position-specific scoring matrices (PSSMs). SVM is a new machine learning technology that has been successfully applied in solving problems in the field of bioinformatics. The SVM's performance is usually better than that of traditional machine learning approaches. The performance was further improved by combining PSSM profiles with the SVM analysis. The PSSMs were generated from PSI-BLAST profiles, which contain important evolution information. The final prediction results were generated from the second SVM layer output. On the CB513 data set, the three-state overall per-residue accuracy, Q3, reached 75.2%, while segment overlap (SOV) accuracy increased to 80.0%. On the CB396 data set, the Q3 of our method reached 74.0% and the SOV reached 78.1%. A web server utilizing the method has been constructed and is available at http://www.bioinfo.tsinghua.edu.cn/pmsvm. 相似文献
17.
18.
Balancing exploration and exploitation in population‐based sampling improves fragment‐based de novo protein structure prediction 下载免费PDF全文
Conformational search space exploration remains a major bottleneck for protein structure prediction methods. Population‐based meta‐heuristics typically enable the possibility to control the search dynamics and to tune the balance between local energy minimization and search space exploration. EdaFold is a fragment‐based approach that can guide search by periodically updating the probability distribution over the fragment libraries used during model assembly. We implement the EdaFold algorithm as a Rosetta protocol and provide two different probability update policies: a cluster‐based variation (EdaRosec) and an energy‐based one (EdaRoseen). We analyze the search dynamics of our new Rosetta protocols and show that EdaRosec is able to provide predictions with lower C RMSD to the native structure than EdaRoseen and Rosetta AbInitio Relax protocol. Our software is freely available as a C++ patch for the Rosetta suite and can be downloaded from http://www.riken.jp/zhangiru/software/ . Our protocols can easily be extended in order to create alternative probability update policies and generate new search dynamics. Proteins 2017; 85:852–858. © 2016 Wiley Periodicals, Inc. 相似文献
19.
Computational methods in protein structure prediction 总被引:1,自引:0,他引:1
Floudas CA 《Biotechnology and bioengineering》2007,97(2):207-213
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented. 相似文献
20.
The results of a protein structure prediction contest are reviewed. Twelve different groups entered predictions on 14 proteins of known sequence whose structures had been determined but not yet disseminated to the scientific community. Thus, these represent true tests of the current state of structure prediction methodologies. From this work, it is clear that accurate tertiary structure prediction is not yet possible. However, protein fold and motif prediction are possible when the motif is recognizably similar to another known structure. Internal symmetry and the information inherent in an aligned family of homologous sequences facilitate predictive efforts. Novel folds remain a major challenge for prediction efforts. © 1995 Wiley-Liss, Inc. 相似文献