首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Georg Kuenze  Jens Meiler 《Proteins》2019,87(12):1341-1350
Computational methods that produce accurate protein structure models from limited experimental data, for example, from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity, and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predicted de novo using a two-stage protocol. First, low-resolution models were generated with the Rosetta de novo method guided by nonambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and nonambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (global distance test total score > 45) and in three cases high-resolution models were achieved (root-mean-square deviation < 3.5 å). We also show that a meta-approach applying iterative Rosetta + NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.  相似文献   

2.
CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and 15N-1H residual dipolar coupling data, typical of that obtained for 15N,13C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction method provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data is then used to validate and/or refine these models.  相似文献   

3.
We describe the performance of MELD-accelerated molecular dynamics (MELDxMD) in determining protein structures in the NMR-data-assisted category in CASP13. Seeded from web server predictions, MELDxMD was found best in the NMR category, over 17 targets, outperforming the next-best groups by a factor of ~4 in z-score. MELDxMD gives ensembles, not single structures; succeeds on a 326-mer, near the current upper limit for NMR structures; and predicts structures that match experimental residual dipolar couplings even though the only NMR-derived data used in the simulations was NOE-based ambiguous atom–atom contacts and backbone dihedrals. MELD can use noisy and ambiguous experimental information to reduce the MD search space. We believe MELDxMD is a promising method for determining protein structures from NMR data.  相似文献   

4.
In recent years in silico protein structure prediction reached a level where fully automated servers can generate large pools of near‐native structures. However, the identification and further refinement of the best structures from the pool of models remain problematic. To address these issues, we have developed (i) a target‐specific selective refinement (SR) protocol; and (ii) molecular dynamics (MD) simulation based ranking (SMDR) method. In SR the all‐atom refinement of structures is accomplished via the Rosetta Relax protocol, subject to specific constraints determined by the size and complexity of the target. The best‐refined models are selected with SMDR by testing their relative stability against gradual heating through all‐atom MD simulations. Through extensive testing we have found that Mufold‐MD, our fully automated protein structure prediction server updated with the SR and SMDR modules consistently outperformed its previous versions. Proteins 2015; 83:1823–1835. © 2015 Wiley Periodicals, Inc.  相似文献   

5.
An increasing number of cryo‐electron microscopy (cryo‐EM) density maps are being generated with suitable resolution to trace the protein backbone and guide sidechain placement. Generating and evaluating atomic models based on such maps would be greatly facilitated by independent validation metrics for assessing the fit of the models to the data. We describe such a metric based on the fit of atomic models with independent test maps from single particle reconstructions not used in model refinement. The metric provides a means to determine the proper balance between the fit to the density and model energy and stereochemistry during refinement, and is likely to be useful in determining values of model building and refinement metaparameters quite generally.  相似文献   

6.
Hamilton N  Burrage K  Ragan MA  Huber T 《Proteins》2004,56(4):679-684
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations.  相似文献   

7.
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.  相似文献   

8.
We present a Model Quality Assessment Program (MQAP), called MQAPsingle, for ranking and assessing the absolute global quality of single protein models. MQAPsingle is quasi single‐model MQAP, a method that combines advantages of both “pure” single‐model MQAPs and clustering MQAPs. This approach results in higher accuracy compared to the state‐of‐the‐art single‐model MQAPs. Notably, the prediction for a given model is the same regardless if this model is submitted to our server alone or together with other models. Proteins 2016; 84:1021–1028. © 2015 Wiley Periodicals, Inc.  相似文献   

9.
Scoring model structure is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue-wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.  相似文献   

10.
Protein structure prediction in genomics   总被引:1,自引:0,他引:1  
As the number of completely sequenced genomes rapidly increases, including now the complete Human Genome sequence, the post-genomic problems of genome-scale protein structure determination and the issue of gene function identification become ever more pressing. In fact, these problems can be seen as interrelated in that experimentally determining or predicting or the structure of proteins encoded by genes of interest is one possible means to glean subtle hints as to the functions of these genes. The applicability of this approach to gene characterisation is reviewed, along with a brief survey of the reliability of large-scale protein structure prediction methods and the prospects for the development of new prediction methods.  相似文献   

11.
12.
Structure prediction methods often generate a large number of models for a target sequence. Even if the correct fold for the target sequence is sampled in this dataset, it is difficult to distinguish it from other decoy structures. An attempt to solve this problem using experimental mutational sensitivity data for the CcdB protein was described previously by exploiting the correlation of residue depth with mutational sensitivity (r ~ 0.6). We now show that such a correlation extends to four other proteins with localized active sites, and for which saturation mutagenesis datasets exist. We also examine whether incorporation of predicted secondary structure information and the DOPE model quality assessment score, in addition to mutational sensitivity, improves the accuracy of model discrimination using a decoy dataset of 163 targets from CASP. Although most CASP models would have been subjected to model quality assessment prior to submission, we find that the DOPE score makes a substantial contribution to the observed improvement. We therefore also applied the approach to CcdB and four other proteins for which reliable experimental mutational data exist and observe that inclusion of experimental mutational data results in a small qualitative improvement in model discrimination relative to that seen with just the DOPE score. This is largely because of our limited ability to quantitatively predict effects of point mutations on in vivo protein activity. Further improvements in the methodology are required to facilitate improved utilization of single mutant data.  相似文献   

13.
Jie Hou  Tianqi Wu  Renzhi Cao  Jianlin Cheng 《Proteins》2019,87(12):1165-1178
Predicting residue-residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning holds the key of solving protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.  相似文献   

14.
Here we perform a systematic exploration of the use of distance constraints derived from small angle X-ray scattering (SAXS) measurements to filter candidate protein structures for the purpose of protein structure prediction. This is an intrinsically more complex task than that of applying distance constraints derived from NMR data where the identity of the pair of amino acid residues subject to a given distance constraint is known. SAXS, on the other hand, yields a histogram of pair distances (pair distribution function), but the identities of the pairs contributing to a given bin of the histogram are not known. Our study is based on an extension of the Levitt-Hinds coarse grained approach to ab initio protein structure prediction to generate a candidate set of C(alpha) backbones. In spite of the lack of specific residue information inherent in the SAXS data, our study shows that the implementation of a SAXS filter is capable of effectively purifying the set of native structure candidates and thus provides a substantial improvement in the reliability of protein structure prediction. We test the quality of our predicted C(alpha) backbones by doing structural homology searches against the Dali domain library, and find that the results are very encouraging. In spite of the lack of local structural details and limited modeling accuracy at the C(alpha) backbone level, we find that useful information about fold classification can be extracted from this procedure. This approach thus provides a way to use a SAXS data based structure prediction algorithm to generate potential structural homologies in cases where lack of sequence homology prevents identification of candidate folds for a given protein. Thus our approach has the potential to help in determination of the biological function of a protein based on structural homology instead of sequence homology.  相似文献   

15.
Here we describe the updated MolProbity rotamer‐library distributions derived from an order‐of‐magnitude larger and more stringently quality‐filtered dataset of about 8000 (vs. 500) protein chains, and we explain the resulting changes and improvements to model validation as seen by users. To include only side‐chains with satisfactory justification for their given conformation, we added residue‐specific filters for electron‐density value and model‐to‐density fit. The combined new protocol retains a million residues of data, while cleaning up false‐positive noise in the multi‐ datapoint distributions. It enables unambiguous characterization of conformational clusters nearly 1000‐fold less frequent than the most common ones. We describe examples of local interactions that favor these rare conformations, including the role of authentic covalent bond‐angle deviations in enabling presumably strained side‐chain conformations. Further, along with favored and outlier, an allowed category (0.3–2.0% occurrence in reference data) has been added, analogous to Ramachandran validation categories. The new rotamer distributions are used for current rotamer validation in MolProbity and PHENIX, and for rotamer choice in PHENIX model‐building and refinement. The multi‐dimensional distributions and Top8000 reference dataset are freely available on GitHub. These rotamers are termed “ultimate” because data sampling and quality are now fully adequate for this task, and also because we believe the future of conformational validation should integrate side‐chain with backbone criteria. Proteins 2016; 84:1177–1189. © 2016 Wiley Periodicals, Inc.  相似文献   

16.
We present a novel method called RosettaHoles for visual and quantitative assessment of underpacking in the protein core. RosettaHoles generates a set of spherical cavity balls that fill the empty volume between atoms in the protein interior. For visualization, the cavity balls are aggregated into contiguous overlapping clusters and small cavities are discarded, leaving an uncluttered representation of the unfilled regions of space in a structure. For quantitative analysis, the cavity ball data are used to estimate the probability of observing a given cavity in a high‐resolution crystal structure. RosettaHoles provides excellent discrimination between real and computationally generated structures, is predictive of incorrect regions in models, identifies problematic structures in the Protein Data Bank, and promises to be a useful validation tool for newly solved experimental structures.  相似文献   

17.
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure. © 1995 Wiley-Liss, Inc.  相似文献   

18.
Substantial progresses in protein structure prediction have been made by utilizing deep-learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter-domain structure prediction. The results demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template-free modeling performs better than the template-based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template-free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0 .  相似文献   

19.
Accurate structural validation of proteins is of extreme importance in studies like protein structure prediction, analysis of molecular dynamic simulation trajectories and finding subtle changes in very similar structures. The benchmarks for today's structure validation are scoring methods like global distance test‐total structure (GDT‐TS), TM‐score and root mean square deviations (RMSD). However, there is a lack of methods that look at both the protein backbone and side‐chain structures at the global connectivity level and provide information about the differences in connectivity. To address this gap, a graph spectral based method (NSS—network similarity score) which has been recently developed to rigorously compare networks in diverse fields, is adopted to compare protein structures both at the backbone and at the side‐chain noncovalent connectivity levels. In this study, we validate the performance of NSS by investigating protein structures from X‐ray structures, modeling (including CASP models), and molecular dynamics simulations. Further, we systematically identify the local and the global regions of the structures contributing to the difference in NSS, through the components of the score, a feature unique to this spectral based scoring scheme. It is demonstrated that the method can quantify subtle differences in connectivity compared to a reference protein structure and can form a robust basis for protein structure comparison. Additionally, we have also introduced a network‐based method to analyze fluctuations in side chain interactions (edge‐weights) in an ensemble of structures, which can be an useful tool for the analysis of MD trajectories.  相似文献   

20.
Protein structure refinement aims to perform a set of operations given a predicted structure to improve model quality and accuracy with respect to the native in a blind fashion. Despite the numerous computational approaches to the protein refinement problem reported in the previous three CASPs, an overwhelming majority of methods degrade models rather than improve them. We initially developed a method tested using blind predictions during CASP10 which was officially ranked in 5th place among all methods in the refinement category. Here, we present Princeton_TIGRESS, which when benchmarked on all CASP 7,8,9, and 10 refinement targets, simultaneously increased GDT_TS 76% of the time with an average improvement of 0.83 GDT_TS points per structure. The method was additionally benchmarked on models produced by top performing three‐dimensional structure prediction servers during CASP10. The robustness of the Princeton_TIGRESS protocol was also tested for different random seeds. We make the Princeton_TIGRESS refinement protocol freely available as a web server at http://atlas.princeton.edu/refinement . Using this protocol, one can consistently refine a prediction to help bridge the gap between a predicted structure and the actual native structure. Proteins 2014; 82:794–814. © 2013 Wiley Periodicals, Inc.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号