Similar articles
20 similar articles found.
1.
Jinbo Xu  Sheng Wang 《Proteins》2019,87(12):1069-1081
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strength and weakness of our methods, and why deep learning performed much better in CASP13.
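The top L/5, L/2, and L long-range precision figures quoted above can be illustrated with a minimal sketch. It is not the RaptorX or CASP evaluation code. It assumes the convention commonly used in contact assessment, which the abstract does not spell out: a pair is long-range when the residues are at least 24 positions apart, and the top L/k pairs are taken by predicted probability. The function name and the input layout are hypothetical.

```python
def top_k_precision(pred, native_contacts, length, divisor=5, min_sep=24):
    """Precision of the top L/divisor long-range predicted contacts.

    pred            -- dict mapping (i, j) with i < j to a predicted probability
    native_contacts -- set of (i, j) pairs that are contacts in the native structure
    length          -- sequence length L
    min_sep         -- assumed minimum sequence separation for "long-range"
    """
    # Keep only pairs that are far enough apart in sequence.
    long_range = [(p, pair) for pair, p in pred.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)
    k = max(1, length // divisor)
    top = [pair for _, pair in long_range[:k]]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in native_contacts)
    return hits / len(top)
```

Calling it with `divisor=5`, `2`, and `1` gives the three cut-offs reported in the abstract.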

2.
Jie Hou  Tianqi Wu  Renzhi Cao  Jianlin Cheng 《Proteins》2019,87(12):1165-1178
Predicting residue-residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

3.
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue–residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close‐to‐native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue–residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue–residue contacts would probably have helped in the selection of 3D models in the free modeling regime and over the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT‐TS scores than all but the best 3D prediction groups. Despite the well‐known low accuracy of residue–residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies. Proteins 2010. © 2010 Wiley‐Liss, Inc.
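Selecting a model with predicted contacts, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the scoring scheme used in the reassessment: each model is reduced to one representative coordinate per residue, a predicted contact counts as satisfied when that pair lies closer than 8 Å in the model, and contacts are weighted by their predicted confidence. All names are hypothetical.

```python
import math


def contact_satisfaction(coords, predicted_contacts, threshold=8.0):
    """Weighted fraction of predicted contacts that a model satisfies.

    coords             -- list of (x, y, z) tuples, one per residue
    predicted_contacts -- list of (i, j, weight) tuples
    threshold          -- assumed contact distance cut-off in angstroms
    """
    total = sum(w for _, _, w in predicted_contacts)
    if total == 0:
        return 0.0
    satisfied = 0.0
    for i, j, w in predicted_contacts:
        if math.dist(coords[i], coords[j]) < threshold:
            satisfied += w
    return satisfied / total


def select_model(models, predicted_contacts):
    """Return the name of the model that satisfies the most predicted contacts.

    models -- dict mapping a model name to its coordinate list
    """
    return max(models,
               key=lambda name: contact_satisfaction(models[name],
                                                     predicted_contacts))
```

The score depends only on the predicted contacts and the model itself, so it can rank submissions without access to the native structure.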

4.
MOTIVATION: Despite the continuing advance in the experimental determination of protein structures, the gap between the number of known protein sequences and structures continues to increase. Prediction methods can bridge this sequence-structure gap only partially. Better predictions of non-local contacts between residues could improve comparative modeling, fold recognition and could assist in the experimental structure determination. RESULTS: Here, we introduced PROFcon, a novel contact prediction method that combines information from alignments, from predictions of secondary structure and solvent accessibility, from the region between two residues and from the average properties of the entire protein. In contrast to some other methods, PROFcon predicted short and long proteins at similar levels of accuracy. As expected, PROFcon was clearly less accurate when tested on sparse evolutionary profiles, that is, on families with few homologs. Prediction accuracy was highest for proteins belonging to the SCOP alpha/beta class. PROFcon compared favorably with state-of-the-art prediction methods at the CASP6 meeting. While the performance may still be perceived as low, our method clearly pushed the mark higher. Furthermore, predictions are already accurate enough to seed predictions of global features of protein structure.

5.

Background

Protein residue-residue contact prediction is important for protein model generation and model evaluation. Here we develop a conformation ensemble approach to improve residue-residue contact prediction. We collect a number of structural models stemming from a variety of methods and implementations. The various models capture slightly different conformations and contain complementary information which can be pooled together to capture recurrent, and therefore more likely, residue-residue contacts.

Results

We applied our conformation ensemble approach to free modeling targets from both CASP8 and CASP9. Given a diverse ensemble of models, the method is able to achieve accuracies of 0.48 for the top L/5 medium range contacts and 0.36 for the top L/5 long range contacts for CASP8 targets (L being the target domain length). When applied to targets from CASP9, the accuracies of the top L/5 medium and long range contact predictions were 0.34 and 0.30 respectively.

Conclusions

When operating on a moderately diverse ensemble of models, the conformation ensemble approach is an effective means to identify medium and long range residue-residue contacts. An immediate benefit of the method is that when tied with a scoring scheme, it can be used to successfully rank models.
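The pooling idea behind the conformation ensemble approach can be sketched as below: extract contacts from every model, then rank residue pairs by how often they recur. This is a minimal illustration, not the authors' implementation. The 8 Å cut-off, the minimum sequence separation of 12, and the one-coordinate-per-residue input are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def model_contacts(coords, threshold=8.0, min_sep=12):
    """Set of (i, j) pairs closer than `threshold` in one model."""
    contacts = set()
    n = len(coords)
    for i in range(n):
        # Skip pairs that are too close in sequence to be informative.
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def consensus_contacts(models, threshold=8.0, min_sep=12):
    """Rank residue pairs by the fraction of models in which they are in contact.

    models -- list of coordinate lists, one per structural model
    Returns a list of ((i, j), frequency), most recurrent pairs first.
    """
    counts = Counter()
    for coords in models:
        counts.update(model_contacts(coords, threshold, min_sep))
    n_models = len(models)
    # Sort by descending count, then by pair for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pair, count / n_models) for pair, count in ranked]
```

The resulting frequencies can serve as contact confidences, for example as the predicted contacts in a model-ranking score.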

6.
CASP (critical assessment of structure prediction) assesses the state of the art in modeling protein structure from amino acid sequence. The most recent experiment (CASP13 held in 2018) saw dramatic progress in structure modeling without use of structural templates (historically “ab initio” modeling). Progress was driven by the successful application of deep learning techniques to predict inter-residue distances. In turn, these results drove dramatic improvements in three-dimensional structure accuracy: With the proviso that there are an adequate number of sequences known for the protein family, the new methods essentially solve the long-standing problem of predicting the fold topology of monomeric proteins. Further, the number of sequences required in the alignment has fallen substantially. There is also substantial improvement in the accuracy of template-based models. Other areas (model refinement, accuracy estimation, and the structure of protein assemblies) have again yielded interesting results. CASP13 placed increased emphasis on the use of sparse data together with modeling; chemical crosslinking, SAXS, and NMR all yielded more mature results. This paper summarizes the key outcomes of CASP13. The special issue of PROTEINS contains papers describing the CASP13 assessments in each modeling category and contributions from the participants.

7.
The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.
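One common way to turn a discretized distance profile into a contact probability is to sum the probability mass of the bins below the contact cut-off. The sketch below illustrates that general idea only. The abstract does not describe how TripletRes does it, so the 8 Å cut-off, the bin layout, and the uniform-within-bin handling of a straddling bin are all assumptions, and the function name is hypothetical.

```python
def contact_probability(bin_probs, bin_edges, cutoff=8.0):
    """Probability that a residue pair is closer than `cutoff`.

    bin_probs -- probabilities for consecutive distance bins (should sum to 1)
    bin_edges -- bin boundaries in angstroms, len(bin_probs) + 1 values;
                 bin k covers [bin_edges[k], bin_edges[k + 1])
    """
    if len(bin_edges) != len(bin_probs) + 1:
        raise ValueError("bin_edges must have one more entry than bin_probs")
    total = 0.0
    for k, p in enumerate(bin_probs):
        lo, hi = bin_edges[k], bin_edges[k + 1]
        if hi <= cutoff:
            # Bin lies entirely below the cut-off.
            total += p
        elif lo < cutoff:
            # Bin straddles the cut-off: assume its mass is spread uniformly.
            total += p * (cutoff - lo) / (hi - lo)
    return total
```

Applying this to every residue pair yields a contact-map that can be ranked and evaluated with top-L/k precision.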

8.
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.

9.
With the advance of experimental procedures, obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest-to-date blind assessment reveals benefits of using data assistance in difficult-to-model protein structure prediction cases. However, in a broader context, it also suggests that with the unprecedented advance in accuracy to predict contacts in recent years, experimental crosslinks will be useful only if their specificity and accuracy are further improved and they are better integrated into computational workflows.

10.
We present our assessment of tertiary structure predictions for hard targets in Critical Assessment of Structure Prediction round 13 (CASP13). The analysis includes (a) assignment and discussion of best models through scores-aided visual inspection of models for each evaluation unit (EU); (b) ranking of predictors resulting from this evaluation and from global scores; and (c) evaluation of progress, state of the art, and current limitations of protein structure prediction. We witness a sizable improvement in tertiary structure prediction building on the progress observed from CASP11 to CASP12, with (a) top models reaching backbone RMSD <3 Å for several EUs of size <150 residues, contributed by many groups; (b) at least one model that roughly captures global topology for all EUs, probably unprecedented in this track of CASP; and (c) even quite good models for full, unsplit targets. Better structure predictions are brought about mainly by improved residue-residue contact predictions, and since this CASP also by distance predictions, achieved through state-of-the-art machine learning methods which also progressed to work with slightly shallower alignments compared to CASP12. As we reach a new realm of tertiary structure prediction quality, new directions are proposed and explored for future CASPs: (a) dropping splitting into EUs, (b) rethinking difficulty metrics probably in terms of contact and distance predictions, (c) assessing also side chains for models of high backbone accuracy, and (d) assessing residue-wise and possibly residue-residue quality estimates.

11.
R. Rajgaria  Y. Wei  C. A. Floudas 《Proteins》2010,78(8):1825-1846
An integer linear optimization model is presented to predict residue contacts in β, α + β, and α/β proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the β‐sheet alignments. These β‐sheet alignments are used as constraints for contacts between residues of β‐sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of β, α + β, α/β proteins and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was ~61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate the protein tertiary structure prediction. This proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO‐FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins. Proteins 2010. © 2010 Wiley‐Liss, Inc.

12.
Scoring model structure is an essential component of protein structure prediction that can affect the prediction accuracy tremendously. Users of protein structure prediction results also need to score models to select the best models for their application studies. In Critical Assessment of techniques for protein Structure Prediction (CASP), model accuracy estimation methods have been tested in a blind fashion by providing models submitted by the tertiary structure prediction servers for scoring. In CASP13, model accuracy estimation results were evaluated in terms of both global and local structure accuracy. Global structure accuracy estimation was evaluated by the quality of the models selected by the global structure scores and by the absolute estimates of the global scores. Residue-wise, local structure accuracy estimations were evaluated by three different measures. A new measure introduced in CASP13 evaluates the ability to predict inaccurately modeled regions that may be improved by refinement. An intensive comparative analysis on CASP13 and the previous CASPs revealed that the tertiary structure models generated by the CASP13 servers show very distinct features. Higher consensus toward models of higher global accuracy appeared even for free modeling targets, and many models of high global accuracy were not well optimized at the atomic level. This is related to the new technology in CASP13, deep learning for tertiary contact prediction. The tertiary model structures generated by deep learning pose a new challenge for EMA (estimation of model accuracy) method developers. Model accuracy estimation itself is also an area where deep learning can potentially have an impact, although current EMA methods have not fully explored that direction.

13.
Georg Kuenze  Jens Meiler 《Proteins》2019,87(12):1341-1350
Computational methods that produce accurate protein structure models from limited experimental data, for example, from nuclear magnetic resonance (NMR) spectroscopy, hold great potential for biomedical research. The NMR-assisted modeling challenge in CASP13 provided a blind test to explore the capabilities and limitations of current modeling techniques in leveraging NMR data which had high sparsity, ambiguity, and error rate for protein structure prediction. We describe our approach to predict the structure of these proteins leveraging the Rosetta software suite. Protein structure models were predicted de novo using a two-stage protocol. First, low-resolution models were generated with the Rosetta de novo method guided by nonambiguous nuclear Overhauser effect (NOE) contacts and residual dipolar coupling (RDC) restraints. Second, iterative model hybridization and fragment insertion with the Rosetta comparative modeling method was used to refine and regularize models guided by all ambiguous and nonambiguous NOE contacts and RDCs. Nine out of 16 of the Rosetta de novo models had the correct fold (global distance test total score > 45) and in three cases high-resolution models were achieved (root-mean-square deviation < 3.5 Å). We also show that a meta-approach applying iterative Rosetta + NMR refinement on server-predicted models which employed non-NMR-contacts and structural templates leads to substantial improvement in model quality. Integrating these data-assisted refinement strategies with innovative non-data-assisted approaches which became possible in CASP13 such as high precision contact prediction will in the near future enable structure determination for large proteins that are outside of the realm of conventional NMR.

14.
CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and 15N-1H residual dipolar coupling data, typical of that obtained for 15N,13C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction method provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data is then used to validate and/or refine these models.

15.
Measuring the accuracy of protein three-dimensional structures is one of the most important problems in protein structure prediction. For structure-based drug design, the accuracy of the binding site is far more important than the accuracy of any other region of the protein. We have developed an automated method for assessing the quality of a protein model by focusing on the set of residues in the small molecule binding site. Small molecule binding sites typically involve multiple regions of the protein coming together in space, and their accuracy has been observed to be sensitive to even small alignment errors. In addition, ligand binding sites contain the critical information required for drug design, making their accuracy particularly important. We analyzed the accuracy of the binding sites on two sets of protein models: the predictions submitted by the top-performing CASP7 groups, and the models generated by four widely used homology modeling packages. The results of our CASP7 analysis significantly differ from the previous findings, implying that the binding site measure does not correlate with the traditional model quality measures used in the structure prediction benchmarks. For the modeling programs, the resolution of binding sites is extremely sensitive to the degree of sequence homology between the query and the template, even when the most accurate alignments are used in the homology modeling process.

16.

Background

The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.

Results

The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.

Conclusions

The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
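The Matthews Correlation Coefficient used in the benchmark above can be computed per target from the predicted and observed binding residues. The sketch below uses the standard MCC formula. It is not taken from the FunFOLD software, and the function name and the convention of returning 0 when the denominator vanishes are assumptions.

```python
import math


def binding_site_mcc(predicted, observed, n_residues):
    """Matthews Correlation Coefficient for a binding-residue prediction.

    predicted  -- iterable of residue indices predicted to bind the ligand
    observed   -- iterable of residue indices that bind in the experimental structure
    n_residues -- total number of residues in the target
    """
    predicted, observed = set(predicted), set(observed)
    tp = len(predicted & observed)      # correctly predicted binding residues
    fp = len(predicted - observed)      # predicted but not observed
    fn = len(observed - predicted)      # observed but missed
    tn = n_residues - tp - fp - fn      # correctly left out
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Degenerate case (e.g. nothing predicted); 0 is an assumed convention.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Because binding residues are a small minority of any chain, MCC is more informative here than plain accuracy, which would reward predicting no binding site at all.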

17.
Small angle X-ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS-assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography-SAXS (SEC-SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS-assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.

18.
The accurate prediction of protein structure, both secondary and tertiary, is an ongoing problem. Over the years, many approaches have been implemented and assessed. Most prediction algorithms start with the entire amino acid sequence and treat all residues in an identical fashion independent of sequence position. Here, we analyze blind prediction data to investigate whether predictive capability varies along the chain. Free modeling results from recent critical assessment of techniques for protein structure prediction (CASP) experiments are evaluated; as is the most up‐to‐date data from EVA, a fully automated blind test of secondary structure prediction servers. The results demonstrate that structure prediction accuracy is dependent on sequence position. Both secondary structure and tertiary structure predictions are more accurate in regions near the amino(N)‐terminus when compared with analogous regions near the carboxy(C)‐terminus. Eight of 10 secondary structure prediction algorithms assessed by EVA perform significantly better in regions at the N‐terminus. CASP data shows a similar bias, with N‐terminal fragments being predicted more accurately than fragments from the C‐terminus. Two analogous fragments are taken from each model, the N‐terminal fragment begins at the start of the most N‐terminal secondary structure element (SSE), whereas the C‐terminal fragment finishes at the end of the most C‐terminal SSE. Each fragment is locally superimposed onto its respective native fragment. The relative terminal prediction accuracy (RMSD) is calculated on an intramodel basis. At a fragment length of 20 residues, the N‐terminal fragment is predicted with greater accuracy in 79% of cases. Proteins 2010. © 2009 Wiley‐Liss, Inc.

19.
Protein-protein docking plays an important role in the computational prediction of the complex structure between two proteins. For years, a variety of docking algorithms have been developed, as witnessed by the Critical Assessment of PRedicted Interactions (CAPRI) experiments. However, despite their successes, many docking algorithms often require a series of manual operations like modeling structures from sequences, incorporating biological information, and selecting final models. The difficulties in these manual steps have significantly limited the applications of protein-protein docking, as most of the users in the community are nonexperts in docking. Therefore, automated docking like a web server, which can give a comparable performance to human docking protocol, is pressingly needed. As such, we have participated in the blind CAPRI experiments for Rounds 38-45 and CASP13-CAPRI challenge for Round 46 with both our HDOCK automated docking web server and human docking protocol. It was shown that our HDOCK server achieved an “acceptable” or higher CAPRI-rated model in the top 10 submitted predictions for 65.5% and 59.1% of the targets in the docking experiments of CAPRI and CASP13-CAPRI, respectively, which are comparable to 66.7% and 54.5% for human docking protocol. Similar trends can also be observed in the scoring experiments. These results validated our HDOCK server as an efficient automated docking protocol for nonexpert users. Challenges and opportunities of automated docking are also discussed.

20.
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter-domain structure prediction. The results demonstrate that the template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template-free modeling performs better than the template-based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template-free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
