Similar Articles
20 similar articles found.
1.
Zhang Y, Skolnick J. Proteins, 2004, 57(4): 702-710
We have developed a new scoring function, the template modeling score (TM-score), to assess the quality of protein structure templates and predicted full-length models by extending the approaches used in Global Distance Test (GDT) [1] and MaxSub [2]. First, a protein size-dependent scale is exploited to eliminate the inherent protein size dependence of the previous scores and to appropriately account for random protein structure pairs. Second, rather than setting specific distance cutoffs and calculating only the fractions with errors below the cutoff, the proposed score evaluates all residue pairs in the alignment/model. For comparison of various scoring functions, we have constructed a large-scale benchmark set of structure templates for 1489 small to medium size proteins using the threading program PROSPECTOR_3 and built the full-length models using MODELLER and TASSER. Compared with the GDT and MaxSub scoring functions, the TM-score of the initial threading alignments shows a much stronger correlation with the quality of the final full-length models. The TM-score is further applied to assess all 'new fold' targets in the recent CASP5 experiment and shows close agreement with the results of human-expert visual assessment. These data suggest that the TM-score is a useful complement to the fully automated assessment of protein structure predictions. The executable program of TM-score is freely downloadable at http://bioinformatics.buffalo.edu/TM-score.
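A minimal sketch of the size-dependent TM-score described above, under stated assumptions: the per-residue distances come from one fixed superposition (the actual TM-score program searches for the superposition that maximizes the score), unaligned residues contribute zero, and d0 is clamped at 0.5 Å for very short chains. The scale d0(L) = 1.24·(L − 15)^(1/3) − 1.8 is the one defined by Zhang and Skolnick; the function names are hypothetical.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Size-dependent distance scale d0 in Angstroms."""
    # Very short chains would give a tiny or undefined d0, so clamp at 0.5 A (assumed convention).
    if target_length <= 15:
        return 0.5
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition (no search over superpositions).

    distances: model-to-native distances (A) for the aligned residue pairs only.
    target_length: length of the native/target chain used for normalization.
    """
    d0 = tm_d0(target_length)
    # Each aligned pair contributes 1 / (1 + (d/d0)^2); unaligned residues contribute 0.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# Usage: a perfect 100-residue model, and one where every residue sits exactly d0 away.
print(tm_score([0.0] * 100, 100))                    # 1.0
print(round(tm_score([tm_d0(100)] * 100, 100), 6))   # 0.5
```

Because every residue is compared on the same d0 scale and normalized by the target length, a score such as 0.5 means the same thing for a 60-residue and a 300-residue protein, which is the size-independence the abstract describes.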

2.
In this commentary, we describe two new protein structure prediction experiments being run in parallel with the CASP experiment, which together may be regarded as the 2000 Olympic Games of structure prediction. The first new experiment is CAFASP, the Critical Assessment of Fully Automated Structure Prediction. In CAFASP, the participants are fully automated programs or Internet servers, and the results these programs produce automatically are evaluated without any human intervention. The second new experiment, named LiveBench, follows the CAFASP philosophy in that it evaluates automatic servers only, but it runs on a large set of prediction targets and in a continuous fashion. Researchers will be watching the 2000 protein structure prediction Olympic Games, to be held in December, to learn about advances in the classical 'human-plus-machine' CASP category and the fully automated CAFASP category, and about how the two compare.

3.
Zhou H, Skolnick J. Proteins, 2008, 71(3): 1211-1218
In this work, we develop a fully automated method for predicting the quality of protein structural models generated by structure prediction approaches such as fold recognition servers or ab initio methods. The approach is based on fragment comparisons and a consensus Cα contact potential derived from the set of models to be assessed, and it was tested on CASP7 server models. The average per-target Pearson linear correlation coefficient between predicted quality and model GDT-score is 0.83 over the 98 targets, which is better than those of the other quality assessment methods that participated in CASP7. Our method also outperforms the other methods by about 3% as assessed by the total GDT-score of the selected top models.
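The evaluation reported above (a Pearson correlation between predicted quality and GDT-score computed separately for each target, then averaged over targets) can be sketched in plain Python. The dictionary layout is a hypothetical choice; a target whose scores are all identical has an undefined correlation and raises an error here.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if either sequence has zero variance.
    return sxy / math.sqrt(sxx * syy)


def mean_per_target_pearson(
    per_target: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """per_target maps a target id to (predicted quality, true GDT scores) of its models."""
    return mean(pearson(pred, true) for pred, true in per_target.values())


example = {
    "T1": ([0.2, 0.5, 0.9], [30.0, 45.0, 80.0]),
    "T2": ([0.7, 0.4, 0.1], [60.0, 50.0, 20.0]),
}
print(round(mean_per_target_pearson(example), 3))
```

Averaging per-target correlations rewards a method for ordering models correctly within each target, rather than for separating easy targets from hard ones, which a single correlation pooled over all models would also capture.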

4.
Substantial progress in protein structure prediction has been made since CASP13 by utilizing deep learning and residue-residue distance prediction. Inspired by these advances, we improved our MULTICOM protein structure prediction system for CASP14 by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of the multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

5.

Background

The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those that used structural superpositions of 3D models and related templates with bound ligands to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustment of parameters, such as the distance thresholds used for each target, have often been required to prevent over-prediction. Here we describe a novel method, FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and a list of templates as inputs. A simple web interface is also provided, allowing access for non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.

Results

The FunFOLD method shows a significant improvement over the best available servers and is competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD achieve significantly higher mean Matthews Correlation Coefficient (MCC) and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.
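As a reminder of what the MCC measures here, a minimal sketch treats each residue of the target as a binary prediction (binding vs. non-binding) and compares predicted with observed binding residues. Returning 0.0 when a marginal count is zero is a common convention assumed here, not something stated in the abstract; the BDT score is not reproduced.

```python
import math
from typing import Set


def binding_site_mcc(predicted: Set[int], observed: Set[int], n_residues: int) -> float:
    """MCC of predicted vs. observed binding residues over a chain of n_residues."""
    tp = len(predicted & observed)   # predicted and truly binding
    fp = len(predicted - observed)   # predicted but not binding
    fn = len(observed - predicted)   # binding but missed
    tn = n_residues - tp - fp - fn   # correctly left unpredicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(binding_site_mcc({1, 2, 3}, {1, 2, 3}, 10))            # 1.0
print(round(binding_site_mcc({1, 2, 4}, {1, 2, 3}, 10), 3))  # 0.524
```

Because true negatives enter the formula, MCC penalizes the over-prediction mentioned in the Background more sharply than raw recall would.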

Conclusions

The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure-based function prediction using 3D models of proteins.

6.
We present the results of the evaluation of the latest LiveBench-8 experiment. These results provide a snapshot of the state of the art in automated protein structure prediction just before the 2004 CAFASP-4/CASP-6 experiments begin. The last CAFASP/CASP experiments demonstrated that automated meta-predictors represent a significant advance in the field, already challenging most human expert predictors. LiveBench-8 corroborates the superior performance of meta-predictors, which are able to produce useful predictions for over one-half of the test targets. More importantly, LiveBench-8 identifies a handful of recently developed autonomous (non-meta) servers that perform at the very top, suggesting that further progress in the individual methods has recently been achieved.

7.
In protein tertiary structure prediction, a crucial step is to select near-native structures from a large number of predicted structural models. Over the years, extensive research has been conducted on the protein structure selection problem, with most approaches focusing on developing more accurate energy or scoring functions. Despite significant advances in this area, the discriminating power of current approaches is still unsatisfactory. In this paper, we propose a novel consensus-based algorithm for the selection of predicted protein structures. Given a set of predicted models, our method first removes redundant structures to derive a subset of reference models. Then, each structure is ranked by its average pairwise similarity to the reference models. Using the CASP8 data set, which contains a large collection of predicted models for 122 targets, we compared our method with the best CASP8 quality assessment (QA) servers, which are all consensus-based, and showed that our QA scores correlate better with GDT-TS than those of the CASP8 QA servers. We also compared our method with state-of-the-art scoring functions and showed its improved performance for near-native model selection. The GDT-TS scores of the top models picked by our method are on average more than 8 percent better than those of the models selected by the best-performing scoring function.
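The two-stage consensus ranking described above can be sketched as follows. Assumptions not taken from the paper: pairwise similarities are supplied as a symmetric matrix (for example pairwise GDT-TS scaled to [0, 1]), redundancy is removed greedily in input order with a hypothetical `cutoff`, and a model that is itself a reference is not compared with itself.

```python
from typing import List, Sequence, Tuple


def select_references(sim: Sequence[Sequence[float]], cutoff: float) -> List[int]:
    """Greedily keep models that are less similar than `cutoff` to every model kept so far."""
    refs: List[int] = []
    for i in range(len(sim)):
        if all(sim[i][j] < cutoff for j in refs):
            refs.append(i)
    return refs


def consensus_rank(
    sim: Sequence[Sequence[float]], cutoff: float = 0.9
) -> Tuple[List[int], List[float]]:
    """Rank every model by its mean similarity to the non-redundant reference set."""
    refs = select_references(sim, cutoff)
    scores: List[float] = []
    for i in range(len(sim)):
        others = [j for j in refs if j != i]  # never compare a model with itself
        scores.append(sum(sim[i][j] for j in others) / len(others) if others else 0.0)
    order = sorted(range(len(sim)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Models 0 and 1 are near duplicates; model 3 is an outlier.
sim = [
    [1.00, 0.95, 0.70, 0.20],
    [0.95, 1.00, 0.70, 0.20],
    [0.70, 0.70, 1.00, 0.30],
    [0.20, 0.20, 0.30, 1.00],
]
order, scores = consensus_rank(sim)
print(order)  # [1, 2, 0, 3]
```

Removing near-duplicates first keeps a cluster of almost identical models from dominating the consensus; note that a model left out of the reference set (model 1 here) can still score highly through its similarity to its retained twin.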

8.
Large-scale initiatives for obtaining spatial protein structures by experimental or computational means have accentuated the need for critical assessment of protein structure determination and prediction methods. These include blind test projects such as the critical assessment of protein structure prediction (CASP) and the critical assessment of protein structure determination by nuclear magnetic resonance (CASD-NMR). An important aim is to establish structure validation criteria that can reliably assess the accuracy of a new protein structure. Various quality measures derived from the coordinates have been proposed. A universal structural quality assessment method should combine multiple individual scores in a meaningful way, which is challenging because of their different measurement units. Here, we present a method based on a generalized linear model (GLM) that combines diverse protein structure quality scores into a single quantity with intuitive meaning, namely the predicted coordinate root-mean-square deviation (RMSD) between the present structure and the (unavailable) "true" structure (GLM-RMSD). For two sets of structural models from the CASD-NMR and CASP projects, this GLM-RMSD value was compared with the actual accuracy given by the RMSD to the corresponding experimentally determined reference structure from the Protein Data Bank (PDB). The correlation coefficients between actual (model vs. reference from PDB) and predicted (model vs. "true") heavy-atom RMSDs were 0.69 and 0.76 for the two datasets from CASD-NMR and CASP, respectively, which is considerably higher than those for the individual scores (-0.24 to 0.68). The GLM-RMSD can thus predict the accuracy of protein structures more reliably than individual coordinate-based quality scores.
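As a sketch of combining several quality scores into one predicted RMSD, the code below fits a linear model by least squares. A GLM with a Gaussian error distribution and identity link reduces to exactly this fit; whether GLM-RMSD uses that family and link, and which scores it takes as inputs, is not stated in the abstract, so both are assumptions here. The training data are synthetic.

```python
import numpy as np


def fit_rmsd_model(scores: np.ndarray, rmsd: np.ndarray) -> np.ndarray:
    """Least-squares fit of RMSD ~ b0 + sum_k b_k * score_k.

    scores: (n_models, n_scores) matrix of quality scores; rmsd: (n_models,) true RMSDs.
    Returns [b0, b1, ..., b_n_scores].
    """
    design = np.column_stack([np.ones(len(scores)), scores])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, rmsd, rcond=None)
    return coef


def predict_rmsd(coef: np.ndarray, scores: np.ndarray) -> np.ndarray:
    """Predicted RMSD for new models from their quality scores."""
    return coef[0] + scores @ coef[1:]


# Synthetic example: RMSD is an exact linear function of two made-up scores.
train_scores = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
train_rmsd = 2.0 + 0.5 * train_scores[:, 0] - 1.0 * train_scores[:, 1]
coef = fit_rmsd_model(train_scores, train_rmsd)
print(np.round(coef, 3))
```

The fitted weights absorb the different units of the input scores, so the combined output is expressed directly in Å of RMSD, which is the "intuitive meaning" the abstract refers to.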

9.
Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single-template models used in Pcons.net. In particular, for easier targets with many alternative templates of high sequence identity, quality is readily improved by a few percent over the highest-ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server. AVAILABILITY AND IMPLEMENTATION: PconsM is freely available from http://pcons.net/.

10.
Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological functions that are very sensitive to structural details. Specifically, the critical assessment of techniques for protein structure prediction (CASP) shows that the accuracy of predictions in the comparative modeling category is often worse than that of the template on which the homology model is based. Here we describe a refinement protocol that is able to consistently refine submitted predictions for all categories at CASP7. The protocol uses direct energy minimization of a knowledge‐based potential of mean force that is based on the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177–3182). Our protocol is thus computationally very efficient; it takes only a few minutes of CPU time to refine a typical protein model (300 residues). We observe an average structural improvement of 1% in GDT_TS for predictions that have low and medium homology to known PDB structures (Global Distance Test score, or GDT_TS, between 50 and 80%). We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies among the participants at CASP, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best-performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry demonstrate the usefulness of our computationally inexpensive, powerful and automatic refinement protocol. Proteins 2010. © 2010 Wiley‐Liss, Inc.
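GDT_TS, the measure used throughout these abstracts, is the average of the percentages of target residues whose Cα atoms lie within 1, 2, 4 and 8 Å of their native positions. The sketch below assumes a single fixed superposition and counts d ≤ cutoff as inside; the real GDT (as implemented in LGA) searches for a separate best superposition for each cutoff, so this value is a lower bound on the true score.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float], target_length: int) -> float:
    """GDT_TS (0-100) for ONE fixed superposition.

    distances: model-to-native CA distances (A) for aligned residues only;
    target_length: number of residues in the native structure.
    """
    cutoffs = (1.0, 2.0, 4.0, 8.0)
    # Fraction of target residues within each cutoff (boundary d == cutoff counted as inside).
    fractions = [sum(1 for d in distances if d <= c) / target_length for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


print(round(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0], 5), 2))
```

On this scale, the "1% improvement in GDT_TS" above corresponds to roughly one more residue in a hundred moving inside one of the four cutoffs.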

11.
Knowing the quality of a protein structure model is important for its appropriate usage. We developed a model evaluation method to assess the absolute quality of a single protein model using only structural features with support vector machine regression. The method assigns an absolute quantitative score (i.e. GDT‐TS) to a model by comparing its secondary structure, relative solvent accessibility, contact map, and beta sheet structure with their counterparts predicted from its primary sequence. We trained and tested the method on the CASP6 dataset using cross‐validation. The correlation between predicted and true scores is 0.82. On the independent CASP7 dataset, the correlation averaged over 95 protein targets is 0.76; the average correlation for template‐based and ab initio targets is 0.82 and 0.50, respectively. Furthermore, the predicted absolute quality scores can be used to rank models effectively. The average difference (or loss) between the scores of the top‐ranked models and the best models is 5.70 on the CASP7 targets. This method performs favorably when compared with the other methods used on the same dataset. Moreover, the predicted absolute quality scores are comparable across models for different proteins. These features make the method a valuable tool for model quality assurance and ranking. Proteins 2009. © 2008 Wiley‐Liss, Inc.
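The ranking loss quoted above (5.70 on CASP7) is the gap between the best true score among a target's models and the true score of the model the predictor ranks first, averaged over targets. A minimal sketch, with a hypothetical input layout of (predicted scores, true GDT-TS) per target:

```python
from typing import List, Sequence, Tuple


def ranking_loss(predicted: Sequence[float], true_scores: Sequence[float]) -> float:
    """Best true score among the models minus the true score of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_scores) - true_scores[top]


def mean_ranking_loss(targets: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average ranking loss over targets; each target is (predicted, true GDT-TS) per model."""
    return sum(ranking_loss(p, t) for p, t in targets) / len(targets)


targets = [
    ([0.2, 0.9, 0.5], [40.0, 55.0, 60.0]),  # predictor picks model 1 (55.0); best is 60.0
    ([0.1, 0.3], [70.0, 72.0]),            # predictor picks the best model
]
print(mean_ranking_loss(targets))  # 2.5
```

A loss of zero on every target would mean the method always puts the best available model first, regardless of how well its absolute scores match the true ones.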

12.
MOTIVATION: The prediction of protein domains is a crucial task for functional classification, homology-based structure prediction and structural genomics. In this paper, we present the SSEP-Domain protein domain prediction approach, which is based on the application of secondary structure element alignment (SSEA) and profile-profile alignment (PPA) in combination with InterPro pattern searches. SSEA allows rapid screening for potential domain regions while PPA provides us with the necessary specificity for selecting significant hits. The combination with InterPro patterns allows finding domain regions without solved structural templates if sequence family definitions exist. RESULTS: A preliminary version of SSEP-Domain was ranked among the top-performing domain prediction servers in the CASP 6 and CAFASP 4 experiments. Evaluation of the final version shows further improvement over these results together with a significant speed-up. AVAILABILITY: The server is available at http://www.bio.ifi.lmu.de/SSEP/

13.
During the 7th Critical Assessment of Protein Structure Prediction (CASP7) experiment, it was suggested that the real value of predicted residue–residue contacts might lie in the scoring of 3D model structures. Here, we have carried out a detailed reassessment of the contact predictions made during the recent CASP8 experiment to determine whether predicted contacts might aid in the selection of close‐to‐native structures or be a useful tool for scoring 3D structural models. We used the contacts predicted by the CASP8 residue–residue contact prediction groups to select models for each target domain submitted to the experiment. We found that the information contained in the predicted residue–residue contacts would probably have helped in the selection of 3D models in the free modeling regime and for the harder comparative modeling targets. Indeed, in many cases, the models selected using just the predicted contacts had better GDT‐TS scores than all but the best 3D prediction groups. Despite the well‐known low accuracy of residue–residue contact predictions, it is clear that the predictive power of contacts can be useful in 3D model prediction strategies. Proteins 2010. © 2010 Wiley‐Liss, Inc.
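Model selection with predicted contacts, as reassessed above, can be sketched by scoring each model by the fraction of predicted contacts it satisfies. The 8 Å Cβ-Cβ cutoff follows the usual CASP contact definition (CASP uses Cα for glycine); using the raw fraction rather than a probability-weighted score, and the data layout, are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_satisfaction(
    cb: Sequence[Coord], contacts: List[Tuple[int, int]], cutoff: float = 8.0
) -> float:
    """Fraction of predicted contacts (i, j) whose C-beta atoms are closer than `cutoff` A."""
    if not contacts:
        return 0.0
    hits = sum(1 for i, j in contacts if math.dist(cb[i], cb[j]) < cutoff)
    return hits / len(contacts)


def pick_model(models: Dict[str, Sequence[Coord]], contacts: List[Tuple[int, int]]) -> str:
    """Return the name of the model that satisfies the largest fraction of predicted contacts."""
    return max(models, key=lambda name: contact_satisfaction(models[name], contacts))


models = {
    "compact": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (4.0, 4.0, 0.0)],
    "extended": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
}
predicted = [(0, 2), (0, 3)]
print(pick_model(models, predicted))  # compact
```

Even when many individual contact predictions are wrong, the correct ones tend to be satisfied together only by models with the right topology, which is why a noisy contact list can still discriminate between models.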

14.
We present our assessment of tertiary structure predictions for hard targets in Critical Assessment of Structure Prediction round 13 (CASP13). The analysis includes (a) assignment and discussion of best models through scores-aided visual inspection of models for each evaluation unit (EU); (b) ranking of predictors resulting from this evaluation and from global scores; and (c) evaluation of progress, state of the art, and current limitations of protein structure prediction. We witness a sizable improvement in tertiary structure prediction, building on the progress observed from CASP11 to CASP12, with (a) top models reaching backbone RMSD <3 Å for several EUs of size <150 residues, contributed by many groups; (b) at least one model that roughly captures global topology for all EUs, probably unprecedented in this track of CASP; and (c) even quite good models for full, unsplit targets. Better structure predictions are brought about mainly by improved residue-residue contact predictions and, since this CASP, also by distance predictions, achieved through state-of-the-art machine learning methods that also progressed to work with slightly shallower alignments compared to CASP12. As we reach a new realm of tertiary structure prediction quality, new directions are proposed and explored for future CASPs: (a) dropping splitting into EUs, (b) rethinking difficulty metrics, probably in terms of contact and distance predictions, (c) also assessing side chains for models of high backbone accuracy, and (d) assessing residue-wise and possibly residue-residue quality estimates.

15.

Background

Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available.

Results

Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines whole template-target alignments whose similarity significance score is within a threshold of that of the top template-target alignment, whereas from a less similar template-target alignment it takes only the alignment fragments that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. Statistical analysis shows that the improvement is significant (p-value < 10⁻⁴). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of the GDT-TS measure.
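The template selection rule described above can be sketched as follows. Assumptions: higher significance scores are better, each alignment is reduced to the set of target positions it covers, and `score_window` and `min_fragment` are hypothetical parameter names standing in for the threshold and the "sizable uncovered region" of the text.

```python
from typing import List, Sequence, Set, Tuple

# (template id, significance score, target positions covered by the alignment)
Alignment = Tuple[str, float, Set[int]]


def contiguous_runs(positions: Set[int]) -> List[List[int]]:
    """Split a set of target positions into runs of consecutive integers."""
    runs: List[List[int]] = []
    for p in sorted(positions):
        if runs and p == runs[-1][-1] + 1:
            runs[-1].append(p)
        else:
            runs.append([p])
    return runs


def combine_templates(
    alignments: Sequence[Alignment], score_window: float, min_fragment: int
) -> List[Tuple[str, List[int]]]:
    """Return (template id, target positions to copy from it) in order of decreasing score."""
    if not alignments:
        return []
    ranked = sorted(alignments, key=lambda a: a[1], reverse=True)
    top_score = ranked[0][1]
    covered: Set[int] = set()
    chosen: List[Tuple[str, List[int]]] = []
    for tid, score, positions in ranked:
        if top_score - score <= score_window:
            # Close to the top alignment: use the whole alignment.
            chosen.append((tid, sorted(positions)))
            covered |= positions
        else:
            # Less similar: keep only sizable fragments over still-uncovered target regions.
            for run in contiguous_runs(positions - covered):
                if len(run) >= min_fragment:
                    chosen.append((tid, run))
                    covered.update(run)
    return chosen


alignments = [
    ("A", 50.0, set(range(0, 60))),
    ("B", 48.0, set(range(10, 70))),
    ("C", 20.0, set(range(50, 100))),
    ("D", 15.0, set(range(95, 103))),
]
for tid, pos in combine_templates(alignments, score_window=5.0, min_fragment=10):
    print(tid, pos[0], pos[-1])
```

In this example, A and B are used whole, C contributes only target residues 70-99, and the three residues D would add beyond position 99 are dropped as too short a fragment.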

Conclusion

We have developed a novel multi-template algorithm to improve protein comparative modeling.

16.
During recent years many protein fold recognition methods have been developed, based on different algorithms and using various kinds of information. To examine the performance of these methods, several evaluation experiments have been conducted. These include blind tests in CASP/CAFASP, large-scale benchmarks, and long-term, continuous assessment with newly solved protein structures. These studies confirm the expectation that different methods produce the best predictions for different targets, and that the final prediction accuracy could be improved if the available methods were combined in a perfect manner. In this article a neural-network-based consensus predictor, Pcons, is presented that attempts this task. Pcons attempts to select the best model out of those produced by six prediction servers, each using different methods. Pcons translates the confidence scores reported by each server into uniformly scaled values corresponding to the expected accuracy of each model. The translated scores, as well as the similarity between models produced by different servers, are used in the final selection. According to the analysis based on two unrelated sets of newly solved proteins, Pcons outperforms any single server by generating approximately 8%-10% more correct predictions. Furthermore, the specificity of Pcons is significantly higher than that of any individual server. Analysis of different input data to Pcons shows that the improvement is mainly attributable to measuring the similarity between the different models. Pcons is freely accessible for the academic community through the protein structure-prediction metaserver at http://bioinfo.pl/meta/.

17.
ABSTRACT: BACKGROUND: Employing methods to assess the quality of modeled protein structures is now standard practice in bioinformatics. In a broad sense, the techniques can be divided into methods relying on consensus prediction on the one hand, and single-model methods on the other. Consensus methods frequently perform very well when there is a clear consensus, but this is not always the case. In particular, they frequently fail to select the best possible model in the hard cases (lacking consensus) or in the easy cases where models are very similar. In contrast, single-model methods do not suffer from these drawbacks and could potentially be applied to any protein of interest to assess quality or as a scoring function for sampling-based refinement. RESULTS: Here, we present a new single-model method, ProQ2, based on ideas from its predecessor, ProQ. ProQ2 is a model quality assessment algorithm that uses support vector machines to predict local as well as global quality of protein models. Improved performance is obtained by combining previously used features with updated structural and predicted features. The most important contribution can be attributed to the use of profile weighting of the residue-specific features and the use of features averaged over the whole model, even though the prediction is still local. CONCLUSIONS: ProQ2 is significantly better than its predecessors at detecting high-quality models, improving the sum of Z-scores for the selected first-ranked models by 20% and 32% compared to the second-best single-model method in CASP8 and CASP9, respectively. The absolute quality assessment of the models at both local and global level is also improved. The Pearson correlation between the correct and predicted local score is improved from 0.59 to 0.70 on CASP8 and from 0.62 to 0.68 on CASP9; for the global score, the correlation with the correct GDT_TS improves from 0.75 to 0.80 and from 0.77 to 0.80, again compared to the second-best single-model methods in CASP8.
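The "sum of Z-scores for the selected first-ranked models" used above can be sketched as follows: for each target, standardize the true GDT_TS of the model the QA method ranked first against all models of that target, then sum over targets. Using the population standard deviation and no outlier filtering are assumptions here; CASP assessments have used variations of this recipe.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def selection_zscore(gdt_ts: Sequence[float], selected: int) -> float:
    """Z-score of the selected model's GDT_TS relative to all models of the same target."""
    mu, sd = mean(gdt_ts), pstdev(gdt_ts)
    return 0.0 if sd == 0 else (gdt_ts[selected] - mu) / sd


def sum_of_zscores(targets: List[Tuple[Sequence[float], int]]) -> float:
    """targets: (GDT_TS of every model, index of the model the QA method ranked first)."""
    return sum(selection_zscore(g, i) for g, i in targets)


print(round(sum_of_zscores([([40.0, 50.0, 60.0], 2), ([40.0, 50.0, 60.0], 1)]), 4))  # 1.2247
```

Standardizing per target keeps a few targets with widely spread model quality from dominating the total, which a plain sum of GDT_TS values would allow.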

18.
One of the major limitations of computational protein structure prediction is the deviation of predicted models from their experimentally derived true, native structures. These limitations often hinder the application of computational protein structure prediction methods in biochemical assignment and drug design, which are very sensitive to structural details. Refinement of these low‐resolution predicted models to high‐resolution structures close to the native state, however, has proven to be extremely challenging. Thus, protein structure refinement remains a largely unsolved problem. The critical assessment of techniques for protein structure prediction (CASP) specifically indicated that most predictors participating in the refinement category still did not consistently improve model quality. Here, we propose a two‐step refinement protocol, called 3Drefine, to consistently bring the initial model closer to the native structure. The first step is based on optimization of the hydrogen bonding (HB) network, and the second step applies atomic‐level energy minimization to the optimized model using a composite physics‐ and knowledge‐based force field. The approach has been evaluated on the CASP benchmark data and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine method is also computationally inexpensive, consuming only a few minutes of CPU time to refine a protein of typical length (300 residues). The 3Drefine web server is freely available at http://sysbio.rnet.missouri.edu/3Drefine/. Proteins 2013. © 2012 Wiley Periodicals, Inc.

19.
Protein structure refinement refers to the process of improving the quality of protein structures during structure modeling to bring them closer to their native states. Structure refinement has been drawing increasing attention in the community-wide Critical Assessment of techniques for Protein Structure prediction (CASP) experiments since its addition in the 8th CASP experiment. During the 9th and the recently concluded 10th CASP experiments, a consistent growth in the number of refinement targets and participating groups has been witnessed. Yet protein structure refinement still remains a largely unsolved problem, with the majority of participating groups in the CASP refinement category failing to consistently improve the quality of the structures issued for refinement. To address this need, we developed a completely automated and computationally efficient protein 3D structure refinement method, i3Drefine, based on an iterative and highly convergent energy minimization algorithm with a powerful all-atom composite physics- and knowledge-based force field and a hydrogen bonding (HB) network optimization technique. In the recent community-wide blind experiment, CASP10, i3Drefine (as ‘MULTICOM-CONSTRUCT’) was ranked as the best method in the server section according to the official assessment of the CASP10 experiment. Here we provide the community with free access to the i3Drefine software, systematically analyse the performance of i3Drefine in strict blind mode on the refinement targets issued in the CASP10 refinement category, and compare it with other state-of-the-art refinement methods participating in CASP10. Our analysis demonstrates that i3Drefine is the only fully automated server participating in CASP10 that exhibits consistent improvement over the initial structures in both global and local structural quality metrics. An executable version of i3Drefine is freely available at http://protein.rnet.missouri.edu/i3drefine/.

20.
Jinbo Xu, Sheng Wang. Proteins, 2019, 87(12): 1069-1081
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method that we initiated for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets, with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups, without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TM-score > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper discusses the progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
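The top L/k long-range contact precision quoted above (for L/5, L/2 and L) can be sketched as follows. Long-range is taken here as a sequence separation of at least 24 residues, the usual CASP convention; dividing by the number of predictions actually available when there are fewer than L/k is an assumption.

```python
from typing import Iterable, Set, Tuple


def top_lk_precision(
    predictions: Iterable[Tuple[int, int, float]],
    true_contacts: Set[Tuple[int, int]],
    length: int,
    k: int,
    min_separation: int = 24,
) -> float:
    """Precision of the top L/k long-range predicted contacts.

    predictions: (i, j, probability) triples; true_contacts: native contacts as (i, j) with i < j.
    """
    long_range = [(i, j, p) for i, j, p in predictions if abs(i - j) >= min_separation]
    long_range.sort(key=lambda c: c[2], reverse=True)  # most confident first
    top = long_range[: max(1, length // k)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(top)


preds = [(0, 5, 0.99), (0, 30, 0.9), (10, 40, 0.8), (20, 60, 0.7)]
native = {(0, 30), (20, 60)}
print(top_lk_precision(preds, native, length=100, k=50))  # 0.5 (pair (0, 5) is too short-range)
```

Precision falls as k decreases (70% at L/5 down to 45% at L above) because the list is cut further down the confidence ranking, where less reliable predictions sit.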

