1.

Analysis of distance-based protein structure prediction by deep learning in CASP13

Jinbo Xu Sheng Wang 《Proteins》2019,87(12):1069-1081

This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TM-score > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
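
The top L/5, L/2, and L long-range contact precision quoted above can be illustrated with a minimal Python sketch. It is not the RaptorX or CASP evaluation code: the function name `top_lk_precision`, the dictionary input format, and the default sequence separation of 24 residues for "long-range" are assumptions made for illustration.

```python
def top_lk_precision(scores, native_contacts, length, k=5, min_sep=24):
    """Precision of the top L/k predicted long-range contacts.

    scores: dict mapping residue pairs (i, j) with i < j to a predicted
        contact probability (hypothetical input format).
    native_contacts: set of (i, j) pairs, i < j, in contact in the native
        structure.
    length: sequence length L.
    min_sep: minimum sequence separation counted as long-range
        (24 is assumed here as the default).
    """
    # Keep only pairs far enough apart in sequence.
    long_range = [(p, pair) for pair, p in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted probability, highest first.
    long_range.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, length // k)
    top = long_range[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native_contacts)
    return hits / len(top)


if __name__ == "__main__":
    # Toy example with a small separation threshold so short chains work.
    toy_scores = {(0, 5): 0.9, (1, 7): 0.8, (2, 9): 0.3, (0, 1): 0.99}
    toy_native = {(0, 5), (2, 9)}
    print(top_lk_precision(toy_scores, toy_native, length=10, k=5, min_sep=3))
```

Setting `k=5`, `k=2`, or `k=1` gives the L/5, L/2, and L variants. If fewer than L/k long-range pairs are scored, the sketch divides by the number actually available, which is one of several possible conventions.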

2.

miREE: miRNA recognition elements ensemble

Paula H Reyes-Herrera Elisa Ficarra Andrea Acquaviva Enrico Macii 《BMC bioinformatics》2011,12(1):1-20

Background

The accurate prediction of ligand binding residues from amino acid sequences is important for the automated functional annotation of novel proteins. In the previous two CASP experiments, the most successful methods in the function prediction category were those which used structural superpositions of 3D models and related templates with bound ligands in order to identify putative contacting residues. However, whilst most of this prediction process can be automated, visual inspection and manual adjustments of parameters, such as the distance thresholds used for each target, have often been required to prevent over prediction. Here we describe a novel method FunFOLD, which uses an automatic approach for cluster identification and residue selection. The software provided can easily be integrated into existing fold recognition servers, requiring only a 3D model and list of templates as inputs. A simple web interface is also provided allowing access to non-expert users. The method has been benchmarked against the top servers and manual prediction groups tested at both CASP8 and CASP9.

Results

The FunFOLD method shows a significant improvement over the best available servers and is shown to be competitive with the top manual prediction groups that were tested at CASP8. The FunFOLD method is also competitive with both the top server and manual methods tested at CASP9. When tested using common subsets of targets, the predictions from FunFOLD are shown to achieve a significantly higher mean Matthews Correlation Coefficient (MCC) scores and Binding-site Distance Test (BDT) scores than all server methods that were tested at CASP8. Testing on the CASP9 set showed no statistically significant separation in performance between FunFOLD and the other top server groups tested.

Conclusions

The FunFOLD software is freely available as both a standalone package and a prediction server, providing competitive ligand binding site residue predictions for expert and non-expert users alike. The software provides a new fully automated approach for structure based function prediction using 3D models of proteins.
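
The Matthews Correlation Coefficient (MCC) mentioned in the Results can be computed for a binding-residue prediction roughly as follows. This is a generic sketch, not code from the FunFOLD package: the function name `matthews_cc` and the representation of predictions as sets of residue indices are assumptions.

```python
import math


def matthews_cc(predicted, observed, n_residues):
    """MCC for a binary per-residue prediction.

    predicted: iterable of residue indices predicted as ligand-binding.
    observed: iterable of residue indices observed to bind the ligand.
    n_residues: total number of residues in the protein.
    """
    pred, obs = set(predicted), set(observed)
    tp = len(pred & obs)          # predicted and observed
    fp = len(pred - obs)          # predicted only (over-prediction)
    fn = len(obs - pred)          # observed only (missed)
    tn = n_residues - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(matthews_cc({1, 2, 3}, {2, 3, 4}, n_residues=10))
```

Because true negatives dominate in binding-site prediction, MCC penalises over-prediction more visibly than plain accuracy would.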

3.

A further leap of improvement in tertiary structure prediction in CASP13 prompts new routes for future assessments

Luciano A. Abriata Giorgio E. Tamò Matteo Dal Peraro 《Proteins》2019,87(12):1100-1112

We present our assessment of tertiary structure predictions for hard targets in Critical Assessment of Structure Prediction round 13 (CASP13). The analysis includes (a) assignment and discussion of best models through scores-aided visual inspection of models for each evaluation unit (EU); (b) ranking of predictors resulting from this evaluation and from global scores; and (c) evaluation of progress, state of the art, and current limitations of protein structure prediction. We witness a sizable improvement in tertiary structure prediction building on the progress observed from CASP11 to CASP12, with (a) top models reaching backbone RMSD <3 Å for several EUs of size <150 residues, contributed by many groups; (b) at least one model that roughly captures global topology for all EUs, probably unprecedented in this track of CASP; and (c) even quite good models for full, unsplit targets. Better structure predictions are brought about mainly by improved residue-residue contact predictions, and since this CASP also by distance predictions, achieved through state-of-the-art machine learning methods which also progressed to work with slightly shallower alignments compared to CASP12. As we reach a new realm of tertiary structure prediction quality, new directions are proposed and explored for future CASPs: (a) dropping splitting into EUs, (b) rethinking difficulty metrics probably in terms of contact and distance predictions, (c) assessing also side chains for models of high backbone accuracy, and (d) assessing residue-wise and possibly residue-residue quality estimates.

4.

Influence of Medium and Long Range Interactions in (α/β)8 Barrel Proteins

M. Michael Gromiha S. Selvaraj 《Journal of biological physics》1997,23(4):209-217

The residue-residue contacts and the role of medium and long range interactions in 36 (α/β)₈ barrel proteins have been analysed. The influence of long range contacts in the formation of physico-chemically similar clusters, and the preference of amino acid residues towards long range contacts have also been studied. The results reveal a nearly uniform level of medium and long range contacts in most of the proteins. The residues Gln and Ala have highest medium range contacts and the residue Pro has the lowest medium range contacts. The residue Cys has the highest long range contact followed by other hydrophobic residues namely Val, Ile and Leu. In the physico-chemically similar clusters identified in these proteins, 25–40 percent residues are influenced by long range contacts, and the residues Cys, Ile, Val and Met are the most preferred ones.

5.

Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13

Jie Hou Tianqi Wu Renzhi Cao Jianlin Cheng 《Proteins》2019,87(12):1165-1178

Predicting residue-residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

6.

Contact prediction in protein modeling: Scoring,folding and refinement of coarse-grained models

Dorota Latek Andrzej Kolinski 《BMC structural biology》2008,8(1):36

Background

Several different methods for contact prediction succeeded within the Sixth Critical Assessment of Techniques for Protein Structure Prediction (CASP6). The most relevant were non-local contact predictions for targets from the most difficult categories: fold recognition-analogy and new fold. Such contacts could provide valuable structural information in case a template structure cannot be found in the PDB.

7.

Deducing high-accuracy protein contact-maps from a triplet of coevolutionary matrices through deep residual convolutional networks

Yang Li Chengxin Zhang Eric W. Bell Wei Zheng Xiaogen Zhou Dong-Jun Yu Yang Zhang 《PLoS computational biology》2021,17(3)

The topology of protein folds can be specified by the inter-residue contact-maps and accurate contact-map prediction can help ab initio structure folding. We developed TripletRes to deduce protein contact-maps from discretized distance profiles by end-to-end training of deep residual neural-networks. Compared to previous approaches, the major advantage of TripletRes is in its ability to learn and directly fuse a triplet of coevolutionary matrices extracted from the whole-genome and metagenome databases and therefore minimize the information loss during the course of contact model training. TripletRes was tested on a large set of 245 non-homologous proteins from CASP 11&12 and CAMEO experiments and outperformed other top methods from CASP12 by at least 58.4% for the CASP 11&12 targets and 44.4% for the CAMEO targets in the top-L long-range contact precision. On the 31 FM targets from the latest CASP13 challenge, TripletRes achieved the highest precision (71.6%) for the top-L/5 long-range contact predictions. It was also shown that a simple re-training of the TripletRes model with more proteins can lead to further improvement with precisions comparable to state-of-the-art methods developed after CASP13. These results demonstrate a novel efficient approach to extend the power of deep convolutional networks for high-accuracy medium- and long-range protein contact-map predictions starting from primary sequences, which are critical for constructing 3D structure of proteins that lack homologous templates in the PDB library.

8.

Structural basis of GC-1 selectivity for thyroid hormone receptor isoforms

Lucas Bleicher Ricardo Aparicio Fabio M Nunes Leandro Martinez Sandra M Gomes Dias Carolina Migliorini Ana Figueira Auxiliadora Morim Maria Santos Walter H Venturelli Rosangela da Silva Paulo Marcos Donate Francisco AR Neves Luiz A Simeoni John D Baxter Paul Webb Munir S Skaf Igor Polikarpov 《BMC structural biology》2008,8(1):1-13

Background

Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available.

Results

Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10^-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure.

Conclusion

We have developed a novel multi-template algorithm to improve protein comparative modeling.

9.

Toward an accurate prediction of inter-residue distances in proteins using 2D recursive neural networks

Predrag Kukic Claudio Mirabello Giuseppe Tradigo Ian Walsh Pierangelo Veltri Gianluca Pollastri 《BMC bioinformatics》2014,15(1):1-15

Background

Protein inter-residue contact maps provide a translation and rotation invariant topological representation of a protein. They can be used as an intermediary step in protein structure predictions. However, the prediction of contact maps represents an unbalanced problem as far fewer examples of contacts than non-contacts exist in a protein structure. In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-value distances between residues. Predicting full inter-residue distance maps and applying them in protein structure predictions has been relatively unexplored in the past.

Results

We initially demonstrate that the use of native-like distance maps is able to reproduce 3D structures almost identical to the targets, giving an average RMSD of 0.5 Å. In addition, the corrupted physical maps with an introduced random error of ±6 Å are able to reconstruct the targets within an average RMSD of 2 Å. After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor in which additional structural homology information is provided. We find that the ab initio predictor is able to reproduce distances with an RMSD of 6 Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information even in cases of dubious homology and outperforms the best template hit by a clear margin of up to 3.7 Å. Lastly, we demonstrate the ability of the two predictors to reconstruct the CASP9 targets shorter than 200 residues, producing results similar to the state-of-the-art machine learning approach implemented in the Distill server.

Conclusions

The methodology presented here, if complemented by more complex reconstruction protocols, can represent a possible path to improve machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure predictions either on its own or complemented by NMR restraints.

10.

Ab initio and template-based prediction of multi-class distance maps by two-dimensional recursive neural networks

Ian Walsh Davide Baù Alberto JM Martin Catherine Mooney Alessandro Vullo Gianluca Pollastri 《BMC structural biology》2009,9(1):5-20

Background

Prediction of protein structures from their sequences is still one of the open grand challenges of computational biology. Some approaches to protein structure prediction, especially ab initio ones, rely to some extent on the prediction of residue contact maps. Residue contact map predictions have been assessed at the CASP competition for several years now. Although it has been shown that exact contact maps generally yield correct three-dimensional structures, this is true only at a relatively low resolution (3–4 Å from the native structure). Another known weakness of contact maps is that they are generally predicted ab initio, that is not exploiting information about potential homologues of known structure.

Results

We introduce a new class of distance restraints for protein structures: multi-class distance maps. We show that Cα trace reconstructions based on 4-class native maps are significantly better than those from residue contact maps. We then build two predictors of 4-class maps based on recursive neural networks: one ab initio, or relying on the sequence and on evolutionary information; one template-based, or in which homology information to known structures is provided as a further input. We show that virtually any level of sequence similarity to structural templates (down to less than 10%) yields more accurate 4-class maps than the ab initio predictor. We show that template-based predictions by recursive neural networks are consistently better than the best template and than a number of combinations of the best available templates. We also extract binary residue contact maps at an 8 Å threshold (as per CASP assessment) from the 4-class predictors and show that the template-based version is also more accurate than the best template and consistently better than the ab initio one, down to very low levels of sequence identity to structural templates. Furthermore, we test both ab initio and template-based 8 Å predictions on the CASP7 targets using a pre-CASP7 PDB, and find that both predictors are state-of-the-art, with the template-based one far outperforming the best CASP7 systems if templates with sequence identity to the query of 10% or better are available. Although this is not the main focus of this paper we also report on reconstructions of Cα traces based on both ab initio and template-based 4-class map predictions, showing that the latter are generally more accurate even when homology is dubious.

Conclusion

Accurate predictions of multi-class maps may provide valuable constraints for improved ab initio and template-based prediction of protein structures, naturally incorporate multiple templates, and yield state-of-the-art binary maps. Predictions of protein structures and 8 Å contact maps based on the multi-class distance map predictors described in this paper are freely available to academic users at the url http://distill.ucd.ie/.
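
The relationship between a real-valued distance map, a binary contact map at 8 Å, and a multi-class distance map can be sketched in Python as below. This is only an illustration and not the Distill implementation: the function names are hypothetical, and the class boundaries passed to `multiclass_map` are placeholders, since the abstract does not state the thresholds used for the 4-class maps.

```python
import math
from bisect import bisect_right


def distance_map(coords):
    """Pairwise Euclidean distances between residue coordinates
    (for example one Calpha atom per residue)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def contact_map(dmap, threshold=8.0):
    """Binary map: 1 where the distance is below the threshold (in Å)."""
    return [[1 if d < threshold else 0 for d in row] for row in dmap]


def multiclass_map(dmap, boundaries):
    """Assign each distance to a class index 0..len(boundaries).

    boundaries: ascending list of class edges in Å. The values are
    chosen by the caller; the ones used in the example are hypothetical.
    """
    return [[bisect_right(boundaries, d) for d in row] for row in dmap]


if __name__ == "__main__":
    toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    dmap = distance_map(toy_coords)
    print(contact_map(dmap))
    # Three hypothetical boundaries give four classes.
    print(multiclass_map(dmap, [8.0, 13.0, 19.0]))
```

When the first boundary is 8 Å, class 0 of the multi-class map coincides with the binary contact map, which is one way to read the statement that binary maps can be extracted from the 4-class predictors.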

11.

Improved residue contact prediction using support vector machines and a large feature set

Jianlin Cheng Pierre Baldi 《BMC bioinformatics》2007,8(1):113

Background

Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved.

12.

Protein Structure Prediction by Pro-Sp3-TASSER

Hongyi Zhou 《Biophysical journal》2009,96(6):2119-2127

An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP³. Top templates identified by each scoring function are combined to derive contact and distant restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distant restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively. 
A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.
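
The TM-score used above to define foldable targets can be sketched in Python for a fixed superposition. This is a simplified illustration and not the TASSER or TM-score program code: the real score is maximised over superpositions, whereas this sketch assumes the per-residue distances after superposition are already given. The function names and the 0.5 Å floor on d0 for very short chains are assumptions of the sketch.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Å) of the TM-score.

    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8, floored at 0.5 Å
    (the floor is an assumption made to keep short chains well defined).
    """
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one given superposition.

    distances: distances (Å) between aligned residue pairs of model and
        native structure after superposition.
    target_length: number of residues in the target (native) protein.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Half of a 100-residue target aligned perfectly, the rest unaligned.
    print(tm_score([0.0] * 50, 100))
```

Because the sum is normalised by the target length rather than by the number of aligned residues, a partial but accurate model scores lower than a complete one.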

13.

Improving protein structure prediction using multiple sequence-based contact predictions

Wu S Szilagyi A Zhang Y 《Structure (London, England : 1993)》2011,19(8):1182-1191

Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction owing to their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulation in combination with sparse template contact restraints. When testing the strategy on 273 nonhomologous targets, remarkable improvements of I-TASSER models were observed for both easy and hard targets, with p value by Student's t test < 0.00001 and 0.001, respectively. In several cases, template modeling score increases by >30%, which essentially converts "nonfoldable" targets into "foldable" ones. In CASP9, I-TASSER employed ab initio contact predictions, and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction especially for free-modeling targets.

14.

Estimation of model accuracy in CASP13

Jianlin Cheng Myong-Ho Choe Arne Elofsson Kun-Sop Han Jie Hou Ali H. A. Maghrabi Liam J. McGuffin David Menéndez-Hurtado Kliment Olechnovič Torsten Schwede Gabriel Studer Karolis Uziela Česlovas Venclovas Björn Wallner 《Proteins》2019,87(12):1361-1377

Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.

15.

A flexible-protein molecular docking study of the binding of ruthenium complex compounds to PIM1, GSK-3β, and CDK2/Cyclin A protein kinases

Yingting Liu Neeraj J. Agrawal Ravi Radhakrishnan 《Journal of molecular modeling》2013,19(1):371-382

We employ ensemble docking simulations to characterize the interactions of two enantiomeric forms of a Ru-complex compound (1-R and 1-S) with three protein kinases, namely PIM1, GSK-3β, and CDK2/cyclin A. We show that our ensemble docking computational protocol adequately models the structural features of these interactions and discriminates between competing conformational clusters of ligand-bound protein structures. Using the determined X-ray crystal structure of PIM1 complexed to the compound 1-R as a control, we discuss the importance of including the protein flexibility inherent in the ensemble docking protocol, for the accuracy of the structure prediction of the bound state. A comparison of our ensemble docking results suggests that PIM1 and GSK-3β bind the two enantiomers in similar fashion, through two primary binding modes: conformation I, which is very similar to the conformation presented in the existing PIM1/compound 1-R crystal structure; conformation II, which represents a 180° flip about an axis through the NH group of the pyridocarbazole moiety, relative to conformation I. In contrast, the binding of the enantiomers to CDK2 is found to have a different structural profile including a suggested bound conformation, which lacks the conserved hydrogen bond between the kinase and the ligand (i.e., ATP, staurosporine, Ru-complex compound). The top scoring conformation of the inhibitor bound to CDK2 is not present among the top-scoring conformations of the inhibitor bound to either PIM1 or GSK-3β and vice-versa. Collectively, our results help provide atomic-level insights into inhibitor selectivity among the three kinases.