1.
This paper reports the CASP13 results of distance-based contact prediction, threading, and folding methods implemented in three RaptorX servers, which are built upon the powerful deep convolutional residual neural network (ResNet) method initiated by us for contact prediction in CASP12. On the 32 CASP13 FM (free-modeling) targets with a median multiple sequence alignment (MSA) depth of 36, RaptorX yielded the best contact prediction among 46 groups and almost the best 3D structure modeling among all server groups without time-consuming conformation sampling. In particular, RaptorX achieved top L/5, L/2, and L long-range contact precision of 70%, 58%, and 45%, respectively, and predicted correct folds (TMscore > 0.5) for 18 of 32 targets. Further, RaptorX predicted correct folds for all FM targets with >300 residues (T0950-D1, T0969-D1, and T1000-D2) and generated the best 3D models for T0950-D1 and T0969-D1 among all groups. This CASP13 test confirms our previous findings: (a) predicted distance is more useful than contacts for both template-based and free modeling; and (b) structure modeling may be improved by integrating template and coevolutionary information via deep learning. This paper will discuss progress we have made since CASP12, the strengths and weaknesses of our methods, and why deep learning performed much better in CASP13.
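As a rough illustration of the top L/5, L/2, and L long-range contact precision figures quoted in this abstract, the Python sketch below shows one common way such a metric is computed. It is not taken from RaptorX: the function name, the input shapes, and the choice to treat "long-range" as a sequence separation of at least 24 residues (the usual CASP convention) are assumptions made for illustration, as is dividing by the number of pairs actually selected when fewer than L/k long-range predictions are available.

```python
def top_lk_long_range_precision(pred, true_contacts, seq_len, k_div=5, min_sep=24):
    """Precision of the top L/k long-range predicted contacts.

    pred          -- dict mapping residue pairs (i, j), i < j, to a predicted
                     contact probability (hypothetical input format)
    true_contacts -- set of (i, j) pairs that are contacts in the native structure
    seq_len       -- sequence length L
    k_div         -- divisor k, so that the top L // k pairs are evaluated
    min_sep       -- minimum |i - j| for a pair to count as long-range (assumed 24)
    """
    # Keep only long-range pairs, then rank them by predicted probability.
    long_range = [(prob, pair) for pair, prob in pred.items()
                  if abs(pair[1] - pair[0]) >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)

    # Select the top L/k pairs (at least one).
    n_top = max(1, seq_len // k_div)
    top = long_range[:n_top]
    if not top:
        return 0.0

    # Fraction of the selected pairs that are true contacts.
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example: a 30-residue protein with seven long-range predictions
# and one confident short-range prediction that must be ignored.
pred = {(0, 24): 0.95, (0, 25): 0.90, (1, 26): 0.85, (2, 27): 0.80,
        (3, 28): 0.75, (4, 29): 0.70, (5, 29): 0.65, (10, 12): 0.99}
native = {(0, 24), (1, 26), (3, 28), (5, 29), (10, 12)}
top_l5 = top_lk_long_range_precision(pred, native, seq_len=30, k_div=5)
top_l2 = top_lk_long_range_precision(pred, native, seq_len=30, k_div=2)
```

In the toy example, L/5 selects six pairs, of which three are native contacts; L/2 would select fifteen pairs, but only seven long-range predictions exist, so all seven are scored.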
2.
A novel method has been developed for acquiring the correct alignment of a query sequence against remotely homologous proteins by extracting structural information from profiles of multiple structure alignment. A systematic search algorithm combined with a group of score functions based on sequence information and structural information has been introduced in this procedure. A limited number of top solutions (15,000) with high scores were selected as candidates for further examination. On a test set comprising 301 proteins from 75 protein families with sequence identity less than 30%, the proportion of proteins with a completely correct alignment as the first candidate was improved to 39.8% by our method, whereas the typical performance of existing sequence-based alignment methods was only between 16.1% and 22.7%. Furthermore, multiple candidates for possible alignment were provided in our approach, which dramatically increased the possibility of finding the correct alignment, such that completely correct alignments were found amongst the top-ranked 1000 candidates in 88.3% of the proteins. With the assistance of a sequence database, completely correct alignment solutions were achieved amongst the top 1000 candidates in 94.3% of the proteins. From such a limited number of candidates, it would become possible to identify more correct alignments using a more time-consuming but more powerful method with more detailed structural information, such as side-chain packing and energy minimization. The results indicate that the novel alignment strategy could be helpful for extending the application of highly reliable methods for fold identification and homology modeling to a huge number of homologous proteins of low sequence similarity. Details of the methods, together with the results and implications for future development, are presented.
3.
Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to utilize computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have been historically categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM has been the most reliable approach to predicting protein structures, and in the absence of reliable templates, the modeling accuracy sharply declines. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using the PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures, as well as the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
4.
Robert Schwarzenbacher, Adam Godzik, Slawomir K. Grzechnik, Lukasz Jaroszewski 《Acta Crystallographica. Section D, Structural Biology》2004,60(7):1229-1236
Many crystallographic protein structures are being determined using molecular replacement (MR), a model‐based phasing method that has become increasingly important with the steady growth of the PDB. While there are several highly automated software packages for MR, the methods for preparing optimal search models for MR are relatively unexplored. Recent advances in sequence‐comparison methods allow the detection of more distantly related homologs and more accurate alignment of their sequences. It was investigated whether simple homology models (without modeling of unaligned regions) based on alignments from these improved methods are able to increase the potential of MR. Twenty-seven crystal structures were determined using a highly parallelized MR pipeline that facilitates all steps including homology detection, model preparation, MR searches, automated refinement and rebuilding. Several types of search models prepared with standard sequence–sequence alignment (BLAST) and more accurate profile–sequence and profile–profile methods (PSI‐BLAST, FFAS) were compared in MR trials. The analysis shows that models based on more accurate alignments have a higher success rate in cases where the unknown structure and the search model share less than 35% sequence identity. It is concluded that by using different types of simple models based on accurate alignments, the success rate of MR can be significantly increased.
5.
Metal ions are crucial for protein function. They participate in enzyme catalysis, play regulatory roles, and help maintain protein structure. Current tools for predicting metal-protein interactions are based on proteins crystallized with their metal ions present (holo forms). However, a majority of resolved structures are free of metal ions (apo forms). Moreover, metal binding is a dynamic process, often involving conformational rearrangement of the binding pocket. Thus, effective predictions need to be based on the structure of the apo state. Here, we report an approach that identifies transition metal-binding sites in apo forms with a resulting selectivity >95%. Applying the approach to apo forms in the Protein Data Bank and structural genomics initiative identifies a large number of previously unknown, putative metal-binding sites, and their amino acid residues, in some cases providing a first clue to the function of the protein.
6.
Pascale Jean, Joël Pothier, Patrick M. Dansette, Daniel Mansuy, Alain Viari 《Proteins》1997,28(3):388-404
A computational strategy for homology modeling, based on the comparison of several protein structures, is described. This strategy involves a formalized definition of structural blocks common to several protein structures, a new program to compare these structures simultaneously, and the use of consensus matrices to improve sequence alignment between the structurally known and target proteins. Applying this method to cytochromes P450 led to the definition of 15 substructures common to P450cam, P450BM3, and P450terp, and to a proposed 3D model of P450eryF. Proteins 28:388–404, 1997 © 1997 Wiley-Liss, Inc.
7.
Petras J. Kundrotas, Ilya A. Vakser, Joël Janin 《Protein science : a publication of the Protein Society》2013,22(11):1655-1663
Oligomeric proteins are more abundant in nature than monomeric proteins, and are involved in all biological processes. In the absence of an experimental structure, their subunits can be modeled from their sequence like monomeric proteins, but reliable procedures to build the oligomeric assembly are scarce. Template‐based methods, which start from known protein structures, are commonly applied to model subunits. We present a method to model homodimers that relies on a structural alignment of the subunits, and test it on a set of 511 target structures recently released by the Protein Data Bank, taking as templates the earlier released structures of 3108 homodimeric proteins (H‐set), and 2691 monomeric proteins that form dimer‐like assemblies in crystals (M‐set). The structural alignment identifies an H‐set template for 97% of the targets, and in half of the cases, it yields a correct model of the dimer geometry and residue–residue contacts in the target. It also identifies an M‐set template for most of the targets, and some of the crystal dimers are very similar to the target homodimers. The procedure efficiently detects homology at low levels of sequence identity, and points to erroneous quaternary structures in the Protein Data Bank. The high coverage of the target set suggests that the content of the Protein Data Bank already approaches the structural diversity of protein assemblies in nature, and that template‐based methods should become the method of choice for modeling oligomeric as well as monomeric proteins.
8.
Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where contact prediction was demonstrated, for the first time, to consistently enhance the ranking of protein models. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
9.
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exists with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence-template sequence similarity, an extra 7% of "best" models can be found.
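Sequence identity thresholds such as the 30-80% and 35-40% ranges mentioned in this abstract depend on how identity is counted. The Python sketch below shows one simple convention, offered only as an illustration: the function name is hypothetical, the input is assumed to be two already-aligned sequences of equal length with "-" marking gaps, and identity is assumed to be the number of identical residues divided by the number of columns in which both sequences have a residue. Other denominators (for example, the length of the shorter sequence) are also in use and give different values.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length) and
    counts identity over columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Columns in which both sequences carry a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: seven columns, one containing a gap, one mismatch.
identity = percent_identity("ACDE-FG", "ACNEKFG")
```

In the toy alignment, six columns are gap-free and five of them match, so the identity under this convention is five sixths, or roughly 83%.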
10.
Timothy M. Allison, Matteo T. Degiacomi, Erik G. Marklund, Luca Jovine, Arne Elofsson, Justin L. P. Benesch, Michael Landreh 《Protein science : a publication of the Protein Society》2022,31(6)
The advent of machine learning‐based structure prediction algorithms such as AlphaFold2 (AF2) and RoseTTAFold has moved the generation of accurate structural models for the entire cellular protein machinery into the reach of the scientific community. However, structure predictions of protein complexes are based on user‐provided input and may require experimental validation. Mass spectrometry (MS) is a versatile, time‐effective tool that provides information on post‐translational modifications, ligand interactions, conformational changes, and higher‐order oligomerization. Using three protein systems, we show that native MS experiments can uncover structural features of ligand interactions, homology models, and point mutations that are undetectable by AF2 alone. We conclude that machine learning can be complemented with MS to yield more accurate structural models on a small and large scale.
11.
12.
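With the massive accumulation of protein sequence and structure data, a large amount of descriptive information has become available, and researchers now urgently need ways to use this wealth of data effectively: to extract information from existing data efficiently and apply it to downstream tasks. Protein design frees the development of new proteins from the constraints of experimental conditions, which is of great significance for fields such as drug target prediction, drug discovery, and materials design. Deep learning, as an efficient method for extracting features from data, can be used to model protein data and then to incorporate prior information for designing proteins. Deep learning-based protein design has therefore become a research field with broad prospects. This article describes deep learning-based methods for modeling and designing protein sequence and structure data. It details the strategies, principles, scope of application, and application examples of these methods, and discusses the prospects and limitations of deep learning in this field, with the aim of providing a reference for related research.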
13.
The use of classical molecular dynamics simulations, performed in explicit water, for the refinement of structural models of proteins generated ab initio or based on homology has been investigated. The study involved a test set of 15 proteins that were previously used by Baker and coworkers to assess the efficiency of the ROSETTA method for ab initio protein structure prediction. For each protein, four models generated using the ROSETTA procedure were simulated for periods of between 5 and 400 ns in explicit solvent, under identical conditions. In addition, the experimentally determined structure and the experimentally derived structure in which the side chains of all residues had been deleted and then regenerated using the WHATIF program were simulated and used as controls. A significant improvement in the deviation of the model structures from the experimentally determined structures was observed in several cases. In addition, it was found that in certain cases in which the experimental structure deviated rapidly from the initial structure in the simulations, indicating internal strain, the structures were more stable after regenerating the side-chain positions. Overall, the results indicate that molecular dynamics simulations on a tens to hundreds of nanoseconds time scale are useful for the refinement of homology or ab initio models of small to medium-size proteins.
14.
Mark E. Snow 《Proteins》1993,15(2):183-190
A novel scheme for the parameterization of a type of “potential energy” function for protein molecules is introduced. The function is parameterized based on the known conformations of previously determined protein structures and their sequence similarity to a molecule whose conformation is to be calculated. Once parameterized, minima of the potential energy function can be located using a version of simulated annealing which has been previously shown to locate global and near-global minima with the given functional form. As a test problem, the potential was parameterized based on the known structures of the rubredoxins from Desulfovibrio vulgaris, Desulfovibrio desulfuricans, and Clostridium pasteurianum, which vary from 45 to 54 amino acids in length, and the sequence alignments of these molecules with the rubredoxin sequence from Desulfovibrio gigas. Since the Desulfovibrio gigas rubredoxin conformation has also been determined, it is possible to check the accuracy of the results. Ten simulated-annealing runs from random starting conformations were performed. Seven of the 10 resultant conformations have an all-Cα rms deviation from the crystallographically determined conformation of less than 1.7 Å. For five of the structures, the rms deviation is less than 0.8 Å. Four of the structures have conformations which are virtually identical to each other except for the position of the carboxy-terminal residue. This is also the conformation which is achieved if the determined crystal structure is minimized with the same potential. The all-Cα rms difference between the crystal and minimized crystal structures is 0.6 Å. It is further observed that the “energies” of the structures according to the potential function exhibit a strong correlation with rms deviation from the native structure. The conformations of the individual model structures and the computational aspects of the modeling procedure are discussed. © 1993 Wiley-Liss, Inc.
15.
Zhang Z, Kochhar S, Grigorov MG 《Protein science : a publication of the Protein Society》2005,14(2):431-444
16.
Christoph Berbalk, Christine S. Schwaiger, Peter Lackner 《Protein science : a publication of the Protein Society》2009,18(10):2027-2035
Protein structure alignment methods are essential for many different challenges in protein science, such as the determination of relations between proteins in the fold space or the analysis and prediction of their biological function. A number of different pairwise and multiple structure alignment (MStA) programs have been developed and provided to the community. Prior knowledge of the expected alignment accuracy is desirable for the user of such tools. To retrieve an estimate of the performance of current structure alignment methods, we compiled a test suite, taken from the literature and the SISYPHUS database, consisting of proteins that are difficult to align. Subsequently, different MStA programs were evaluated regarding alignment correctness and general limitations. The analysis shows that there are large differences in success between the methods in terms of applicability and correctness. The latter ranges from 44 to 75% correct core positions. Taking only the best method result per test case, this number increases to 84%. We conclude that the methods available are applicable to difficult cases, but also that there is still room for improvement in both practicability and alignment correctness. An approach that combines the currently available methods supported by a proper score would be useful. Until then, a user should not rely on just a single program.
17.
R.MvaI is a Type II restriction enzyme (REase), which specifically recognizes the pentanucleotide DNA sequence 5'-CCWGG-3' (W indicates A or T). It belongs to a family of enzymes, which recognize related sequences, including 5'-CCSGG-3' (S indicates G or C) in the case of R.BcnI, or 5'-CCNGG-3' (where N indicates any nucleoside) in the case of R.ScrFI. REases from this family hydrolyze the phosphodiester bond in the DNA between the 2nd and 3rd base in both strands, thereby generating a double strand break with 5'-protruding single nucleotides. So far, no crystal structures of REases with similar cleavage patterns have been solved. Characterization of sequence-structure-function relationships in this family would facilitate understanding of evolution of sequence specificity among REases and could aid in engineering of enzymes with new specificities. However, sequences of R.MvaI or its homologs show no significant similarity to any proteins with known structures, thus precluding straightforward comparative modeling. We used a fold recognition approach to identify a remote relationship between R.MvaI and the structure of DNA repair enzyme MutH, which belongs to the PD-(D/E)XK superfamily together with many other REases. We constructed a homology model of R.MvaI and used it to predict functionally important amino acid residues and the mode of interaction with the DNA. In particular, we predict that only one active site of R.MvaI interacts with the DNA target at a time, and the cleavage of both strands (5'-CCAGG-3' and 5'-CCTGG-3') is achieved by two independent catalytic events. The model is in good agreement with the available experimental data and will serve as a template for further analyses of R.MvaI, R.BcnI, R.ScrFI and other related enzymes.
18.
Modeling a protein structure based on a homologous structure is a standard method in structural biology today. In this process an alignment of a target protein sequence onto the structure of a template(s) is used as input to a program that constructs a 3D model. It has been shown that the most important factor in this process is the correctness of the alignment and the choice of the best template structure(s), while it is generally believed that there are no major differences between the best modeling programs. Therefore, a large number of studies to benchmark the alignment qualities and the selection process have been performed. However, to our knowledge no large-scale benchmark has been performed to evaluate the programs used to transform the alignment to a 3D model. In this study, a benchmark of six different homology modeling programs (Modeller, SegMod/ENCAD, SWISS-MODEL, 3D-JIGSAW, nest, and Builder) is presented. The performance of these programs is evaluated using physicochemical correctness and structural similarity to the correct structure. From our analysis it can be concluded that no single modeling program outperforms the others in all tests. However, it is quite clear that three modeling programs, Modeller, nest, and SegMod/ENCAD, perform better than the others. Interestingly, the fastest and oldest modeling program, SegMod/ENCAD, performs very well, although it was written more than 10 years ago and has not undergone any development since. It can also be observed that none of the homology modeling programs builds side chains as well as a specialized program (SCWRL), and therefore there should be room for improvement.
19.
Geourjon C, Combet C, Blanchet C, Deléage G 《Protein science : a publication of the Protein Society》2001,10(4):788-797
Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparisons of secondary structures have been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomics approach, as well as for genome annotation.
20.
This paper examines recent developments and applications of Hidden Markov Models (HMMs) to various problems in computational biology, including multiple sequence alignment, homology detection, protein sequences classification, and genomic annotation.