Similar literature
20 similar articles found.
1.
Since Anfinsen demonstrated in 1973 that the information encoded in a protein’s amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to use computational modeling to determine the spatial location of every atom in a protein molecule starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM was the most reliable approach to predicting protein structures, and in the absence of reliable templates the modeling accuracy declined sharply. Nevertheless, the results of the most recent community-wide assessment of protein structure prediction (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through the use of end-to-end deep machine learning techniques, where correct folds could be built for nearly all single-domain proteins without using PDB templates. Critically, the model quality exhibited little correlation with the quality of available template structures or with the number of sequence homologs detected for a given target protein. Thus, the implementation of deep-learning techniques has essentially broken through the 50-year-old modeling border between TBM and FM approaches and has made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.

2.
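Discovery of association rules in protein sequences and its applications   Total citations: 2, self-citations: 0, citations by others: 2
As the machine learning algorithms used in protein sequence-structure analysis grow more complex, interpreting their results and the discovery process becomes more complicated as well, so simple and theoretically sound methods are needed. By introducing an association rule discovery algorithm, which is simple in principle, theoretically sound, and yields results of strong practical significance, tens of thousands of patterns were found in protein sequences. Worked examples demonstrate how these patterns can be applied to protein sequence analysis, such as conserved region discovery and secondary structure prediction. Based on these results, a secondary structure rule base and a simple secondary structure prediction algorithm were constructed; experiments show that about 81% of secondary structures can be predicted by at least one association rule.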
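The abstract above does not describe its algorithm in detail, so the following is only a minimal sketch of how support and confidence might be computed for rules of the form "k-mer implies secondary-structure state". The function name `mine_rules`, the choice of the window's central residue as the consequent, the thresholds, and the toy sequence are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def mine_rules(sequence, structure, k=3, min_support=0.1, min_confidence=0.8):
    """Toy association-rule miner (hypothetical, for illustration only).

    Antecedent: a k-mer of residues. Consequent: the secondary-structure
    label (e.g. H, E, C) of the k-mer's central residue.
    Returns (kmer, state, support, confidence) tuples.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    n_windows = len(sequence) - k + 1
    if n_windows <= 0:
        return []

    antecedent_counts = Counter()  # how often each k-mer occurs
    joint_counts = Counter()       # how often each (k-mer, state) pair occurs
    for i in range(n_windows):
        kmer = sequence[i:i + k]
        state = structure[i + k // 2]
        antecedent_counts[kmer] += 1
        joint_counts[(kmer, state)] += 1

    rules = []
    for (kmer, state), joint in joint_counts.items():
        support = joint / n_windows                # P(kmer and state)
        confidence = joint / antecedent_counts[kmer]  # P(state | kmer)
        if support >= min_support and confidence >= min_confidence:
            rules.append((kmer, state, support, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


if __name__ == "__main__":
    # Made-up sequence and structure strings, not real data.
    for rule in mine_rules("AAAGAAAG", "HHHCHHHC", k=3, min_support=0.3):
        print(rule)
```

A real rule base would be mined from many annotated chains and would likely use gapped or position-specific patterns; the fixed-length window here is chosen only to keep the sketch short.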

3.
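A protein's sequence determines its structure, and its structure determines its function. A new generation of accurate protein structure prediction tools has brought new opportunities and challenges to structural biology, structural bioinformatics, drug discovery, the life sciences, and many other fields, with the accuracy of single-chain protein structure prediction reaching a level comparable to that of experimental methods. This review outlines the theoretical foundations, history, and latest advances in protein structure prediction, discusses how the large number of predicted protein structures and artificial-intelligence-based methods affect experimental structural biology, and finally analyzes the unsolved problems and future research directions in the field.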

4.
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.

5.
Protein structure prediction   Total citations: 2, self-citations: 0, citations by others: 2
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical-chemical basis of protein stability and the corresponding use of physical-chemical potential functions to distinguish correctly folded from incorrectly folded protein conformations.

6.
Central issues concerning protein structure prediction have been highlighted by the recently published summary of the fourth community-wide protein structure prediction experiment (CASP4). Although sequence/structure alignment remains the bottleneck in comparative modeling, there has been substantial progress in fully automated remote homolog detection and in de novo structure prediction. Significant further progress will probably require improvements in high-resolution modeling.

7.
In the present study, an attempt has been made to develop a method for predicting gamma-turns in proteins. First, we have implemented the statistical and machine-learning techniques commonly used in the field of protein structure prediction for the prediction of gamma-turns. All the methods have been trained and tested on a set of 320 nonhomologous protein chains by a fivefold cross-validation technique. It has been observed that the performance of all methods is very poor, having a Matthews correlation coefficient (MCC) …

8.
A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM) which is able to generate low resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local φ-ψ energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight into the protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone rmsd for fragments of about 60–70 residues of α-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of β-sheet structures are briefly described.

9.
Due to advances in molecular biology, the DNA sequences of structural genes coding for proteins are often known before a protein is characterized or even isolated. The function of a protein whose amino acid sequence has been deduced from a DNA sequence may not even be known. This has created greater interest in the development of methods to predict the tertiary structures of proteins. The a priori prediction of a protein's structure from its amino acid sequence is not yet possible. However, since proteins with similar amino acid sequences are observed to have similar three-dimensional structures, it is possible to use an analogy with a protein of known structure to draw some conclusions about the structure and properties of an uncharacterized protein. The process of predicting the tertiary structure of a protein relies heavily on computer modeling and analysis of the structure. The prediction of the structure of the bacteriophage 434 cro repressor is used as an example illustrating current procedures.

10.
The field of protein structure prediction has seen significant advances in recent years. Researchers have followed a multitude of approaches, including methods based on comparative modeling, fold recognition and threading, and first-principles techniques. It is noteworthy that the structure prediction of membrane proteins is comparatively less studied by researchers in the field. A membrane protein is characterized by a protein structure that extends into or through the lipid bilayer of a cell. The structure is influenced by the combination of the hydrophobic bilayer region, the direct interaction with the bilayer, and the aqueous external environment. Due to the difficulty in obtaining reliable experimental structures, accurate computational prediction of membrane proteins is of paramount importance. An optimization model has been developed to predict the interhelical interactions in α-helical membrane proteins. A database of α-helical membrane proteins of known structure and limited sequence identity can be constructed to develop interaction probabilities. By then maximizing the occurrence of highly probable pairwise or three-residue interactions, realistic contacts can be predicted by imposing a number of geometrical constraints. The development of these low distance contacts can provide additional distance restraints for first principles-based approaches to the tertiary structure prediction problem. The proposed approach is shown to successfully predict interhelical contacts in several membrane protein systems, including bovine rhodopsin and the recently released human β2 adrenergic receptor protein structure.

11.
Accuracy of predicting protein secondary structure and solvent accessibility from sequence information has been improved significantly by using information contained in multiple sequence alignments as input to a neural network system. For the Asilomar meeting, predictions for 13 proteins were generated automatically using the publicly available prediction method PHD. The results confirm the estimate of 72% three-state prediction accuracy. The fairly accurate predictions of secondary structure segments made the tool useful as a starting point for modeling of higher dimensional aspects of protein structure. © 1995 Wiley-Liss, Inc.
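Three-state prediction accuracy, as quoted above, is usually taken to mean the fraction of residues whose predicted state (helix, strand, or coil) matches the observed one. A minimal sketch under that assumption follows; the function name `q3_accuracy`, the one-letter labels H/E/C, and the example strings are illustrative and are not taken from PHD itself.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Fraction of residues whose predicted state equals the observed state.

    Both arguments are equal-length strings over the three labels
    H (helix), E (strand) and C (coil). The labels are an assumed convention.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("cannot score an empty chain")
    allowed = set("HEC")
    if not (set(observed) <= allowed and set(predicted) <= allowed):
        raise ValueError("states must be one of H, E, C")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


if __name__ == "__main__":
    # Made-up ten-residue example: 8 of 10 states agree.
    print(q3_accuracy("HHHHEEEECC", "HHHCEEECCC"))
```

Per-residue accuracy says nothing about whether whole segments are placed correctly, which is one reason segment-based measures are often reported alongside it.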

12.
Prediction of protein–protein interactions (PPIs) commonly involves a significant computational component. Rapid recent advances in the power of computational methods for protein interaction prediction motivate a review of the state-of-the-art. We review the major approaches, organized according to the primary source of data utilized: protein sequence, protein structure, and protein co-abundance. The advent of deep learning (DL) has brought with it significant advances in interaction prediction, and we show how DL is used for each source data type. We review the literature taxonomically, present example case studies in each category, and conclude with observations about the strengths and weaknesses of machine learning methods in the context of the principal sources of data for protein interaction prediction.

13.
Protein structure prediction   Total citations: 4, self-citations: 0, citations by others: 4
J Garnier. Biochimie, 1990, 72(8): 513-524
Current methods developed for predicting protein structure are reviewed. The most widely used algorithms of Chou and Fasman and Garnier et al. for predicting secondary structure are compared to the most recent ones including sequence similarity methods, neural network, pattern recognition or joint prediction methods. The best of these methods correctly predict 63-65% of the residues in the database with cross-validation for 3 conformations, helix, beta strand and coil, with a standard deviation of 6-8% per protein. However, when a homologous protein is already in the database, the accuracy of prediction by the similarity peptide method of Levin and Garnier reaches about 90%. Some conclusions can be drawn on the mechanism of protein folding. As all the prediction methods only use the local sequence for prediction (+/- 8 residues maximum), one can infer that 65% of the conformation of a residue is dictated on average by the local sequence; the rest is brought by the folding. The best predicted proteins or peptide segments are those for which the folding has less effect on the conformation. Presently, prediction of tertiary structure is only of practical use when the structure of a homologous protein is already known. Amino acid alignment to define residues of equivalent spatial position is critical for modelling of the protein. We showed for serine proteases that secondary structure prediction can help to define a better alignment. For non-homologous segments of the polypeptide chain, such as loops, libraries of known loops and/or energy minimization with various force fields are used, without yet giving satisfactory solutions. An example of modelling by homology, aided by secondary structure prediction, on 2 regulatory proteins, Fnr and FixK, is presented.

14.
The current state of the art in modeling protein structure has been assessed, based on the results of the CASP (Critical Assessment of protein Structure Prediction) experiments. In comparative modeling, improvements have been made in sequence alignment, sidechain orientation and loop building. Refinement of the models remains a serious challenge. Improved sequence profile methods have had a large impact in fold recognition. Although there has been some progress in alignment quality, this factor still limits model usefulness. In ab initio structure prediction, there has been notable progress in building approximately correct structures of 40-60 residue-long protein fragments. There is still a long way to go before the general ab initio prediction problem is solved. Overall, the field is maturing into a practical technology, able to deliver useful models for a large number of sequences.

15.
A comparative model building process has been utilized to predict the three-dimensional structure of the bacteriophage 434 Cro protein. Amino acid sequence similarities between the 434 Cro protein and other bacteriophage repressor and Cro proteins have been used, in conjunction with secondary structure prediction and the known structures of other base sequence specific DNA binding proteins, to derive the model. From this model the interactions between the 434 Cro protein and its operator DNA have been deduced. These proposed interactions are consistent with the known properties of the bacteriophage 434 Cro protein.

16.
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.

17.
Multiple sequence alignments are essential in computational analysis of protein sequences and structures, with applications in structure modeling, functional site prediction, phylogenetic analysis and sequence database searching. Constructing accurate multiple alignments for divergent protein sequences remains a difficult computational task, and alignment speed becomes an issue for large sequence datasets. Here, I review methodologies and recent advances in the multiple protein sequence alignment field, with emphasis on the use of additional sequence and structural information to improve alignment quality.

18.
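De novo protein structure prediction obtains the native structure from amino acid sequence information alone, without relying on templates. Its key lies in correctly defining the energy function and carefully choosing a computational search algorithm to find the energy minimum. On this basis, this paper systematically introduces energy functions and conformational search strategies and lists several relatively successful de novo prediction methods. Comparison leads to the conclusion that statistics-based (knowledge-based) energy functions have been the main direction of de novo prediction in recent years, and that the conformational search in existing de novo methods always uses the Monte Carlo method. This indicates that, as research on protein structure prediction deepens, breakthroughs have been made on key problems such as the construction of energy functions, the choice of conformational search methods, and de novo prediction of large protein structures.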
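The pairing of an energy function with a Monte Carlo conformational search can be illustrated with a minimal Metropolis sketch. Everything specific here is an assumption made for illustration: the conformation is reduced to a list of torsion angles in degrees, `toy_energy` is a made-up function whose minimum lies at -60 degrees for every angle, and the step size, temperature, and seed are arbitrary. It is not the energy function or search protocol of any method reviewed in the paper.

```python
import math
import random


def toy_energy(angles):
    """Hypothetical energy: zero when every torsion angle equals -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def metropolis_search(energy, start, steps=5000, max_move=30.0,
                      temperature=0.5, seed=0):
    """Metropolis Monte Carlo search over a list of torsion angles.

    Returns the lowest-energy conformation visited and its energy.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e

    for _ in range(steps):
        # Propose a move: perturb one randomly chosen angle.
        trial = list(current)
        i = rng.randrange(len(trial))
        moved = trial[i] + rng.uniform(-max_move, max_move)
        trial[i] = (moved + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
        trial_e = energy(trial)

        # Metropolis criterion: always accept downhill moves, and accept
        # uphill moves with probability exp(-delta / temperature).
        delta = trial_e - current_e
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e

    return best, best_e


if __name__ == "__main__":
    start = [120.0] * 5
    best, best_e = metropolis_search(toy_energy, start)
    print(toy_energy(start), best_e)
```

Accepting some uphill moves is what lets the search leave local minima; practical methods typically replace the single-angle perturbation with fragment insertion or similar move sets and vary the temperature during the run.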

19.
A set of software tools designed to study protein structure and kinetics has been developed. The core of these tools is a program called Folding Machine (FM) which is able to generate low resolution folding pathways using modest computational resources. The FM is based on a coarse-grained kinetic ab initio Monte-Carlo sampler that can optionally use information extracted from secondary structure prediction servers or from fragment libraries of local structure. The model underpinning this algorithm contains two novel elements: (a) the conformational space is discretized using the Ramachandran basins defined in the local phi-psi energy maps; and (b) the solvent is treated implicitly by rescaling the pairwise terms of the non-bonded energy function according to the local solvent environments. The purpose of this hybrid ab initio/knowledge-based approach is threefold: to cover the long time scales of folding, to generate useful 3-dimensional models of protein structures, and to gain insight into the protein folding kinetics. Even though the algorithm is not yet fully developed, it has been used in a recent blind test of protein structure prediction (CASP5). The FM generated models within 6 Å backbone rmsd for fragments of about 60-70 residues of alpha-helical proteins. For a CASP5 target that turned out to be natively unfolded, the trajectory obtained for this sequence uniquely failed to converge. Also, a new measure to evaluate structure predictions is presented and used alongside the standard CASP assessment methods. Finally, recent improvements in the prediction of beta-sheet structures are briefly described.

20.
Due to the structural and functional importance of tight turns, some methods have been proposed to predict gamma-turns, beta-turns, and alpha-turns in proteins. In the past, studies of pi-turns were made, but not a single prediction approach has been developed so far. It will be useful to develop a method for identifying pi-turns in a protein sequence. In this paper, the support vector machine (SVM) method has been introduced to predict pi-turns from the amino acid sequence. The training and testing of this approach is performed with a newly collected data set of 640 non-homologous protein chains containing 1931 pi-turns. Different sequence encoding schemes have been explored in order to investigate their effects on the prediction performance. With multiple sequence alignment and predicted secondary structure, the final SVM model yields a Matthews correlation coefficient (MCC) of 0.556 by a 7-fold cross-validation. A web server implementing the prediction method is available at the following URL: http://210.42.106.80/piturn/.
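The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. The sketch below shows the standard formula applied to per-residue turn/non-turn labels; the helper names and the four-residue example are illustrative assumptions and do not reproduce the paper's data or evaluation pipeline.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 marks a turn residue."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Made-up labels: two turn residues, one of which is predicted.
    counts = confusion_counts([1, 1, 0, 0], [1, 0, 0, 0])
    print(counts, matthews_corrcoef(*counts))
```

MCC ranges from -1 to 1 and stays informative when turn residues are a small minority of all residues, which is why it is often preferred over raw accuracy for this kind of task.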
