Similar literature
1.
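The 20 amino acids are reduced to three classes: hydrophobic (H), hydrophilic (P), and neutral (N). Each amino acid is reduced to a single point represented by its Cα atom. Using an off-lattice model with relative entropy as the optimization function, the three-dimensional structures of proteins are predicted. To stay consistent with protein design work based on the relative entropy method, a new contact strength function is adopted. Native proteins from the Protein Data Bank are used as test targets. The results show that the model and method perform well: the root-mean-square deviation (RMSD) of the predicted structures from the native structures lies between 0.30 and 0.70 nm. This work lays a foundation for protein design studies based on relative entropy and the HNP model.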
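
The RMSD figure quoted above can be illustrated with a minimal sketch. It assumes the two structures are given as equal-length lists of Cα coordinates that have already been superimposed; the abstract does not say how superposition was done, so that step is left out. The function name `rmsd` and the toy coordinates are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points, in the same units as the input (e.g. nm).

    Assumption: the two structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example (made-up coordinates): every point is shifted by 0.3 nm along z.
native = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.3), (0.38, 0.0, 0.3)]
deviation = rmsd(native, predicted)
```

In practice the predicted structure would first be optimally rotated and translated onto the native one; without that step the value only gives an upper bound on the fitted RMSD.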

2.
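A protein design method based on the hydrophobic-hydrophilic model is presented. The method is grounded in physical principles and uses relative entropy as the objective function for optimization. It was tested on real proteins with native structures of four different structural types, and the main factors affecting the success rate were analyzed. The results show that the method is general and can be used to design sequences for proteins of different structural types.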

3.
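李菁 (Li Jing), 王炜 (Wang Wei). 《中国科学C辑》 (Science in China, Series C), 2006, 36(6): 552-562
Sequence alignment is a common method for finding conserved regions of protein structure. However, when sequence similarity is below 30%, alignment accuracy is low, because in such sequences different residues with similar structural function are often mispaired. Residues with similar physicochemical properties can be grouped together, and using such a reduced residue alphabet can effectively lower the complexity of protein sequences while preserving their main information. Therefore, if the 20 natural amino acid residues are grouped correctly, alignment accuracy can be effectively improved. Based on DAPS, a database of protein structure alignments, this paper proposes a new method for grouping amino acid residues, which also yields substitution matrices at different levels of reduction for use in sequence alignment. The validity of the grouping is confirmed by the mutual entropy method, and the reduced alphabets are applied in sequence alignment to identify structurally conserved regions of proteins. The results show that reducing the residue alphabet to about 9 letters effectively improves alignment accuracy.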

4.
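Advances in protein coevolution analysis   Total citations: 1, self-citations: 0, citations by others: 1
Some residues that are important for protein activity are highly conserved during evolution, while others maintain the structural and functional stability of the protein through coevolution. Because coevolving-residue analysis can infer inter-residue interactions from sequence alone when the protein structure is unknown, it is of considerable importance for predicting protein structure and function. Current methods for analyzing coevolving residues mainly include correlation-coefficient-based methods, perturbation-based methods, and parametric test methods. However, because of background interference from protein phylogeny, the accuracy of coevolving-residue analysis still needs further improvement. This paper reviews the methods of protein coevolution analysis and their research progress, and discusses likely future trends.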

5.
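Advances in protein structure prediction   Total citations: 1, self-citations: 0, citations by others: 1
Protein structure prediction is one of the main current challenges in bioinformatics. According to how heavily they rely on information in the PDB database, prediction methods fall into two classes: template-dependent modeling and ab initio prediction. Template-dependent modeling is further divided into homology modeling and threading. This paper introduces the main steps of each prediction method and analyzes the bottlenecks that constrain each method, together with its research progress. Homology modeling achieves high structural accuracy but depends strongly on templates; threading for low-homology targets is an important research direction within template-dependent modeling; in ab initio prediction, the combined use of statistical and physical functions has achieved good results, but fragments longer than 150 residues remain a major challenge.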

6.
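Residue substitution in proteins is one of the products of gene mutation. It can change a protein's three-dimensional structure and strongly affect its biological function, so studying the relationship between residue substitution and structural change is of great significance. With the rapid growth in the number of experimentally solved protein structures, more and more wild-type/mutant pairs are being used in comparative studies in structural biology. Starting from the Protein Data Bank (PDB), this study collected and computed a large amount of structural feature data and built DRSP, the largest currently known database of wild-type/mutant structure pairs (differing by a single residue), and shows how amino acid type and backbone preference correlate with structural conservation. Open access to DRSP can provide useful information for high-accuracy protein structure analysis and prediction. The database is available at http://www.labshare.cn/drsp/index.php.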

8.
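A new method for predicting protein secondary structure is proposed. The method extracts from amino acid sequences species-related protein secondary structure terms, analogous to "words" in natural language. These terms form a protein secondary structure dictionary that describes the relationship between amino acid sequence and protein secondary structure. Predicting secondary structure is analogous to the integrated process of word segmentation and part-of-speech tagging in natural language. The method treats the sequence of terms as a Markov chain and uses the Viterbi algorithm to search for the maximum probability that each term is tagged with a given secondary structure type; a word lattice describes the segmentation results, and a maximum entropy Markov model computes the secondary structure probabilities of the terms. The predicted secondary structure is the sequence of secondary structure types corresponding to the optimal segmentation. The method was tested on protein sequences from 4 species and compared with the PHD method. The experiments show that its Q3 accuracy is 3.9% higher than that of PHD and its SOV accuracy is 4.6% higher. Incorporating locally similar sequences found by BLAST search further improves prediction accuracy. On 50 CASP5 target protein sequences, the Q3 accuracy is 78.9% and the SOV accuracy is 77.1%. A protein secondary structure prediction server based on this method is available at http://www.insun.hit.edu.cn:81/demos/biology/index.html.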
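
The Viterbi search mentioned above can be sketched on a toy problem. This is a plain hidden-Markov-model version, not the maximum entropy Markov model or the word lattice of the abstract; the states (`H` for helix, `C` for coil), the observation symbols `a` and `b`, and all probabilities are made up for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a sequence of observations.

    All probabilities are passed as natural-log values in nested dicts.
    """
    # best[t][s] = (log-probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{s: (log_start[s] + log_emit[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + log_emit[s][obs], prev)
        best.append(column)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    path.reverse()
    return path


def logs(table):
    """Convert a dict of probabilities (possibly nested) to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}


# Hypothetical toy model: two states, two observation symbols.
states = ["H", "C"]
start = logs({"H": 0.5, "C": 0.5})
trans = logs({"H": {"H": 0.8, "C": 0.2}, "C": {"H": 0.2, "C": 0.8}})
emit = logs({"H": {"a": 0.9, "b": 0.1}, "C": {"a": 0.1, "b": 0.9}})
path = viterbi(["a", "a", "b", "b"], states, start, trans, emit)
```

Working in log space avoids numerical underflow on long sequences, which is the usual reason to prefer sums of logs over products of probabilities.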

9.
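This paper uses an artificial neural network to predict the backbone dihedral angles of protein molecules. Networks were trained separately on the dihedral angles of all residues, of residues in regular secondary structure, and of residues in random coil. The results show that, when backbone dihedral angles are predicted from the primary sequence, the success rate is highest for α-helices, followed by β-sheets, whereas the dihedral angles of random coil correlate only weakly with the primary sequence and are therefore hard to predict.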
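
For reference, the quantity being predicted can be computed from atomic coordinates as follows. This is a minimal sketch of a generic four-point dihedral angle using the common atan2 formulation; it is not the neural network of the abstract, and the function names are illustrative. For a protein backbone, φ uses the atoms C(i-1), N(i), Cα(i), C(i) and ψ uses N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry (made-up points) with the central bond along the z axis.
cis = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
quarter = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```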

11.
The contact order is believed to be an important factor for understanding protein folding mechanisms. In our earlier work, we have shown that the long-range interactions play a vital role in protein folding. In this work, we analyzed the contribution of long-range contacts to determine the folding rate of two-state proteins. We found that the residues that are close in space and are separated by at least ten to 15 residues in sequence are important determinants of folding rates, suggesting the presence of a folding nucleus at an interval of approximately 25 residues. A novel parameter "long-range order" has been proposed to predict protein folding rates. This parameter shows as good a relationship with the folding rate of two-state proteins as contact order. Further, we examined the minimum limit of residue separation to determine the long-range contacts for different structural classes. We observed an excellent correlation between long-range order and folding rate for all classes of globular proteins. We suggest that in mixed-class proteins, a larger number of residues can serve as folding nuclei compared to all-alpha and all-beta proteins. A simple statistical method has been developed to predict the folding rates of two-state proteins using the long-range order that produces an agreement with experimental results that is better than or comparable to other methods in the literature.
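
A long-range contact count of the kind described above might be computed as in the sketch below. The abstract gives no exact formula, so everything here is an assumption: contacts are counted between Cα atoms, the distance cutoff is 8.0 Å, the minimum sequence separation is 12 residues (the abstract only says "at least ten to 15"), and the count is normalized by chain length. The function name and the toy hairpin coordinates are illustrative.

```python
import math


def long_range_order(ca_coords, min_separation=12, cutoff=8.0):
    """Number of long-range contacts divided by the number of residues.

    Assumptions (not stated in the abstract): a contact is a pair of C-alpha
    atoms within `cutoff` angstroms whose sequence separation is at least
    `min_separation` residues.
    """
    n = len(ca_coords)
    if n == 0:
        raise ValueError("need at least one residue")
    contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts += 1
    return contacts / n


# Made-up 14-residue "hairpin": two antiparallel strands 5 angstroms apart,
# with 3.8 angstroms between consecutive residues along each strand.
hairpin = [(3.8 * min(i, 13 - i), 0.0 if i < 7 else 5.0, 0.0) for i in range(14)]
lro = long_range_order(hairpin)

# A fully extended chain has no long-range contacts.
extended = [(3.8 * i, 0.0, 0.0) for i in range(14)]
```

Exposing the separation and cutoff as parameters makes it easy to repeat the abstract's experiment of varying the minimum residue separation per structural class.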

12.
Protein structures are stabilized by both local and long-range interactions. In this work, we analyze the residue-residue contacts and the role of medium- and long-range interactions in globular proteins belonging to different structural classes. The results show that while medium-range interactions predominate in all-alpha class proteins, long-range interactions predominate in the all-beta class. Based on this, we analyze the performance of several structure prediction methods in different structural classes of globular proteins and find that all the methods predict the secondary structures of all-alpha proteins more accurately than those of other classes. Also, we observe that residues 21-30 residues apart contribute more towards long-range contacts and that about 85% of residues are involved in long-range contacts. Further, the preference of residue pairs to the folding and stability of globular proteins is discussed.

13.
One major problem with the existing algorithms for the prediction of protein structural classes is low accuracy for proteins from the α/β and α+β classes. In this study, three novel features were rationally designed to model the differences between proteins from these two classes. In combination with other rationally designed features, an 11-dimensional vector prediction method was proposed. By means of this method, the overall prediction accuracy based on the 25PDB dataset was 1.5% higher than the previous best-performing method, MODAS. Furthermore, the prediction accuracy for proteins from the α+β class based on the 25PDB dataset was 5% higher than the previous best-performing method, SCPRED. The prediction accuracies obtained with the D675 and FC699 datasets were also improved.

14.
Intrinsic disorder in the Protein Data Bank   Total citations: 2, self-citations: 0, citations by others: 2
The Protein Data Bank (PDB) is the preeminent source of protein structural information. PDB contains over 32,500 experimentally determined 3-D structures solved using X-ray crystallography or nuclear magnetic resonance spectroscopy. Intrinsically disordered regions fail to form a fixed 3-D structure under physiological conditions. In this study, we compare the amino-acid sequences of proteins whose structures are determined by X-ray crystallography with the corresponding sequences from the Swiss-Prot database. The analyzed dataset includes 16,370 structures, which represent 18,101 PDB chains and 5,434 different proteins from 910 different organisms (2,793 eukaryotic, 2,109 bacterial, 288 viral, and 244 archaeal). In this dataset, on average, each Swiss-Prot protein is represented by 7 PDB chains with 76% of the crystallized regions being represented by more than one structure. Intriguingly, the complete sequences of only approximately 7% of proteins are observed in the corresponding PDB structures, and only approximately 25% of the total dataset have >95% of their lengths observed in the corresponding PDB structures. This suggests that the vast majority of PDB proteins is shorter than their corresponding Swiss-Prot sequences and/or contain numerous residues, which are not observed in maps of electron density. To determine the prevalence of disordered regions in PDB, the residues in the Swiss-Prot sequences were grouped into four general categories, "Observed" (which correspond to structured regions), "Not observed" (regions with missing electron density, potentially disordered), "Uncharacterized," and "Ambiguous," depending on their appearance in the corresponding PDB entries. This non-redundant set of residues can be viewed as a 'fragment' or empirical domain database that contains a set of experimentally determined structured regions or domains and a set of experimentally verified disordered regions or domains. 
We studied the propensities and properties of residues in these four categories and analyzed their relations to the predictions of disorder using several algorithms. "Non-observed," "Ambiguous," and "Uncharacterized" regions were shown to possess the amino acid compositional biases typical of intrinsically disordered proteins. The application of four different disorder predictors (PONDR® VL-XT, VL3-BA, VSL1P, and IUPred) revealed that the vast majority of residues in the "Observed" dataset are ordered, and that the "Not observed" regions are mostly disordered. The "Uncharacterized" regions possess some tendency toward order, whereas the predictions for the short "Ambiguous" regions are really ambiguous. Long "Ambiguous" regions (>70 amino acid residues) are mostly predicted to be ordered, suggesting that they are likely to be "wobbly" domains. Overall, we showed that completely ordered proteins are not highly abundant in PDB and many PDB sequences have disordered regions. In fact, in the analyzed dataset approximately 10% of the PDB proteins contain regions of consecutive missing or ambiguous residues longer than 30 amino-acids and approximately 40% of the proteins possess short regions (≥10 and <30 amino-acid long) of missing and ambiguous residues.

16.
Gromiha MM  Suresh MX 《Proteins》2008,70(4):1274-1279
Discriminating thermophilic proteins from their mesophilic counterparts is a challenging task and it would help to design stable proteins. In this work, we have systematically analyzed the amino acid compositions of 3075 mesophilic and 1609 thermophilic proteins belonging to 9 and 15 families, respectively. We found that the charged residues Lys, Arg, and Glu as well as the hydrophobic residues Val and Ile have higher occurrence in thermophiles than mesophiles. Further, we have analyzed the performance of different methods, based on Bayes rules, logistic functions, neural networks, support vector machines, decision trees and so forth for discriminating mesophilic and thermophilic proteins. We found that most of the machine learning techniques discriminate these classes of proteins with similar accuracy. The neural network-based method could discriminate the thermophiles from mesophiles at the five-fold cross-validation accuracy of 89% in a dataset of 4684 proteins. Moreover, this method is tested with 325 mesophiles in Xylella fastidiosa and 382 thermophiles in Aquifex aeolicus and it could successfully discriminate them with the accuracy of 91%. These accuracy levels are better than other methods in the literature and we suggest that this method could be effectively used to discriminate mesophilic and thermophilic proteins.
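
The amino acid composition used as input above is simple to compute. The following is a minimal sketch of that feature-extraction step only; the classifiers compared in the abstract are not reproduced. The function name `composition` and the toy sequence are illustrative and not taken from the study's dataset.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(sequence):
    """Return the fraction of each standard amino acid in the sequence.

    Non-standard characters are ignored, so the 20 fractions sum to 1.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical toy sequence.
comp = composition("MKKLVEEIRK")

# Combined fraction of the residues the abstract reports as enriched in
# thermophiles: Lys (K), Arg (R), Glu (E), Val (V), and Ile (I).
enriched_fraction = sum(comp[aa] for aa in "KREVI")
```

The resulting 20-component vector is the kind of input that a classifier such as a neural network or support vector machine would be trained on.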

17.
Importance of long-range interactions in protein folding   Total citations: 2, self-citations: 0, citations by others: 2
Long-range interactions play an active role in the stability of protein molecules. In this work, we have analyzed the importance of long-range interactions in different structural classes of globular proteins in terms of residue distances. We found that 85% of residues are involved in long-range contacts. The residues occurring in the range of 4-10 residues apart contribute more towards long-range contacts in all-alpha proteins while the range is 11-20 in all-beta proteins. The hydrophobic residues Cys, Ile and Val prefer the 11-20 range and all other residues prefer the 4-10 range. The residues in all-beta proteins have an average of 3-8 long-range contacts whereas the residues in other classes have 1-4 long-range contacts. Furthermore, the preference of residue pairs to the folding and stability will be discussed.

18.
Wang ZX  Yuan Z 《Proteins》2000,38(2):165-175
Proteins of known structures are usually classified into four structural classes: all-alpha, all-beta, alpha+beta, and alpha/beta type of proteins. A number of methods for predicting the structural class of a protein based on its amino acid composition have been developed during the past few years. Recently, a component-coupled method was developed for predicting protein structural class according to amino acid composition. This method is based on the least Mahalanobis distance principle, and yields much better predicted results in comparison with the previous methods. However, the success rates reported for structural class prediction by different investigators are contradictory. The highest reported accuracies by this method are near 100%, but the lowest one is only about 60%. The goal of this study is to resolve this paradox and to determine the possible upper limit of prediction rate for structural classes. In this paper, based on the normality assumption and the Bayes decision rule for minimum error, a new method is proposed for predicting the structural class of a protein according to its amino acid composition. The detailed theoretical analysis indicates that if the four protein folding classes are governed by the normal distributions, the present method will yield the optimum predictive result in a statistical sense. A non-redundant data set of 1,189 protein domains is used to evaluate the performance of the new method. Our results demonstrate that 60% correctness is the upper limit for a 4-type class prediction from amino acid composition alone for an unknown query protein. The apparent relatively high accuracy level (more than 90%) attained in the previous studies was due to the preselection of test sets, which may not be adequately representative of all unrelated proteins.
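
The two decision rules named above can be written out in their standard textbook form. The notation is an assumption, since the abstract gives no formulas: x is the composition vector of the query protein, and for each structural class k (all-alpha, all-beta, alpha+beta, alpha/beta), μ_k and Σ_k are the class mean and covariance matrix estimated from training data and P(ω_k) is the class prior.

```latex
% Least Mahalanobis distance rule: assign x to the class with the smallest D_k^2.
\[
  D_k^2(x) = (x - \mu_k)^{\mathsf{T}} \, \Sigma_k^{-1} \, (x - \mu_k)
\]

% Bayes decision rule for minimum error under the normality assumption:
% assign x to the class with the largest discriminant g_k.
\[
  g_k(x) = -\tfrac{1}{2} D_k^2(x) - \tfrac{1}{2} \ln \lvert \Sigma_k \rvert + \ln P(\omega_k),
  \qquad
  \hat{k} = \arg\max_{k} \, g_k(x)
\]
```

The second rule adds the log-determinant and log-prior terms to the Mahalanobis distance, so the two coincide only when all classes share the same |Σ_k| and prior. Because the 20 composition fractions sum to 1, their full covariance matrix is singular; a reduced vector is normally needed for Σ_k to be invertible, though the abstract does not say how this is handled here.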

19.
MOTIVATION: Discriminating outer membrane proteins from other folding types of globular and membrane proteins is an important task both for identifying outer membrane proteins from genomic sequences and for the successful prediction of their secondary and tertiary structures. RESULTS: We have systematically analyzed the amino acid composition of globular proteins from different structural classes and outer membrane proteins. We found that the residues, Glu, His, Ile, Cys, Gln, Asn and Ser, show a significant difference between globular and outer membrane proteins. Based on this information, we have devised a statistical method for discriminating outer membrane proteins from other globular and membrane proteins. Our approach correctly picked up the outer membrane proteins with an accuracy of 89% for the training set of 337 proteins. On the other hand, our method has correctly excluded the globular proteins at an accuracy of 79% in a non-redundant dataset of 674 proteins. Furthermore, the present method is able to correctly exclude alpha-helical membrane proteins up to an accuracy of 80%. These accuracy levels are comparable to other methods in the literature, and this is a simple method, which could be used for dissecting outer membrane proteins from genomic sequences. The influence of protein size, structural class and specific residues for discrimination is discussed.
