Similar Articles
19 similar articles found.
1.
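Correctly predicting protein folding rates is very important for understanding the mechanism of protein folding. Starting from the pseudo amino acid composition approach, this paper proposes using oscillations of sequence hydrophobicity values to extract sequence-order information from the amino acid sequence, and builds a linear regression model to predict folding rates. The method requires no secondary-structure, tertiary-structure, or structural-class information and predicts folding rates directly from the sequence. On a dataset of 62 proteins, jackknife cross-validation gave a correlation coefficient of 0.804, indicating good agreement between predicted and experimental folding rates and showing that amino acid sequence information strongly influences protein folding rates. Compared with other methods, this approach is computationally simple and needs few input parameters.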
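The abstract does not give its exact feature definitions. The following is a minimal sketch, assuming that "hydrophobicity oscillation" means pseudo-amino-acid-composition-style sequence-order factors θ_λ (the mean squared difference in standardized hydrophobicity between residues λ positions apart), computed here on the Kyte-Doolittle scale, and that the linear regression is ordinary least squares evaluated by jackknife (leave-one-out) cross-validation. The scale, `max_lag`, and the function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed scale; the paper's scale may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Standardize the scale to zero mean and unit variance before computing the factors.
_vals = np.array(list(KD.values()))
H = {aa: (v - _vals.mean()) / _vals.std() for aa, v in KD.items()}


def hydrophobicity_oscillation(seq, max_lag=3):
    """Return theta_1..theta_max_lag, where theta_lag = mean((H[i+lag] - H[i])**2)."""
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    h = np.array([H[aa] for aa in seq.upper()])
    return np.array([np.mean((h[lag:] - h[:-lag]) ** 2) for lag in range(1, max_lag + 1)])


def jackknife_r(X, y):
    """Pearson r between leave-one-out OLS predictions and the observed y."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # intercept column
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop protein i from the training set
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        preds[i] = X[i] @ coef  # predict the held-out protein
    return float(np.corrcoef(preds, y)[0, 1])
```

With `X` holding one row of `hydrophobicity_oscillation(seq)` per protein and `y` the measured ln(kf) values, `jackknife_r(X, y)` returns a cross-validated correlation coefficient of the kind the abstract reports.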

2.
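Because predicting protein folding rates is important for studying protein function, many researchers have begun to investigate the factors that affect folding rates, and various predictive parameters and methods have been proposed. Building on the different effects that sequence features, secondary structures, and fold classes have on folding rates, we selected new features of the protein coding sequence, namely the Lempel-Ziv (LZ) complexity and the isoelectric point of the protein sequence. These features were combined with the following properties of the 20 amino acids: αc, Cα, K0, Pβ, Ra, ΔASA, PI, ΔGhD, Nm, LZ, Mu, and El, and multiple linear regression models were built. For 13 all-α proteins, 18 all-β proteins, 13 mixed-class proteins, and 39 unclassified proteins, the correlation coefficients between ln(kf) and the predicted values reached 0.89, 0.93, 0.98, and 0.86, respectively. Jackknife validation showed that the combined features correlate well with folding rates across the different structural classes. The results suggest that, during folding, features such as the LZ complexity and isoelectric point of the protein sequence may influence the folding rate and the resulting structure.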
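"LZ complexity" most likely refers to the Lempel-Ziv (1976) complexity, which counts the new patterns met while parsing a sequence from left to right. A minimal sketch, assuming the Kaspar-Schuster counting scheme applied directly to the one-letter amino acid sequence; the abstract does not say which variant or normalization was used.

```python
def lz76_complexity(s: str) -> int:
    """Number of components in the Lempel-Ziv (1976) parsing of s (Kaspar-Schuster scheme)."""
    n = len(s)
    if n <= 1:
        return n
    i, k, l = 0, 1, 1  # i: start of candidate copy, k: match length, l: start of current component
    c, k_max = 1, 1    # c: components so far (the first symbol is one component)
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:  # the last component runs to the end of the string
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:  # no earlier copy extends further: close the component
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c
```

For example, the binary string `"0001101001000101"` parses as 0·001·10·100·1000·101, giving a complexity of 6. Studies often normalize the count as c·log_k(n)/n, with k the alphabet size, so that sequences of different lengths are comparable.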

3.
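Based on amino acid conservation within protein folds, and using amino acid identity, polarity, charge, and hydrophilicity/hydrophobicity as parameters, we start from the amino acid sequence and adopt a one-versus-rest classification strategy. By constructing scoring matrices and selecting sequence pattern fragments, we applied five similarity scoring functions to recognize 27 fold classes, with a best prediction accuracy of 83.46%. The results show that scoring matrices are an effective method for predicting multiple classes of protein folds.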

4.
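A fold database covering 27 fold types was built from 1,895 protein sequences sharing less than 40% sequence identity. Starting from the sequence, each protein was represented by a combined vector of motif frequencies, low-frequency power spectral density values, amino acid composition, predicted secondary-structure information, and autocorrelation function values. A support vector machine with an all-together classification strategy was used to predict the fold type among the 27 fold classes, reaching 66.67% accuracy on an independent test. Using the same features and algorithm to predict the four structural classes of the 27 folds gave 89.24% accuracy on an independent test. Applied to the 27-fold dataset used in earlier studies, the same method gave better results than previously reported.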
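One of the listed features, the low-frequency power spectral density, can be illustrated by treating the sequence as a numeric signal. A minimal sketch, assuming the signal is the Kyte-Doolittle hydropathy profile (a hypothetical choice), the spectrum is the plain periodogram |X_k|²/N of the mean-centered signal, and "low frequency" means the first few non-zero frequency bins; the paper's encoding and bin count are not stated in the abstract.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed encoding of residues as numbers).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def low_freq_psd(seq, n_bins=5):
    """Periodogram |X_k|^2 / N of the centered hydropathy signal, frequency bins 1..n_bins."""
    x = np.array([KD[aa] for aa in seq.upper()], dtype=float)
    if len(x) // 2 < n_bins:
        raise ValueError("sequence too short for the requested number of bins")
    x -= x.mean()  # remove the zero-frequency (DC) component
    psd = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    return psd[1:n_bins + 1]
```

A strictly alternating sequence such as `"AI" * 10` puts all its power at the highest frequency, so its low-frequency bins are zero, while slowly varying hydrophobic/hydrophilic blocks show up in the first bins.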

5.
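Protein fold pattern recognition is an important method for analyzing protein structure. Using proteins with low sequence similarity as the training set, we extracted sequence-information frequencies and hydrophobicity as fold-type features, built a dataset of 1,393 fold patterns from classified proteins in the SCOP database, and used an SVM to predict these 1,393 fold patterns. Closed-test accuracy reached 99.6122%, and open-test accuracy on SCOP reached 79.6329%. On another authoritative test set, open-test fold accuracy reached 64.7059% and SCOP class accuracy reached 76.4706%. The method can effectively predict protein fold patterns and thus provide a reference for ab initio protein structure prediction.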

6.
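Theoretical and experimental studies show that the native topology of a protein strongly influences its folding process. We used complex-network methods to analyze the topological features of native protein structures and to explore the intrinsic relationship between structural features and folding rates. We constructed amino acid networks, hydrophobic networks, hydrophilic networks, hydrophilic-hydrophobic networks, and the corresponding long-range networks, and studied the statistical properties of their assortativity coefficients and clustering coefficients. Except for the hydrophilic-hydrophobic network, all of these networks have positive assortativity coefficients, and the assortativity coefficients of the amino acid network and the hydrophobic network show a clear positive linear correlation with folding rate, revealing that cooperative interactions among hydrophobic residues favor fast folding. In addition, the clustering coefficient of the hydrophobic network shows a clear negative linear correlation with folding rate, indicating that the formation of triangles (triangle construction) among hydrophobic residues hinders fast folding. Finally, analysis of the corresponding long-range networks showed that forming contacts between residues far apart in sequence slows folding.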
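The two network statistics used above can be computed without any graph library. A minimal sketch, assuming nodes are residues, edges join residues whose C-alpha atoms lie within a hypothetical 7 Å cutoff, and a long-range network keeps only pairs separated by at least `min_sep` residues in sequence (12 is one plausible value, also an assumption). A hydrophobic network would apply the same functions to the hydrophobic residues only. Assortativity here is the Pearson correlation of degrees at the two ends of each edge; nodes of degree below 2 count as 0 in the average clustering coefficient.

```python
import math
from itertools import combinations


def contact_edges(coords, cutoff=7.0, min_sep=1):
    """Residue pairs (i, j), j - i >= min_sep, whose C-alpha distance is below cutoff (Å)."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if j - i >= min_sep and math.dist(coords[i], coords[j]) < cutoff]


def adjacency(n, edges):
    """Neighbor sets for an undirected graph with nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)


def degree_assortativity(adj, edges):
    """Pearson correlation of end-point degrees, counting each edge in both directions."""
    xs, ys = [], []
    for i, j in edges:
        di, dj = len(adj[i]), len(adj[j])
        xs += [di, dj]
        ys += [dj, di]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx and vy else float("nan")  # undefined for regular graphs
```

A star graph, where one hub links to leaves, is perfectly disassortative (coefficient -1), whereas a triangle has clustering coefficient 1.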

7.
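Protein fold recognition is one of the important approaches to predicting protein three-dimensional structure and has been applied successfully in many areas of biology. Over the past decade, we have seen a series of fold recognition methods based on different computational approaches. Among them, machine learning and profile-profile alignment are two of the most widely used and effective. Besides advances in computational methods, the steadily growing protein structure databases have also contributed substantially to the continuing improvement in fold recognition accuracy. In this article, we briefly review state-of-the-art fold recognition algorithms and discuss some strategies that might be used to improve them.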

8.
Prediction of membrane protein fold types based on a fuzzy support vector machine
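Existing methods that use support vector machines (SVM) to predict membrane protein fold types do not make full use of protein sequence features, and they leave unclassifiable regions when handling multiclass problems. To address these two issues, we extract amino acid composition and dipeptide composition features from protein sequences, compute weighted multi-order correlation features of amino acid residue indices, and fuse the three kinds of features into the classifier's input feature vector. A fuzzy SVM (FSVM) algorithm is used to classify data that are unclassifiable for the conventional SVM. Tests on a non-redundant dataset show that the improved feature extraction outperforms existing feature extraction methods with the same classifier, and that FSVM outperforms the conventional SVM with the same features. The combined strategy reaches 96.6% prediction accuracy on an independent dataset test, better than many existing predictors, and can serve as an effective tool for predicting the fold types of membrane proteins and other proteins.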
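Two of the three fused feature groups are standard composition vectors. A minimal sketch of the 20 amino acid frequencies followed by the 400 dipeptide frequencies; the weighted multi-order residue-index correlation features and the FSVM itself are omitted, and the handling of non-standard letters (skipped in dipeptide counts) is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def composition_features(seq):
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    aac = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            n_pairs += 1
    dpc = [counts[d] / max(n_pairs, 1) for d in DIPEPTIDES]
    return aac + dpc
```

For `"ACAC"`, the overlapping dipeptides are AC, CA, AC, so AC has frequency 2/3 and CA 1/3.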

9.
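Understanding protein folding rates is key to uncovering the physical basis of protein structure and folding mechanisms. The temperature dependence of folding rates is currently an unsolved problem. Assuming that protein folding is a quantum transition between molecular conformations, we derive an analytical formula for the folding rate. From this formula, we calculated the folding rates of two-state proteins in a database and studied their temperature dependence. Starting from first principles, we successfully explain the experimentally observed non-Arrhenius temperature dependence of the folding rates of 16 two-state proteins, and further predict the temperature dependence of their unfolding rates. Based on the quantum folding theory, we give a statistical formula for predicting the folding rates of two-state proteins; applied to a database of 65 proteins, it gives a correlation coefficient of 0.73 between theory and experiment. The theory also gives estimates of the maximum and minimum folding rates that agree with experiment.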
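For reference, "non-Arrhenius" is measured against the Arrhenius law, in which the folding rate depends on temperature through a constant activation energy E_a (A is the pre-exponential factor and k_B the Boltzmann constant). This is the standard baseline, not the paper's quantum-transition formula, which the abstract does not give:

```latex
k_f(T) = A\, e^{-E_a/(k_B T)}
\quad\Longrightarrow\quad
\ln k_f = \ln A - \frac{E_a}{k_B}\cdot\frac{1}{T}
```

A plot of ln k_f against 1/T is therefore a straight line with slope -E_a/k_B; non-Arrhenius behavior means this plot is curved.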

10.
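Taking the basic amino acids at the hemagglutinin cleavage site of H5N2 subtype avian influenza virus strains as the object of study, we analyzed their codon usage bias and the secondary-structure features of the corresponding folded mRNA sequences. The aim was to explore the relationship between the secondary structure of the mRNA nucleotide segments encoding the cleavage-site amino acids and viral pathogenicity, and to provide basic information for avian influenza virus research. The mRNA samples were extended along the sequence in equal-length increments, and RNAstructure 4.1 was used to predict the secondary structures of these dynamically extended folds. The sequence and structure analyses showed that the basic amino acids at the cleavage site strongly prefer adenine-rich codons, and that the nucleotides in the mRNA segments encoding these basic amino acids lie mainly in single-stranded loop regions of the folded secondary structure, with a few in paired helical regions. The results indicate that the size of the hairpin loop formed by the nucleotides encoding the cleavage-site amino acids correlates positively with the number of basic amino acids.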
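Codon preference of this kind is commonly quantified with relative synonymous codon usage (RSCU): a codon's count divided by the mean count over all codons for the same amino acid, so values above 1 mark preferred codons. The abstract does not name its measure; this minimal sketch assumes RSCU, an in-frame mRNA fragment, and only the basic residues Arg and Lys.

```python
from collections import Counter

# Synonymous codons (mRNA alphabet) for the basic residues Arg and Lys.
SYNONYMOUS = {
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "K": ["AAA", "AAG"],
}


def rscu(mrna):
    """Relative synonymous codon usage of Arg/Lys codons in an in-frame mRNA fragment."""
    codons = Counter(mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3))
    out = {}
    for syn in SYNONYMOUS.values():
        total = sum(codons[c] for c in syn)
        for c in syn:
            # observed count / expected count under uniform use of synonymous codons
            out[c] = codons[c] * len(syn) / total if total else 0.0
    return out
```

For the fragment `"AGAAGAAAAAGG"` (codons AGA, AGA, AAA, AGG), the adenine-rich codons AGA and AAA get RSCU values of 4.0 and 2.0.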

11.
Predicting protein folding rates from amino acid sequences is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to capture the correlation between folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network-genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it addresses the effect of sequence order. The proposed method gives a good correlation between predicted and experimental folding rates: evaluated with the leave-one-out jackknife test on 93 proteins, the largest such dataset yet studied, the correlation coefficient is 0.80 and the standard error is 2.65. The comparative results show that this correlation is better than that of most other methods and suggest that sequence-order information contributes importantly to determining protein folding rates.

12.
Ma BG, Guo JX, Zhang HY. Proteins, 2006, 65(2):362-372
Discovering the mechanism of protein folding is a great challenge in molecular biology. A key step toward this end is to find factors that correlate with protein folding rates. Over the past few years, many empirical parameters, such as contact order, long-range order, total contact distance, and secondary structure content, have been developed to reflect the correlation between folding rates and protein tertiary or secondary structure. However, the correlation between folding rates and amino acid composition has not been explored. In the present work, we systematically examined the correlation between proteins' folding rates and their amino acid compositions for two-state and multistate folders and found that different amino acids contribute differently to the folding process. The relationship of folding rates to amino acid molecular weight and codon degeneracy was examined, and the role of hydrophobicity in protein folding was also inspected. As a result, a new indicator called the composition index was derived; it takes no structural factors into account and is determined solely by a protein's amino acid composition. This indicator is highly correlated with folding rate (r > 0.7). Three concluding points follow from this work: (1) two-state and multistate folders have different rate-determining amino acids; (2) the main information determining a protein's folding rate is largely reflected in its amino acid composition; and (3) from the standpoint of practical application, the composition index may be the best predictor for ab initio folding rate prediction directly from sequence.

13.
Huang JT, Tian J. Proteins, 2006, 63(3):551-554
The significant correlation between protein folding rates and sequence-predicted secondary structure suggests that folding rates are largely determined by the amino acid sequence. Here, we present a method for predicting protein folding rates from sequence using the intrinsic properties of amino acids, which requires no information from secondary structure prediction or structural topology. The contribution of each residue to the folding rate is expressed by its Omega value, which depends on amino acid properties (amino acid rigidity and the amino acid's dislike for secondary structures). Our method achieves 82% correlation with experimentally determined folding rates for the simple two-state proteins studied to date, suggesting that the amino acid sequence of a protein is an important determinant of its folding rate and mechanism.

14.
What key building blocks would have been needed to construct complex protein folds? This question is important for understanding the protein folding mechanism and for guiding de novo protein design. The twenty naturally occurring amino acids and eight secondary structures constitute a 28‐letter alphabet that determines folding kinetics and mechanism. Here we predict the folding rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to that achieved by the full 28‐letter alphabet, whereas many other reduced alphabets are not significantly correlated with folding rates. This finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit folding. Reducing alphabet cardinality without losing key folding-kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment, and protein design. Proteins 2015; 83:631–639. © 2015 Wiley Periodicals, Inc.

15.
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select features significantly related to protein folding rates and to build a global predictive model. In addition, the local lazy regression (LLR) method was used to predict protein folding rates. The results indicated that LLR performed much better than the global MLR model. The important amino acid properties affecting protein folding rates were also analyzed. The results of this study will help to understand the mechanism of protein folding. They also demonstrate that amino acid sequence autocorrelation features are effective in representing the relationship between protein sequence and folding rate, and that the local method is a powerful tool for predicting protein folding rates.
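The difference between the global and local models can be shown in a few lines: instead of one regression for all proteins, local lazy regression fits a fresh linear model for each query, using only its nearest neighbors in feature space. A minimal sketch, assuming Euclidean distance and a fixed neighborhood size `k`; published LLR variants may choose `k` adaptively, and the function name is hypothetical.

```python
import numpy as np


def local_lazy_predict(X_train, y_train, x_query, k=10):
    """Fit OLS on the k training points nearest to x_query and predict its response."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.atleast_1d(np.asarray(x_query, dtype=float))
    dist = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distance to every training point
    idx = np.argsort(dist)[:k]                          # the k nearest neighbors
    A = np.column_stack([np.ones(len(idx)), X_train[idx]])  # local design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, y_train[idx], rcond=None)
    return float(np.concatenate(([1.0], x_query)) @ coef)
```

On data that follow one line for x < 10 and a different line above it, a single global fit is biased everywhere, whereas the local model recovers each branch exactly when the neighborhood stays on one side.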

16.
Many single-domain proteins exhibit two-state folding kinetics, with folding rates that span more than six orders of magnitude. A quantity of much recent interest for such proteins is their contact order, the average separation in sequence between contacting residue pairs. Numerous studies have reached the surprising conclusion that contact order is well-correlated with the logarithm of the folding rate for these small, well-characterized molecules. Here, we investigate the physico-chemical basis for this finding by asking whether contact order is actually a composite number that measures the fraction of local secondary structure in the protein; viz. turns, helices, and hairpins. To pursue this question, we calculated the secondary structure content for 24 two-state proteins and obtained coefficients that predict their folding rates. The predicted rates correlate strongly with experimentally determined rates, comparable to the correlation with contact order. Further, these predicted folding rates are correlated strongly with contact order. Our results suggest that the folding rate of two-state proteins is a function of their local secondary structure content, consistent with the hierarchic model of protein folding. Accordingly, it should be possible to utilize secondary structure prediction methods to predict folding rates from sequence alone.
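The contact order described above is usually reported in its relative form: the mean sequence separation of contacting pairs divided by chain length, RCO = Σ|i − j| / (L·N) for N contacts in a chain of L residues. A minimal sketch that takes a precomputed contact list; in the original definition, contacts are commonly taken as non-hydrogen atoms within 6 Å, which this function leaves to the caller.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation |i - j| over contacting pairs, divided by chain length."""
    if not contacts:
        raise ValueError("need at least one contact")
    return sum(abs(i - j) for i, j in contacts) / (len(contacts) * length)
```

For example, contacts (0, 4) and (1, 9) in a 10-residue chain give (4 + 8) / (2 × 10) = 0.6.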

17.
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as input. In this study, long‐range and short‐range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding-window method. This method can predict protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction and selected an optimal feature subset by combining forward feature selection with sequential backward selection. The method was evaluated on a large dataset with the jackknife cross-validation test. The predictor, a nonlinear support vector machine (SVM) regression model built on numerous physicochemical and statistical features of the protein, achieved excellent agreement between predicted and experimentally observed folding rates, with a correlation coefficient of 0.9313 and a standard error of 2.2692. The prediction server is freely available at http://www.jci‐bioinfo.cn/swfrate/input.jsp . Proteins 2013. © 2012 Wiley Periodicals, Inc.

18.
Huang JT, Xing DJ, Huang W. Amino Acids, 2012, 43(2):567-572
The successful prediction of protein folding rates from sequence-predicted secondary structure suggests that folding rates might be predicted from sequence alone. To pursue this question, we predict folding rates directly from amino acid sequences, without requiring any information on secondary or tertiary structure. Our method achieves 88% correlation with experimentally determined folding rates for proteins of all folding types and for peptides, suggesting that almost all of the information needed to specify a protein's folding kinetics and mechanism is contained within its amino acid sequence. The influence of each residue on folding rate is related to amino acid properties: the hydrophobic character of amino acids may be an important determinant of folding kinetics, whereas other properties (size, flexibility, polarity, and isoelectric point) contribute little to the folding rate constant.

19.
The contact order is believed to be an important factor for understanding protein folding mechanisms. In our earlier work, we showed that long-range interactions play a vital role in protein folding. In this work, we analyzed the contribution of long-range contacts to the folding rates of two-state proteins. We found that residues that are close in space but separated by at least 10 to 15 residues in sequence are important determinants of folding rates, suggesting the presence of a folding nucleus at an interval of approximately 25 residues. A novel parameter, "long-range order," is proposed to predict protein folding rates; it correlates with the folding rates of two-state proteins as well as contact order does. Further, we examined the minimum residue separation for defining long-range contacts in different structural classes and observed an excellent correlation between long-range order and folding rate for all classes of globular proteins. We suggest that in mixed-class proteins, a larger number of residues can serve as folding nuclei than in all-alpha and all-beta proteins. A simple statistical method based on long-range order has been developed to predict the folding rates of two-state proteins; its agreement with experimental results is better than or comparable to that of other methods in the literature.
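Long-range order counts spatial contacts between residues far apart in sequence and normalizes by chain length. A minimal sketch, assuming contacts are C-alpha pairs within 8 Å and "long-range" means a sequence separation greater than 12 residues; the abstract itself reports that the useful threshold lies between 10 and 15 residues and varies by structural class, so both values are adjustable parameters here.

```python
import math
from itertools import combinations


def long_range_order(ca_coords, min_sep=12, cutoff=8.0):
    """LRO = (number of C-alpha contacts with |i - j| > min_sep) / N, N = number of residues."""
    n = len(ca_coords)
    count = sum(1 for i, j in combinations(range(n), 2)
                if j - i > min_sep and math.dist(ca_coords[i], ca_coords[j]) <= cutoff)
    return count / n
```

In a 16-residue chain where only residues 0 and 15 are in contact across a long sequence gap, LRO = 1/16 = 0.0625; raising `min_sep` to 15 removes that contact and gives 0.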
