Similar literature
 Found 20 similar articles (search time: 178 ms)
1.
Huang JT  Tian J 《Proteins》2006,63(3):551-554
The significant correlation between protein folding rates and the sequence-predicted secondary structure suggests that folding rates are largely determined by the amino acid sequence. Here, we present a method for predicting the folding rates of proteins from sequences using the intrinsic properties of amino acids, which does not require any information on secondary structure prediction or structural topology. The contribution of a residue to the folding rate is expressed by the residue's Omega value. For a given residue, its Omega depends on the amino acid properties (amino acid rigidity and dislike of the amino acid for secondary structures). Our investigation achieves 82% correlation with experimentally determined folding rates for the simple, two-state proteins studied to date, suggesting that the amino acid sequence of a protein is an important determinant of the protein-folding rate and mechanism.

2.
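Predicting protein folding rates from amino acid sequences   Cited by: 1 (self-citations: 0, citations by others: 1)
Predicting protein folding rates is one of the most challenging problems in biophysics today. In recent years, researchers have carried out extensive work to identify the determinants of folding rates, and many parameters and methods have been proposed. However, the influence on folding rates of information such as interactions between amino acid residues and the sequence order of amino acids has not been addressed. Here, pseudo amino acid composition is used to extract sequence-order information, a Monte Carlo method is used to select the optimal features, and a linear regression model is built to predict folding rates. The method predicts folding rates directly from a protein's amino acid sequence without requiring any (explicit) structural information. Under jackknife cross-validation on a dataset of 99 proteins, the predicted folding rates correlate well with the experimental values, with a correlation coefficient of 0.81 and a prediction error of only 2.54. This accuracy is clearly better than that of other sequence-based methods, indicating that sequence-order information is an important factor influencing protein folding rates.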
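The jackknife (leave-one-out) validation of a linear regression model mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single numeric feature per protein (the abstract's model uses several selected features), and the function names `pearson`, `fit_line`, and `jackknife_predictions` are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # leave sample i out
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        predictions.append(slope * xs[i] + intercept)
    return predictions
```

The jackknife correlation is then `pearson(jackknife_predictions(features, rates), rates)`, where `features` and `rates` are plain Python lists (assumed names) holding one feature value and one measured folding rate per protein.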

3.
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long‐range and short‐range contacts in a protein were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal features were selected by combining forward feature selection with sequential backward selection. The method was demonstrated on a large dataset using the jackknife cross-validation test. The predictor was built on numerous physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci‐bioinfo.cn/swfrate/input.jsp . Proteins 2013. © 2012 Wiley Periodicals, Inc.

4.
Huang JT  Xing DJ  Huang W 《Amino acids》2012,43(2):567-572
The successful prediction of protein-folding rates based on the sequence-predicted secondary structure suggests that the folding rates might be predicted from sequence alone. To pursue this question, we directly predict the folding rates from amino acid sequences, an approach that does not require any information on secondary or tertiary structure. Our work achieves 88% correlation with experimentally determined folding rates for proteins of all folding types and for peptides, suggesting that almost all of the information needed to specify a protein's folding kinetics and mechanism is contained within its amino acid sequence. The influence of a residue on the folding rate is related to amino acid properties. The hydrophobic character of amino acids may be an important determinant of folding kinetics, whereas other properties of amino acids (size, flexibility, polarity, and isoelectric point) contribute little to the folding rate constant.

5.
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates and to build a global predictive model. Moreover, the local lazy regression (LLR) method was also used to predict protein folding rates. The results indicated that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will be helpful for understanding the mechanism of protein folding. Our results also demonstrate that features based on amino acid sequence autocorrelation are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
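As an illustration of a property-weighted sequence autocorrelation feature, here is a minimal sketch. The abstract does not specify the exact descriptor, so the form used here (the average product of property values at positions a fixed lag apart, in the style of a Moreau-Broto autocorrelation) is an assumption, and the function name `autocorrelation` and the two-letter `toy_scale` are hypothetical.

```python
def autocorrelation(sequence, scale, lag):
    """Average product of property values for residues `lag` positions apart.

    `scale` maps one-letter residue codes to a numeric property value.
    The normalisation by (n - lag) is an assumed convention.
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(sequence) - 1")
    return sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)


# Hypothetical two-letter property scale, for illustration only.
toy_scale = {"A": 1.0, "G": -1.0}
features = [autocorrelation("AGAGAG", toy_scale, lag) for lag in (1, 2, 3)]
```

Computing the value for several lags and several property scales yields the kind of large feature set from which a selection step would then pick a subset.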

6.
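Accurate prediction of protein folding rates is very important for understanding the folding mechanism of proteins. Starting from the pseudo amino acid composition approach, this paper proposes using oscillations in sequence hydrophobicity values to extract sequence-order information from a protein's amino acids, and builds a linear regression model to predict folding rates. The method requires no secondary-structure, tertiary-structure, or structural-class information and predicts folding rates directly from sequence. On a dataset of 62 proteins, jackknife cross-validation gives a correlation coefficient of 0.804, indicating good agreement between predicted and experimental folding rates and showing that amino acid sequence information has an important influence on protein folding rates. Compared with other methods, this approach is computationally simple and requires few input parameters.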

7.
One of the goals of molecular bioinformatics is decoding amino acid sequences to extract information on the principles of protein folding. However, this is difficult to perform with standard bioinformatics techniques such as multiple sequence alignment and so on. Thus, we propose a technique based on inter-residue average distance statistics to make predictions regarding the protein folding mechanisms of amino acid sequences. Our method involves constructing a kind of predicted contact map called an Average Distance Map (ADM) based on average distance statistics to pinpoint regions of possible folding nuclei for proteins. Only information on the amino acid sequence of a given protein is required for the present method. In this article, we summarize the results of studies using our method to analyze how specific protein sequences affect folding properties. In particular, we present studies on proteins in the phage lysozyme, such as the globin, fatty acid binding protein-like, and the cupredoxin-like fold families. In the present review, we characterize the 3D architectures of these proteins through the properties of the protein ADMs. Furthermore, we combine the information on the conserved residues within the regions predicted by the ADMs with our results obtained so far. Such information may help identify the folding characteristics of each protein. We discuss this possibility in the present review.

8.
Ma BG  Guo JX  Zhang HY 《Proteins》2006,65(2):362-372
Discovering the mechanism of protein folding, in molecular biology, is a great challenge. A key step to this end is to find factors that correlate with protein folding rates. Over the past few years, many empirical parameters, such as contact order, long-range order, total contact distance, secondary structure contents, have been developed to reflect the correlation between folding rates and protein tertiary or secondary structures. However, the correlation between proteins' folding rates and their amino acid compositions has not been explored. In the present work, we examined systematically the correlation between proteins' folding rates and their amino acid compositions for two-state and multistate folders and found that different amino acids contributed differently to the folding progress. The relation between the amino acids' molecular weight and degeneracy and the folding rates was examined, and the role of hydrophobicity in the protein folding process was also inspected. As a consequence, a new indicator called composition index was derived, which takes no structure factors into account and is merely determined by the amino acid composition of a protein. Such an indicator is found to be highly correlated with the protein's folding rate (r > 0.7). From the results of this work, three points of concluding remarks are evident. (1) Two-state folders and multistate folders have different rate-determining amino acids. (2) The main determining information of a protein's folding rate is largely reflected in its amino acid composition. (3) Composition index may be the best predictor for an ab initio protein folding rate prediction directly from protein sequence from the standpoint of practical application.

9.
It is currently believed that the protein folding rate is related to the protein structure and its amino acid sequence. However, few studies have examined whether the protein folding rate is influenced by the corresponding mRNA sequence. In this paper, we analyzed the possible relationship between protein folding rates and the corresponding mRNA sequences. The content of guanine and cytosine (GC content) of palindromes in the protein coding sequence was introduced as a new parameter and added to Gromiha's model for predicting protein folding rates to inspect its effect on the protein folding process. The multiple linear regression analysis and jackknife test show that the new parameter is significant. The linear correlation coefficient between the experimental and the predicted values of the protein folding rates increased significantly from 0.96 to 0.99, and the population variance decreased from 0.50 to 0.24 compared with Gromiha's results. The results show that the GC content of palindromes in the corresponding protein coding sequence does influence the protein folding rate. Further analysis indicates that this effect comes mostly from synonymous codon usage and from the information in the palindrome structure itself, but not from the translation information from codons to amino acids.

10.
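Theoretical and experimental studies show that the native topology of a protein has an important influence on its folding process. Complex-network methods are used to analyze the topological features of native protein structures and to explore the intrinsic relationship between structural features and folding rates. Protein amino acid networks, hydrophobic networks, hydrophilic networks, hydrophilic-hydrophobic networks, and the corresponding long-range networks were constructed, and the statistical properties of their assortativity coefficients and clustering coefficients were studied. The results show that, except for the hydrophilic-hydrophobic network, the assortativity coefficients of all these networks are positive, and that the assortativity coefficients of the amino acid network and the hydrophobic network show a clear positive linear correlation with folding rate, revealing that cooperativity of the interactions among hydrophobic residues helps proteins fold quickly. In addition, the clustering coefficient of the hydrophobic network shows a clear negative linear correlation with folding rate, indicating that the formation of triangle constructions among hydrophobic residues hinders fast folding. The corresponding long-range networks were also constructed, and it was found that the formation of contacts between residues far apart in sequence slows the folding process.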

11.
Kuznetsov IB  Rackovsky S 《Proteins》2004,54(2):333-341
Small single-domain proteins that fold by simple two-state kinetics have been shown to exhibit a wide variation in their folding rates. It has been proposed that folding mechanisms in these proteins are largely determined by the native-state topology, and a significant correlation between folding rate and measures of the average topological complexity, such as relative contact order (RCO), has been reported. We perform a statistical analysis of folding rate and RCO in all three major structural classes (alpha, beta, and alpha/beta) of small two-state proteins and of RCO in groups of analogous and homologous small single-domain proteins with the same topology. We also study correlation between folding rate and the average physicochemical properties of amino acid sequences in two-state proteins. Our results indicate that 1) helical proteins have statistically distinguishable, class-specific folding rates; 2) RCO accounts for essentially all the variation of folding rate in helical proteins, but for only a part of the variation in beta-sheet-containing proteins; and 3) only a small fraction of the protein topologies studied show a topology-specific RCO. We also report a highly significant correlation between the folding rate and average intrinsic structural propensities of protein sequences. These results suggest that intrinsic structural propensities may be an important determinant of the rate of folding in small two-state proteins.

12.
The contact order is believed to be an important factor for understanding protein folding mechanisms. In our earlier work, we have shown that the long-range interactions play a vital role in protein folding. In this work, we analyzed the contribution of long-range contacts to determine the folding rate of two-state proteins. We found that the residues that are close in space and are separated by at least ten to 15 residues in sequence are important determinants of folding rates, suggesting the presence of a folding nucleus at an interval of approximately 25 residues. A novel parameter "long-range order" has been proposed to predict protein folding rates. This parameter shows as good a relationship with the folding rate of two-state proteins as contact order. Further, we examined the minimum limit of residue separation to determine the long-range contacts for different structural classes. We observed an excellent correlation between long-range order and folding rate for all classes of globular proteins. We suggest that in mixed-class proteins, a larger number of residues can serve as folding nuclei compared to all-alpha and all-beta proteins. A simple statistical method has been developed to predict the folding rates of two-state proteins using the long-range order that produces an agreement with experimental results that is better or comparable to other methods in the literature.
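A long-range order parameter of the kind described above can be sketched as a count of contacts between residues that are far apart in sequence, normalised by chain length. This is a hedged illustration only: the abstract gives the separation limit as a range (ten to 15 residues), so the default `min_separation=12`, the normalisation by chain length, and the function name `long_range_order` are assumptions, and the spatial contacts are taken as given rather than computed from coordinates.

```python
def long_range_order(contacts, chain_length, min_separation=12):
    """Number of long-range contacts divided by the chain length.

    `contacts` is an iterable of (i, j) residue-index pairs that are close
    in space. A contact counts as long-range when the residues are at least
    `min_separation` positions apart in sequence (assumed threshold).
    """
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    n_long_range = sum(1 for i, j in contacts if abs(i - j) >= min_separation)
    return n_long_range / chain_length


# Hypothetical contact list for a 50-residue chain.
example_contacts = [(1, 5), (2, 20), (3, 30), (10, 40)]
lro = long_range_order(example_contacts, chain_length=50)
```

Varying `min_separation` per structural class mirrors the abstract's point that the minimum residue separation defining a long-range contact was examined separately for different classes.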

13.
What are the key building blocks that would have been needed to construct complex protein folds? This is an important issue for understanding the protein folding mechanism and guiding de novo protein design. Twenty naturally occurring amino acids and eight secondary structures constitute a 28‐letter alphabet that determines folding kinetics and mechanism. Here we predict folding kinetic rates of proteins from many reduced alphabets. We find that a reduced alphabet of 10 letters achieves good correlation with folding rates, close to the one achieved by the full 28‐letter alphabet. Many other reduced alphabets are not significantly correlated with folding rates. The finding suggests that not all amino acids and secondary structures are equally important for protein folding. The foldable sequence of a protein could be designed using at least 10 folding units, which can either promote or inhibit protein folding. Reducing alphabet cardinality without losing key folding kinetic information opens the door to potentially faster machine learning and data mining applications in protein structure prediction, sequence alignment and protein design. Proteins 2015; 83:631–639. © 2015 Wiley Periodicals, Inc.

14.
Protein combinatorial libraries provide new ways to probe the determinants of folding and to discover novel proteins. Such libraries are often constructed by expressing an ensemble of partially random gene sequences. Given the intractably large number of possible sequences, some limitation on diversity must be imposed. A non-uniform distribution of nucleotides can be used to reduce the number of possible sequences and encode peptide sequences having a predetermined set of amino acid probabilities at each residue position, i.e., the amino acid sequence profile. Such profiles can be determined by inspection, multiple sequence alignment or physically-based computational methods. Here we present a computational method that takes as input a desired sequence profile and calculates the individual nucleotide probabilities among partially random genes. The calculated gene library can be readily used in the context of standard DNA synthesis to generate a protein library with essentially the desired profile. The fidelity between the desired profile and the calculated one coded by these partially random genes is quantitatively evaluated using the linear correlation coefficient and a relative entropy, each of which provides a measure of profile agreement at each position of the sequence. On average, this method of identifying such codon frequencies performs as well as or better than other methods with regard to fidelity to the original profile. Importantly, the method presented here provides much better yields of complete sequences that do not contain stop codons, a feature that is particularly important when all or large fractions of a gene are subject to combinatorial mutation.
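The relative entropy used above to compare a desired amino acid profile with the encoded one at a single position can be sketched as follows. The abstract does not state the direction of the divergence or the logarithm base, so summing over the desired profile and using natural logarithms are assumptions, and the names `relative_entropy`, `desired`, and `encoded` are hypothetical.

```python
import math


def relative_entropy(desired, encoded):
    """Relative entropy sum(p * ln(p / q)) between two profiles at one position.

    Both arguments map residue codes to probabilities. Terms with p == 0
    contribute nothing; a residue that is desired (p > 0) but never encoded
    (q == 0) makes the divergence infinite.
    """
    total = 0.0
    for residue, p in desired.items():
        if p == 0.0:
            continue
        q = encoded.get(residue, 0.0)
        if q == 0.0:
            return math.inf
        total += p * math.log(p / q)
    return total


# Hypothetical two-residue profiles at one sequence position.
desired = {"A": 0.5, "G": 0.5}
encoded = {"A": 0.25, "G": 0.75}
divergence = relative_entropy(desired, encoded)
```

A value of zero means the encoded profile reproduces the desired one exactly at that position; larger values indicate poorer agreement.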

15.
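Given the importance of folding-rate prediction for studying protein function, many researchers have begun to investigate the factors that affect protein folding rates, and various predictive parameters and methods have been proposed. Considering the different effects on folding rate of different feature parameters of protein coding sequences, different secondary structures, and different folding classes, we selected new feature values of the protein sequence, namely its LZ complexity, isoelectric point, and other features. These feature values were then combined with the properties of the 20 amino acids (αc, Cα, K0, Pβ, Ra, ΔASA, PI, ΔGhD, Nm, LZ, Mu, El) to build a multiple linear regression model. Using the regression model, the correlation coefficients between ln(kf) and the predicted values for 13 all-α proteins, 18 all-β proteins, 13 mixed-class proteins, and 39 unclassified proteins reached 0.89, 0.93, 0.98, and 0.86, respectively. Validation with the jackknife method showed that, within the different structural classes, the combined feature values correlate well with the corresponding folding rates. The results suggest that, during protein folding, features of the protein sequence such as LZ complexity and isoelectric point may influence a protein's folding rate and structure.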

16.
Plaxco KW  Simons KT  Ruczinski I  Baker D 《Biochemistry》2000,39(37):11177-11183
The fastest simple, single domain proteins fold a million times more rapidly than the slowest. Ultimately this broad kinetic spectrum is determined by the amino acid sequences that define these proteins, suggesting that the mechanisms that underlie folding may be almost as complex as the sequences that encode them. Here, however, we summarize recent experimental results which suggest that (1) despite a vast diversity of structures and functions, there are fundamental similarities in the folding mechanisms of single domain proteins and (2) rather than being highly sensitive to the finest details of sequence, their folding kinetics are determined primarily by the large-scale, redundant features of sequence that determine a protein's gross structural properties. That folding kinetics can be predicted using simple, empirical, structure-based rules suggests that the fundamental physics underlying folding may be quite straightforward and that a general and quantitative theory of protein folding rates and mechanisms (as opposed to unfolding rates and thus protein stability) may be near on the horizon.

17.
MOTIVATION: It is known that the physico-chemical characteristics of proteins underlying specific folding of the polypeptide chain and the protein function are evolutionarily conserved. Detection of such characteristics while analyzing homologous sequences would substantially expand the knowledge on protein function, structure, and evolution. These characteristics are maintained constant, in particular, by co-ordinated substitutions. In this process, the destabilizing effect of a substitution may be compensated by another substitution at a different position within the same protein, making the overall change in this protein characteristic insignificant. Consequently, the patterns of co-ordinated substitutions contain important information on conserved physico-chemical properties of proteins. Investigating them requires the development of corresponding methods and software for correlation analysis of protein sequences that are available to a wide range of users. RESULTS: A software package for analyzing correlated amino acid substitutions at different positions within aligned protein sequences was developed. The approach searches for evolutionarily conserved physico-chemical characteristics of proteins based on information about the pairwise correlations of amino acid substitutions at different protein positions. The software was applied to analyze DNA-binding domains of the homeodomain class. As a result, two conserved physico-chemical characteristics, preserved by the co-ordinated substitutions at certain groups of positions in the protein sequence, were identified. Possible functional roles of these characteristics are discussed. AVAILABILITY: The program package is available at http://wwwmgs.bionet.nsc.ru/programs/CRASP/.

18.
Many single-domain proteins exhibit two-state folding kinetics, with folding rates that span more than six orders of magnitude. A quantity of much recent interest for such proteins is their contact order, the average separation in sequence between contacting residue pairs. Numerous studies have reached the surprising conclusion that contact order is well-correlated with the logarithm of the folding rate for these small, well-characterized molecules. Here, we investigate the physico-chemical basis for this finding by asking whether contact order is actually a composite number that measures the fraction of local secondary structure in the protein; viz. turns, helices, and hairpins. To pursue this question, we calculated the secondary structure content for 24 two-state proteins and obtained coefficients that predict their folding rates. The predicted rates correlate strongly with experimentally determined rates, comparable to the correlation with contact order. Further, these predicted folding rates are correlated strongly with contact order. Our results suggest that the folding rate of two-state proteins is a function of their local secondary structure content, consistent with the hierarchic model of protein folding. Accordingly, it should be possible to utilize secondary structure prediction methods to predict folding rates from sequence alone.
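Contact order as defined above (the average separation in sequence between contacting residue pairs) can be sketched in a few lines. This is a minimal illustration: the contact list is taken as given, the optional division by chain length to obtain a relative contact order is an assumed normalisation that the abstract does not spell out, and the function name `contact_order` is hypothetical.

```python
def contact_order(contacts, chain_length=None):
    """Average sequence separation |i - j| over contacting residue pairs.

    If `chain_length` is given, the average is divided by it, yielding a
    relative contact order (assumed normalisation).
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    mean_separation = sum(abs(i - j) for i, j in contacts) / len(contacts)
    if chain_length is None:
        return mean_separation
    return mean_separation / chain_length


# Hypothetical contacts for a 50-residue chain.
example_pairs = [(1, 5), (2, 20), (3, 30)]
absolute_co = contact_order(example_pairs)
relative_co = contact_order(example_pairs, chain_length=50)
```

A protein dominated by local contacts (turns, helices, hairpins) gives a small value, which is the connection to local secondary structure content that the abstract investigates.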

19.
Recognition of protein fold from amino acid sequence is a challenging task. The structure and stability of proteins in different folds are mainly dictated by inter-residue interactions. In our earlier work, we successfully used medium- and long-range contacts for predicting protein folding rates, for discriminating globular and membrane proteins, and for distinguishing protein structural classes. In this work, we analyze the role of inter-residue interactions in commonly occurring folds of globular proteins in order to understand their folding mechanisms. In the medium-range contacts, the globin fold and four-helical bundle proteins have more contacts than the DNA-RNA fold, although they all belong to the all-alpha class. In long-range contacts, only the ribonuclease fold prefers the 4-10 range, while the other folding types prefer the 21-30 range in alpha/beta class proteins. Further, the preferred residues and residue pairs influenced by these different folds are discussed. The information about the preference of medium- and long-range contacts exhibited by the 20 amino acid residues can be effectively used to predict the folding type of each protein.

20.
Folding rates of small single-domain proteins that fold through simple two-state kinetics can be estimated from details of the three-dimensional protein structure. Previously, predictions of secondary structure had been exploited to predict folding rates from sequence. Here, we estimate two-state folding rates from predictions of internal residue-residue contacts in proteins of unknown structure. Our estimate is based on the correlation between the folding rate and the number of predicted long-range contacts normalized by the square of the protein length. It is well known that long-range order derived from known structures correlates with folding rates. The surprise was that estimates based on very noisy contact predictions were almost as accurate as the estimates based on known contacts. On average, our estimates were similar to those previously published from secondary structure predictions. The combination of these methods that exploit different sources of information improved performance. It appeared that the combined method reliably distinguished fast from slow two-state folders.
