Similar articles
1.
Protein folding rates vary by several orders of magnitude and depend on the topology of the fold and on the size and composition of the sequence. Although recent work shows that the rates can be predicted from the sequence, allowing for high-throughput annotation, existing methods consider only the sequence and its predicted secondary structure. We propose a novel sequence-based predictor, PFR-AF, which utilizes solvent accessibility and residue flexibility predicted from the sequence to improve predictions and provide insights into the folding process. The predictor includes three linear regressions for proteins with two-state, multistate, and unknown (mixed-state) folding kinetics. PFR-AF on average outperforms current methods when tested on three datasets. The proposed approach provides high-quality predictions in the absence of similarity between the predicted and the training sequences. PFR-AF's predictions are characterized by high correlation (between 0.71 and 0.95, depending on the dataset) and the lowest mean absolute errors (between 0.75 and 0.9) with respect to the experimental rates, as measured using out-of-sample tests. Our models reveal that, for two-state chains, inclusion of solvent-exposed Ala may accelerate folding, while increased content of Ile may reduce the folding speed. We also demonstrate that increased flexibility of coils facilitates faster folding and that proteins with a larger content of solvent-exposed strands may fold at a slower pace. Increased flexibility of the solvent-exposed residues is shown to prolong folding, which also holds, with a lower correlation, for buried residues. Two case studies are included to support our findings. Proteins 2010. © 2010 Wiley-Liss, Inc.

2.
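A study of the protein folding problem based on the HP model
Shi Xiaohong 《生物信息学》2016,14(2):112-116
An improved genetic algorithm based on the two-dimensional HP model of proteins is proposed and used to simulate the folding of real proteins computationally. The results show that the conformations at the minimum of the hydrophobic energy function correspond to stable structures containing a hydrophobic core, and that hydrophobic interactions play the main role in protein folding. The study indicates that the two-dimensional HP model is feasible and effective for protein folding research and provides important reference information for further elucidating the mechanism of protein folding.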
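
The energy function is not spelled out in this abstract. As a minimal sketch, assuming the conventional HP-model scoring in which every pair of hydrophobic (H) residues that are lattice neighbors but not chain neighbors contributes -1, the energy of a 2D conformation could be computed as follows. The function name `hp_energy` and the coordinate-list representation are illustrative choices, not the paper's implementation.

```python
def hp_energy(sequence, coords):
    """Energy of a 2D HP-model conformation.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) integer lattice positions, one per residue.
    Assumed scoring: -1 for each non-bonded H-H pair on adjacent lattice sites.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must sit on neighboring lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("chain is not connected on the lattice")

    index_at = {xy: i for i, xy in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighboring pair is counted once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbor)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy
```

A search procedure such as the genetic algorithm described above would explore self-avoiding walks and keep the conformations with the lowest value; under this scoring, lower energy means more H-H contacts and therefore a more compact hydrophobic core.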

3.
Protein folding speeds are known to vary over more than eight orders of magnitude. Plaxco, Simons, and Baker (see References) first showed a correlation of folding speed with the topology of the native protein. That and subsequent studies showed that, if the native structure of a protein is known, its folding speed can be predicted reasonably well through a correlation with the "localness" of the contacts in the protein. In the present work, we develop a related measure, the geometric contact number, N_α, which is the number of nonlocal contacts that are well packed by a Voronoi criterion. We find, first, that in 80 proteins, the largest such database of proteins yet studied, N_α is a consistently excellent predictor of folding speeds of both two-state fast folders and more complex multistate folders. Second, we show that folding rates can also be predicted from amino acid sequences directly, without the need to know the native topology or other structural properties.

4.
Domains are the main structural and functional units of larger proteins. They tend to be contiguous in primary structure and can fold and function independently. It has been observed that 10–20% of all encoded proteins contain duplicated domains and that the average pairwise sequence identity between them is usually low. In the present study, we have analyzed the structural similarity between domain repeats of proteins with known structures available in the Protein Data Bank using structure-based inter-residue interaction measures such as the number of long-range contacts, surrounding hydrophobicity, and pairwise interaction energy. We used the RADAR program to detect the repeats in a protein sequence, which were further validated using Pfam domain assignments. The sequence identity between the repeats in domains ranges from 20 to 40% and their secondary structural elements are well conserved. The number of long-range contacts, surrounding hydrophobicity calculations, and pairwise interaction energy of the domain repeats clearly reveal the conservation of the 3-D structural environment in the repeats of domains. The proportions of mainchain–mainchain hydrogen bonds and hydrophobic interactions are also highly conserved between the repeats. The present study suggests that the computation of these structure-based parameters will give better clues about the tertiary environment of the repeats in domains. The folding rates of individual domains in the repeats, predicted using the long-range order parameter, correlate well with most of the experimentally observed folding rates for the analyzed independently folded domains.

5.
Huang JT  Cheng JP 《Proteins》2008,72(1):44-49
Prediction of protein-folding rates follows different rules in two-state and multi-state kinetics. The prerequisite for the prediction is to recognize the folding kinetic pathway of proteins. Here, we use logistic regression and support vector machines to discriminate between two-state and multi-state folding proteins. We find that chain length is sufficient to accurately recognize multi-state proteins. There is a transition boundary between the two kinetic models: a protein folds with multi-state kinetics if its length is larger than 112 residues. The logistic prediction from amino acid composition shows that the kinetic pathway of folding is closely related to amino acid volume. Small amino acids make two-state folding easier, and vice versa. However, cysteine, alanine, arginine, lysine, histidine, and methionine do not conform to this rule.
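
The length rule reported in this abstract can be written as a one-line classifier. This is only a sketch of the stated 112-residue boundary; the function and constant names are hypothetical, and the logistic regression and support vector machine models themselves are not reproduced here.

```python
# Length boundary quoted in the abstract: chains longer than this are
# predicted to fold with multi-state kinetics.
MULTISTATE_LENGTH_CUTOFF = 112


def predict_folding_kinetics(sequence):
    """Classify a protein as 'two-state' or 'multi-state' from chain length alone."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    if len(sequence) > MULTISTATE_LENGTH_CUTOFF:
        return "multi-state"
    return "two-state"
```

Because the rule depends only on length, it should be read as a rough first-pass filter rather than a substitute for the composition-based models.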

6.
Ma BG  Guo JX  Zhang HY 《Proteins》2006,65(2):362-372
Discovering the mechanism of protein folding is a great challenge in molecular biology. A key step to this end is to find factors that correlate with protein folding rates. Over the past few years, many empirical parameters, such as contact order, long-range order, total contact distance, and secondary structure content, have been developed to reflect the correlation between folding rates and protein tertiary or secondary structures. However, the correlation between proteins' folding rates and their amino acid compositions has not been explored. In the present work, we systematically examined the correlation between proteins' folding rates and their amino acid compositions for two-state and multistate folders and found that different amino acids contribute differently to the folding process. The relation between the amino acids' molecular weight and degeneracy and the folding rates was examined, and the role of hydrophobicity in the protein folding process was also inspected. As a consequence, a new indicator called the composition index was derived, which takes no structural factors into account and is determined solely by the amino acid composition of a protein. This indicator is found to be highly correlated with the protein's folding rate (r > 0.7). From the results of this work, three conclusions are evident. (1) Two-state folders and multistate folders have different rate-determining amino acids. (2) The main determining information of a protein's folding rate is largely reflected in its amino acid composition. (3) The composition index may be the best predictor for ab initio protein folding rate prediction directly from protein sequence from the standpoint of practical application.

7.
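Zhang Chao, Zhang Hui, Li Jixin, Gao Hong 《生物信息学》2006,4(3):128-131
Genetic algorithms (GAs), inspired by the laws of natural evolution, are adaptive, heuristic, probabilistic, iterative global search algorithms. This paper introduces the basic principles, procedure, and advantages of GAs, and summarizes the models and execution strategies used when applying GAs to protein structure prediction, as well as research progress on combining multiple algorithms to predict protein structure.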

8.
Pan XM 《Proteins》2001,43(3):256-259
In the present work, a novel method is proposed for the prediction of secondary structure. Over a database of 396 proteins (CB396) with a three-state definition of secondary structure, this method with a jackknife procedure achieved an accuracy of 68.8% and an SOV score of 71.4% using a single sequence, and an accuracy of 73.7% and an SOV score of 77.3% using multiple sequence alignments. Combining this method with DSC, PHD, PREDATOR, and NNSSP gives Q3 = 76.2% and SOV = 79.8%.
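
Q3 is the per-residue three-state accuracy: the percentage of residues whose predicted state (helix, strand, or coil) matches the observed one. A minimal sketch, assuming the states are encoded as the characters H, E, and C (an encoding chosen here for illustration), is given below. The SOV score, which evaluates segment overlap, is more involved and is not shown.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted three-state label matches the observed one.

    Both arguments are equal-length strings over the (assumed) alphabet
    'H' (helix), 'E' (strand), 'C' (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

Averaging this value over all proteins in a test set, or pooling all residues before dividing, gives slightly different numbers; the abstract does not say which convention was used.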

9.
Accurate prediction of protein secondary structural content
An improved multiple linear regression (MLR) method is proposed to predict a protein's secondary structural content based on its primary sequence. The amino acid composition, the autocorrelation function, and the interaction function of side-chain mass derived from the primary sequence are taken into account. The average absolute errors of prediction over 704 unrelated proteins with the jackknife test are 0.088, 0.081, and 0.059, with standard deviations of 0.073, 0.066, and 0.055, for α-helix, β-sheet, and coil, respectively. The requirement that the sum of the predicted secondary structure contents be close to 1.0 was introduced as a criterion to evaluate whether a prediction is acceptable. When only the predictions with a sum of predicted secondary structure contents between 0.99 and 1.01 are accepted (about 11% of all proteins), the absolute errors are 0.058 for α-helix, 0.054 for β-sheet, and 0.045 for coil.
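
The acceptance criterion described in this abstract is simple enough to sketch directly. The function name and argument names below are hypothetical; the 0.99 to 1.01 window is the one quoted in the abstract, expressed as a tolerance of 0.01 around 1.0.

```python
def is_prediction_acceptable(helix, sheet, coil, tolerance=0.01):
    """Accept a content prediction only if the three fractions sum to about 1.0.

    helix, sheet, coil: independently predicted content fractions.
    tolerance: half-width of the acceptance window around 1.0
               (0.01 corresponds to the 0.99-1.01 window in the abstract).
    """
    total = helix + sheet + coil
    return abs(total - 1.0) <= tolerance
```

The check works because the three contents are predicted by separate regressions, so nothing forces them to sum to one; a sum far from 1.0 signals that at least one of the three estimates is unreliable.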

10.
The problem of protein self-organization is a focus of current molecular biology studies. Although the general principles are understood, many details remain unclear. Specifically, protein folding rates are of interest because they dictate the rate of protein aggregation, which underlies many human diseases. Here we offer predictions of protein folding rates and their correlation with folding nucleus sizes. We calculated free energies of the transition state and sizes of folding nuclei for 84 proteins and peptides whose other parameters were measured at the point of thermodynamic equilibrium between their unfolded and native states. We used the dynamic programming method, in which each residue was considered to be either as folded as in its native state or completely disordered. The calculated and measured folding rates showed a good correlation at the temperature mid-transition point (the correlation coefficient was 0.75). We are also the first to demonstrate a moderate correlation (coefficient of -0.57) between the calculated sizes of folding nuclei and the folding rates. Predictions made by different methods were compared. The good correlation established between the estimated free energy barrier and the experimentally found folding rate of each studied protein or peptide indicates that our model gives reliable results for the considered data set. Proteins 2012; © 2012 Wiley Periodicals, Inc.

11.
Proteins are minimally frustrated polymers. However, for realistic protein models, nonnative interactions must be taken into account. In this paper, we analyze the effect of nonnative interactions on the folding rate and on the folding free energy barrier. We present an analytic theory to account for the modification of the free energy landscape upon the introduction of nonnative contacts, added as a perturbation to the strong native interactions driving folding. Our theory predicts a rate-enhancement regime at fixed temperature under the introduction of weak nonnative interactions. We have thoroughly tested this theoretical prediction with simulations of a coarse-grained protein model, using an off-lattice Cα model of the src-SH3 domain. The strong agreement between results from simulations and theory confirms the nontrivial result that a relatively small amount of nonnative interaction energy can actually assist folding to the native structure.

12.
The physicochemical mechanism of protein folding has been elucidated by the island model, which describes a growth type of folding. The folding pathway is closely related to nucleation on the polypeptide chain, and thus the formation of small local structures or secondary structures at the earliest stage of folding is essential to all following steps. The island model is applicable to any protein, but a high precision of secondary structure prediction is indispensable to folding simulation. The secondary structures formed at the earliest stage of folding are supposed to be of standard form, but they are usually deformed during the folding process, especially at the last stage, although the degree of deformation is different for each protein. Ferredoxin is an example of a protein having this property. According to X-ray investigation (1FDX), ferredoxin is not supposed to have secondary structures. However, when we assumed that all the residues in ferredoxin are in a coil state, we could not attain a structure similar to the native one. Further, we found that some parts of the chain are not flexible, suggesting the presence of secondary structures, in agreement with the recent PDB data (1DUR). Assuming standard secondary structures (α-helices and β-strands) at the nonflexible parts at the early stage of folding, and deforming these at the final stage, a structure similar to the native one was obtained. Another peculiarity of ferredoxin is the absence of disulfide bonds, in spite of its having eight cysteines. The reason the cysteines do not form disulfide bonds became clear by applying the lampshade criterion, but more importantly, the two groups of cysteines are each ready to form iron complexes at a rather late stage of folding. The reason for the poor prediction accuracy of secondary structure with conventional methods is discussed.

13.
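The effectiveness of the quasi-physical algorithm for predicting protein spatial structure is analyzed theoretically, and it is shown that whether this algorithm finds a valid structure is subject to considerable randomness. Conditions for detecting conflicts in a folded structure are given, together with several modifications that improve the effectiveness of the quasi-physical algorithm.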

14.
Bastolla U  Bruscolini P  Velasco JL 《Proteins》2012,80(9):2287-2304
In comparison with intense investigation of the structural determinants of protein folding rates, the sequence features favoring fast folding have received little attention. Here, we investigate this subject using simple models of protein folding and a statistical analysis of the Protein Data Bank (PDB). The mean-field model by Plotkin and coworkers predicts that the folding rate is accelerated by stronger-than-average interactions at short distance along the sequence. We confirmed this prediction using the Finkelstein model of protein folding, which accounts for realistic features of polymer entropy. We then tested this prediction on the PDB. We found that native interactions are strongest at contact range l = 8. However, since short range contacts tend to be exposed and they are frequently formed in misfolded structures, selection for folding stability tends to make them less attractive, that is, stability and kinetics may have contrasting requirements. Using a recently proposed model, we predicted the relationship between contact range and contact energy based on buriedness and contact frequency. Deviations from this prediction induce a positive correlation between contact range and contact energy, that is, short range contacts are stronger than expected, for 2/3 of the proteins. This correlation increases with the absolute contact order (ACO), as expected if proteins that tend to fold slowly due to large ACO are subject to stronger selection for sequence features favoring fast folding. Our results suggest that the selective pressure for fast folding is detectable only for one third of the proteins in the PDB, in particular those with large contact order.

15.
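Because predicting protein folding rates is important for studying protein function, many researchers have begun to investigate the factors that influence folding rates, and a variety of predictive parameters and methods have been proposed. Exploiting the fact that different features of the protein coding sequence, different secondary structures, and different folding classes affect folding rates differently, we select new features of the protein coding sequence, namely the LZ complexity, the isoelectric point, and related values of the protein sequence. These features are then combined with the properties of the 20 amino acids, αc, Cα, K0, Pβ, Ra, ΔASA, PI, ΔGhD, Nm, LZ, Mu, and El, to build multiple linear regression models. With these regression models, the correlation coefficients between ln(kf) and the predicted values for 13 all-α proteins, 18 all-β proteins, 13 mixed-class proteins, and 39 unclassified proteins reach 0.89, 0.93, 0.98, and 0.86, respectively. Jack-knife validation shows that the combined features correlate well with the corresponding folding rates across the different structural classes. The results indicate that features of the protein sequence such as LZ complexity and isoelectric point may influence the folding rate and the structure of a protein during folding.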
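
The abstract does not say which Lempel-Ziv variant is used as the LZ complexity feature. A minimal sketch, assuming the 1976 Lempel-Ziv definition (the number of phrases in the exhaustive-history parsing of the sequence), is shown below; the function name is illustrative.

```python
def lz_complexity(sequence):
    """Lempel-Ziv (1976) complexity: number of phrases in the exhaustive parsing.

    Each phrase is the shortest substring starting at the current position
    that has not already occurred in the text preceding its last character.
    """
    n = len(sequence)
    position = 0
    phrases = 0
    while position < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the earlier occurrence may overlap the phrase itself).
        while (position + length <= n
               and sequence[position:position + length]
               in sequence[:position + length - 1]):
            length += 1
        phrases += 1
        position += length
    return phrases
```

Applied to an amino acid sequence, a low count indicates a repetitive sequence and a high count a more irregular one. The substring search makes this version quadratic or worse in sequence length, which is acceptable for single proteins but would need a faster scheme for large batches.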

16.
Contact order revisited: influence of protein size on the folding rate
Guided by the recent success of an empirical model that predicts the folding rates of small two-state folding proteins from the relative contact order (CO) of their native structures, by a theoretical model of protein folding that predicts that the logarithm of the folding rate decreases with the protein chain length L as L^(2/3), and by the finding that the folding rates of multistate folding proteins correlate strongly with their sizes and very poorly with CO, we reexamined the dependence of folding rate on CO and L in an attempt to find a structural parameter that determines folding rates for the totality of proteins. We show that Abs_CO = CO × L is able to predict folding rates rather accurately for both two-state and multistate folding proteins, as well as short peptides, and that Abs_CO scales with the protein chain length as L^(0.70 ± 0.07) for the totality of studied single-domain proteins and peptides.
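
For reference, relative contact order is conventionally defined as CO = (1/(L·N)) Σ |i − j|, summed over the N native contacts between residues i and j, so Abs_CO = CO × L is simply the mean sequence separation of the contacts. A minimal sketch, assuming the contact list has already been derived from the native structure (the distance cutoff that defines a contact is not given here and is left to the caller; the function name is hypothetical):

```python
def contact_orders(contacts, chain_length):
    """Return (relative CO, absolute CO) for a list of native contacts.

    contacts:     iterable of (i, j) residue-index pairs in contact.
    chain_length: number of residues L in the chain.
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    total_separation = sum(abs(i - j) for i, j in contacts)
    absolute_co = total_separation / len(contacts)  # Abs_CO = CO x L
    relative_co = absolute_co / chain_length        # CO = Abs_CO / L
    return relative_co, absolute_co
```

Relative CO is normalized by chain length and so stays between 0 and 1, whereas Abs_CO grows with protein size, which is consistent with the abstract's point that the size-dependent quantity covers both two-state and multistate folders.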

17.
Wang B  Chen P  Huang DS  Li JJ  Lok TM  Lyu MR 《FEBS letters》2006,580(2):380-384
This paper proposes a novel method that can predict protein interaction sites in heterocomplexes using residue spatial sequence profile and evolution rate approaches. The former represents the information of multiple sequence alignments, while the latter corresponds to a residue's evolutionary conservation score based on a phylogenetic tree. Three predictors using a support vector machine algorithm are constructed to predict whether a surface residue is part of a protein-protein interface. The efficiency and effectiveness of the proposed approach are verified by its better prediction performance compared with other models. The study is based on a non-redundant data set of heterodimers consisting of 69 protein chains.

18.
Summary. Three models representing different separations of amino acid sources were used to simulate experimental specific radioactivity data and to predict protein fractional synthesis rate (FSR). Data were from a pulse dose of 14C-U Leu given to a non-growing 20 g mouse and a flooding dose of 3H Phe given to a non-growing 200 g rat. Protein synthesis rates estimated using the combined extracellular and intracellular (Ec + Ic) source pool and extracellular and plasma (Ec + Pls) source pool mouse models were 78 and 120% d−1 in liver, 14 and 16% d−1 in brain and 15 and 14% d−1 in muscle. Predicted protein synthesis rates using the Ec + Ic, Ec + Ic + Tr (combined extracellular, intracellular and aminoacyl tRNA source pool) and Ec + Pls rat models were 57, 3.4 and 57% d−1 in gastrocnemius, 58, 71 and 62% d−1 in gut, 8.3, 8.4 and 7.9% d−1 in heart, 32, 23 and 25% d−1 in kidney, 160, 90 and 80% d−1 in liver, 57, 5.5 and 57% d−1 in soleus and 56, 3.4 and 57% d−1 in tibialis. The Ec + Ic + Tr model underestimated protein synthesis rates in mouse tissues (5.0, 27 and 2.5% d−1 for brain, liver and muscle) and rat muscles (3.4, 5.5 and 3.4% d−1 for gastrocnemius, soleus and tibialis). The Ec + Pls model predicted the mouse pulse dose data best and the Ec + Ic model predicted the rat flooding dose data best. Model predictions of FSR imply that identification and separation of the source specific radioactivity is critical to accurately estimate FSR. Received June 11, 2000 Accepted September 26, 2000

19.
Dong Q  Wang X  Lin L 《Proteins》2008,72(1):353-366
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structure and the folding fragments of proteins. First, the proteins with known structures are split into fragments. Second, these fragments, represented by dihedrals, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterated algorithm, is introduced to simulate the interactions of these BBs. For test proteins, the building-block lattice is constructed, which contains all the folding fragments of the proteins. The local structures and the optimal fragments are then obtained by the dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity less than 25%. The results show that the performance of the method is better than the method that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, which is a significant improvement in comparison with previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is helpful for the ab initio protein structure prediction and especially, the understanding of the folding process of proteins.

20.
A number of methods for predicting the folding type of a protein based on its amino acid composition have been developed during the past few years. In order to perform an objective and fair comparison of different prediction methods, a Monte Carlo simulation method was proposed to calculate the asymptotic limit of the prediction accuracy [Zhang and Chou (1992), Biophys. J. 63, 1523–1529, referred to as simulation method I]. However, simulation method I was based on an oversimplified assumption, i.e., that there are no correlations between the compositions of different amino acids. By taking such correlations into account, a new method, referred to as simulation method II, has been proposed to recalculate the objective accuracy of prediction for the least Euclidean distance method [Nakashima et al. (1986), J. Biochem. 99, 152–162] and the least Minkowski distance method [Chou (1989), Prediction in Protein Structure and the Principles of Protein Conformation, Plenum Press, New York, pp. 549–586], respectively. The results show that the prediction accuracy of the former is still better than that of the latter, as found by simulation method I; however, after incorporating the correlative effect, the objective prediction accuracies become lower for both methods. The reason for this phenomenon is discussed in detail. The simulation method and the idea developed in this paper can be applied to examine any other statistical prediction method, including the computer-simulated neural network method.
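
As an illustration of the least Euclidean distance idea mentioned in this abstract, a protein can be assigned to the folding class whose reference amino acid composition is nearest to its own. The sketch below is an assumption-laden outline: the function names are hypothetical, and the reference profiles must be supplied by the caller (in the cited work they would be class averages from a training set, which are not reproduced here).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """20-dimensional amino acid composition (fractions summing to 1)."""
    counts = Counter(sequence)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def nearest_class(sequence, class_profiles):
    """Name of the class whose reference composition is closest in Euclidean distance.

    class_profiles: dict mapping class name -> 20-dimensional composition vector.
    """
    query = composition(sequence)
    return min(class_profiles,
               key=lambda name: math.dist(query, class_profiles[name]))
```

The least Minkowski distance method differs only in the distance function, so swapping `math.dist` for another metric would give the corresponding variant. Note that plain Euclidean distance treats the 20 components as independent, which is the simplification that the correlation-aware simulation in the abstract is designed to probe.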
