Similar literature
20 similar articles found.
1.
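Based on the properties of protein amino acids, and using amino acid composition and the biased autocovariance function as the feature vector, a BP neural network method is proposed for predicting the α-helix and β-sheet secondary structure content of non-homologous proteins. The method was tested on mutually independent databases of non-homologous proteins. With Ponnuswamy values, the results for α-helix and β-sheet content were as follows: in the self-consistency test, the mean absolute errors were 0.069 and 0.065, with standard deviations of 0.044 and 0.047, respectively; in the independent test, the mean absolute errors were 0.077 and 0.070, with standard deviations of 0.051 and 0.049, respectively. Compared with a BP neural network that uses amino acid composition alone as the feature vector, the independent-test mean absolute errors decreased by 0.024 and 0.016 and the standard deviations decreased by 0.031 and 0.018, respectively. Compared with the improved multiple linear regression method, the independent-test mean absolute errors decreased by 0.018 and 0.011 and the standard deviations decreased by 0.020 and 0.012, respectively. These results show that a BP neural network using amino acid composition and the biased autocovariance function as the feature vector can effectively improve the accuracy of secondary structure content prediction.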
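
The feature vector described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that "biased" refers to the usual biased estimator (each lag sum divided by the sequence length N rather than N - lag), and the property scale passed in as `scale` is a placeholder for the Ponnuswamy values, which are not reproduced here. The function names and `max_lag` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in AMINO_ACIDS]

def biased_autocovariance(values, max_lag):
    """Autocovariance at lags 1..max_lag, each sum divided by N (assumed meaning of 'biased')."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        for lag in range(1, max_lag + 1)
    ]

def feature_vector(seq, scale, max_lag=3):
    """Concatenate composition (20 values) with autocovariance of a property profile."""
    profile = [scale[a] for a in seq]  # scale: dict mapping residue letter to a property value
    return composition(seq) + biased_autocovariance(profile, max_lag)
```

A vector built this way has 20 + `max_lag` components and could be fed to any regression model; the choice of a BP neural network is the abstract's, while the lag count here is arbitrary.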

2.
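Proteins are important components of living organisms and take part in almost all biological processes in the cell. As the genome sequences of more and more species are determined, accurately understanding the functions of gene products and exploring the causes of protein functional diversity have become active research topics. To study protein function, the static three-dimensional structures of a large number of proteins have been determined. However, protein function is ultimately governed by dynamic behavior, including the folding process, conformational fluctuations, molecular motions, and protein-ligand interactions. Based on free energy landscape theory, this paper discusses in depth the underlying physicochemical mechanisms of protein dynamics and answers the following questions: Why can proteins fold, and how do they fold into their native three-dimensional structures? Why are the dynamic characteristics of proteins intrinsic? How does dynamic behavior control protein function? The discussion should aid the understanding of protein structure-function relationships in life science research in the post-genomic era.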

3.
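Theoretical and experimental studies show that the native topology of a protein has an important influence on its folding process. Complex network methods are used to analyze the topological features of native protein structures and to explore the intrinsic relationship between structural features and folding rate. Amino acid networks, hydrophobic networks, hydrophilic networks, hydrophilic-hydrophobic networks, and the corresponding long-range networks were constructed, and the statistical properties of their assortativity coefficients and clustering coefficients were studied. The results show that, except for the hydrophilic-hydrophobic network, the assortativity coefficients of all these networks are positive, and that the assortativity coefficients of the amino acid network and the hydrophobic network show a clear positive linear correlation with folding rate, indicating that cooperativity in the interactions between hydrophobic residues favors fast folding. The clustering coefficient of the hydrophobic network shows a clear negative linear correlation with folding rate, which suggests that the formation of triangle structures among hydrophobic residues hinders fast folding. Analysis of the corresponding long-range networks further shows that the formation of contacts between residues far apart in sequence slows the folding process.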
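
The two network statistics named in this abstract can be sketched in plain Python. This is an illustrative sketch under stated assumptions: residues are nodes, an edge list of residue contacts is supplied by the caller (the contact definition used in the paper is not given in the abstract), the clustering coefficient is the average local clustering coefficient, and assortativity is taken to be degree assortativity, that is, the Pearson correlation of the degrees at the two ends of each edge. All function names are hypothetical.

```python
from itertools import combinations

def build_graph(n_nodes, edges):
    """Undirected adjacency sets from a list of (u, v) residue contacts."""
    adj = {i: set() for i in range(n_nodes)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean over nodes of (closed neighbor pairs) / (all neighbor pairs)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def degree_assortativity(adj):
    """Pearson correlation of end-point degrees, each edge counted in both directions."""
    xs, ys = [], []
    for u, nbrs in adj.items():
        for v in nbrs:
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    n = len(xs)
    mean = sum(xs) / n  # xs and ys have the same mean by symmetry
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        return float("nan")  # undefined when all end-point degrees are equal
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    return cov / var
```

For a hydrophobic network in this sketch, one would pass only the contacts between hydrophobic residues; a long-range network would keep only contacts whose sequence separation exceeds some chosen threshold.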

4.
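Protein folding is treated as a quantum transition between torsional states of the polypeptide chain. Based on the quantum theory of conformational dynamics, a dynamical contact order is proposed that characterizes contacts by the moment of inertia and the torsional potential energy of the polypeptide chain between contacting residues, so that protein folding rates can be studied quantitatively from a dynamical point of view. Experiments on a data set of 80 proteins support the conformational quantum transition view and lead to the following conclusions: (1) folding rate correlates significantly with the contact moment of inertia; (2) the folding of a multi-state protein can be regarded as two-state folding under the same moment of inertia, temperature, and other conditions, plus an intermediate-state delay, and the order of magnitude of the delay time is estimated; (3) folding can be divided into energy-releasing and energy-absorbing types, the upper limit of the folding rate is set by energy-releasing folding, and it is derived that the quantum transition of most fast-folding two-state proteins is an energy-releasing reaction, whereas that of slow-folding multi-state proteins is an energy-absorbing reaction.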

5.
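Classification method and classification database for protein fold types   Total citations: 1, self-citations: 0, citations by others: 1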
李晓琴  仁文科  刘岳  徐海松  乔辉 《生物信息学》2010,8(3):245-247,253
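The study of protein folding principles is a major frontier topic in the life sciences, and fold classification is the basis of protein folding research. At present, protein fold types are classified largely by experts, and different databases classify them differently, so a protein fold type database built on unified principles is urgently needed. Starting from proteins in the ASTRAL-1.65 database with sequence identity below 25% and resolution better than 2.5 Å, and through observation of protein spatial structures and analysis of fold type features, this paper proposes a fold classification method that is centered on the protein folding core, takes the topological invariance of protein structure as its principle, and is based on the composition, connectivity, and spatial arrangement of the regular structural segments of the folding core. A low-similarity protein fold classification database, LIFCA, containing 259 protein fold types, was established. The database will provide a foundation for further work on protein fold modeling and data mining, protein fold recognition, and the evolution of protein fold structures.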

6.
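Many secondary-school biology teachers have questioned the classification of the β-sheet as protein secondary structure. The main point of doubt is this: protein secondary structure refers to the way a single polypeptide chain folds and coils, whereas a β-sheet (whether parallel or antiparallel) is formed by two or more peptide chains folding and coiling together, and one chain is obviously not the same as two or more. Yet all treatises on proteins list the β-sheet as secondary structure without question. How should this be understood? This article offers a brief analysis as follows. The concept and characteristics of secondary structure: Proteins have very large molecular weights (roughly between six thousand and one million) and very complex structures. For convenience of study, in 1952 Linderstrøm-Lang first divided the structure of proteins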

7.
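A protein folding database that includes nucleic acid sequence information was built. On this basis, for each protein, basic parameters describing the secondary structure of its coding mRNA sequence were calculated: stem content, loop content, folding free energy, and mRNA flexibility. The relationship between these mRNA secondary structure parameters and the folding rate of the corresponding protein was then analyzed. The results show that mRNA stem content is significantly negatively correlated with protein folding rate, whereas loop content is significantly positively correlated with it; in addition, mRNA flexibility shows a highly significant positive correlation with folding rate. Further analysis shows that, when proteins are divided by secondary structure class and folding type, mRNA flexibility is an important factor for the folding rates of all protein types, whereas stem content and loop content mainly affect the folding of two-state proteins. The results confirm that mRNA secondary structure plays an important role in protein folding.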

8.
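Predicting protein folding rates from amino acid sequences   Total citations: 1, self-citations: 0, citations by others: 1
Predicting protein folding rates is one of the most challenging problems in biophysics today. In recent years, researchers have carried out a great deal of work to explore the determinants of folding rate, and many parameters and methods have been proposed. However, the influence on folding rate of information such as interactions between amino acid residues and the sequence order of amino acids has not been addressed. Here, pseudo-amino acid composition is used to extract sequence order information, a Monte Carlo method is used to select the best feature factors, and a linear regression model is built to predict folding rates. The method predicts folding rates directly from the amino acid sequence of a protein, without requiring any (explicit) structural information. Under jackknife cross-validation on a data set of 99 proteins, the predicted folding rates correlate well with the experimental values, with a correlation coefficient of 0.81 and a prediction error of only 2.54. This accuracy is clearly better than that of other sequence-based methods and indicates that sequence order information is an important factor affecting protein folding rates.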
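
The jackknife (leave-one-out) evaluation mentioned in this abstract can be sketched as below. This is a simplified illustration, not the published procedure: it assumes a single numeric feature per protein and ordinary least squares, whereas the abstract describes several pseudo-amino acid composition features chosen by Monte Carlo selection. The function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    preds = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(slope * xs[i] + intercept)
    return preds

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
```

Calling `pearson(jackknife_predictions(xs, ys), ys)` gives a leave-one-out correlation of the kind the abstract reports; `xs` and `ys` are plain Python lists of feature values and measured folding rates.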

9.
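Protein fold recognition is an important method for analyzing protein structure. Using proteins with low sequence similarity as the training set, and extracting sequence frequency information, hydrophobicity, and related information as fold type features, a data set covering 1,393 fold types was built from classified proteins in the SCOP database, and an SVM was used to predict these 1,393 fold types. Accuracy was 99.6122% in the closed test and 79.6329% in the SCOP-based open test. In an open test on another authoritative test set, fold accuracy was 64.7059% and SCOP class accuracy was 76.4706%. The method can predict protein fold types effectively and thus provide a reference for ab initio protein prediction.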

10.
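Building a fast and effective protein fold recognition method using image features   Total citations: 2, self-citations: 0, citations by others: 2
Automatic classification of protein structures is an important means of exploring protein structure-function relationships. First, the three-dimensional structure of a protein fold is mapped to a two-dimensional distance matrix, and the distance matrix is treated as a grayscale image. Then, based on the gray-level histogram and the gray-level co-occurrence matrix, a computationally simple method for extracting fold structure features is proposed, which yields low-dimensional features that reflect the characteristics of the fold. The cause of the isolated zero-gray-level peak in the histogram is explained, and the structural meaning of the gray-level distribution and of the different angles and pixel distances in the co-occurrence matrix features is analyzed in depth. Finally, the method is applied to the classification of 27 fold classes, reaching an accuracy of 71.95% on an independent test set and 78.94% in 10-fold cross-validation on all data. Comparison with several sequence-based and structure-based fold recognition methods shows that this method has low-dimensional, concise features, needs no complex classification system, and can recognize multiple fold classes effectively and efficiently.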
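
The pipeline in this abstract (coordinates, then distance matrix, then grayscale image, then histogram and co-occurrence matrix) can be sketched as follows. This is a minimal sketch under assumptions: one 3D point per residue is supplied by the caller, distances are quantized linearly into `levels` gray levels, and the co-occurrence matrix counts ordered pixel pairs at a single offset. The quantization scheme, the number of levels, and the function names are illustrative choices, not taken from the paper.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3D points (one point per residue)."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def to_gray(dmat, levels=8):
    """Quantize distances linearly into integer gray levels 0 .. levels-1."""
    d_max = max(max(row) for row in dmat)
    if d_max == 0:
        return [[0 for _ in row] for row in dmat]
    return [[min(levels - 1, int(d / d_max * levels)) for d in row] for row in dmat]

def gray_histogram(gray, levels=8):
    """Count of pixels at each gray level."""
    hist = [0] * levels
    for row in gray:
        for g in row:
            hist[g] += 1
    return hist

def cooccurrence(gray, levels=8, di=0, dj=1):
    """Gray-level co-occurrence counts for pixel pairs offset by (di, dj)."""
    n = len(gray)
    glcm = [[0] * levels for _ in range(levels)]
    for i in range(n):
        for j in range(n):
            i2, j2 = i + di, j + dj
            if 0 <= i2 < n and 0 <= j2 < n:
                glcm[gray[i][j]][gray[i2][j2]] += 1
    return glcm
```

In this sketch the diagonal of the distance matrix is always zero, so every residue contributes at least one pixel to gray level 0. Changing `(di, dj)` corresponds to the different angles and pixel distances the abstract mentions.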

11.
A key driving force in determination of protein structural classes.   Total citations: 13, self-citations: 0, citations by others: 13
The three-dimensional structure of a protein is uniquely dictated by its primary sequence. However, owing to the very high degenerative nature of the sequence-structure relationship, proteins are generally folded into one of only a few structural classes that are closely correlated with the amino-acid composition. This suggests that the interaction among the components of amino acid composition may play a considerable role in determining the structural class of a protein. To quantitatively test such a hypothesis at a deeper level, three potential functions, U(0), U(1), and U(2), were formulated that respectively represent the 0th-order, 1st-order, and 2nd-order approximations for the interaction among the components of the amino acid composition in a protein. It was observed that the correct rates in recognizing protein structural classes by U(2) are significantly higher than those by U(0) and U(1), indicating that an algorithm that can more completely incorporate the interaction contributions will yield better recognition quality, and hence further demonstrating that the interaction among the components of amino acid composition is an important driving force in determining the structural class of a protein during the sequence folding process.

12.
Gu W  Zhou T  Ma J  Sun X  Lu Z 《Bio Systems》2004,73(2):89-97
The role of the silent position in the codon in protein structure is an interesting and yet unclear problem. In this paper, 563 Homo sapiens genes and 417 Escherichia coli genes coding for proteins with four different folding types have been analyzed using variance analysis, a multivariate analysis method newly used in codon usage analysis, to find the correlation between amino acid composition, synonymous codon usage, and protein structure in different organisms. It has been found that in E. coli, both amino acid compositions in differently folded proteins and synonymous codon usage in different gene classes coding for differently folded proteins are significantly different. It was also found that only amino acid composition is different in different protein classes in H. sapiens. There is no universal correlation between synonymous codon usage and protein structure in these two different organisms. Further analysis has shown that GC content at the second codon position can distinguish coding genes for differently folded proteins in both organisms.
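
The quantity singled out at the end of this abstract, GC content at a given codon position, is simple to compute. The sketch below assumes an in-frame coding sequence given as a DNA string; trailing bases that do not complete a codon are ignored. The function name is hypothetical.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at codon positions 1, 2, and 3."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop an incomplete trailing codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence contains no complete codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return tuple(c / n_codons for c in counts)
```

The second element of the returned tuple is the second-position GC content that the abstract reports as distinguishing genes for differently folded proteins.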

13.
Here, we present a statistical analysis of conservation profiles in families of homologous sequences for nine proteins whose folding nucleus was determined by protein engineering methods. We show that in all but one protein (AcP), folding nucleus residues are significantly more conserved than the rest of the protein. Two aspects of our study are especially important: (i) grouping of amino acid residues into classes according to their physical-chemical properties and (ii) proper normalization of amino acid probabilities that reflects the fact that evolutionary pressure to conserve some amino acid types may itself affect the concentration of various amino acid types in protein families. Neglect of either of those two factors may make physical and biological "signals" from conservation profiles disappear.

14.
Folding of polypeptide chains induced by the amino acid side-chains   Total citations: 5, self-citations: 0, citations by others: 5
Conformational calculations with the use of semi-empirical potential functions have been applied to the analysis of the folding of peptide chains. In particular, the part played by the amino acid side-chains in the adoption of folded conformations has been investigated. The results show that the preferred conformations of short peptides are mostly extended ones. However, beyond a given peptide chain length, the side-chain to backbone and side-chain to side-chain interactions become strong enough that definite sequences of amino acids can induce a transition from extended to folded conformations. We propose to call these folded structures "conformational nuclei". The type of "nucleus" formed is dependent on both the amino acid composition and the sequence. Our results strongly support the hypothesis that folding of polypeptide chains can occur through a nucleation process that could be induced by the side-chains.

15.
Conotoxins are short, disulfide-rich peptide neurotoxins produced in the venom of predatory marine cone snails. It is generally accepted that an estimated 100,000 unique conotoxins fall into only a handful of structural groups, based on their disulfide bridging frameworks. This unique molecular diversity poses a protein folding problem of relationships between hypervariability of amino acid sequences and mechanism(s) of oxidative folding. In this study, we present a comparative analysis of the folding properties of four conotoxins sharing an identical pattern of cysteine residues forming three disulfide bridges, but otherwise differing significantly in their primary amino acid sequence. Oxidative folding properties of M-superfamily conotoxins GIIIA, PIIIA, SmIIIA and RIIIK varied with respect to kinetics and thermodynamics. Based on rates for establishing the steady-state distribution of the folding species, two distinct folding mechanisms could be distinguished: first, rapid-collapse folding characterized by very fast, but low-yield accumulation of the correctly folded form; and second, slow-rearrangement folding resulting in higher accumulation of the properly folded form via the reshuffling of disulfide bonds within folding intermediates. Effects of changing the folding conditions indicated that the rapid-collapse and the slow-rearrangement mechanisms were mainly determined by either repulsive electrostatic or productive noncovalent interactions, respectively. The differences in folding kinetics for these two mechanisms were minimized in the presence of protein disulfide isomerase. Taken together, folding properties of conotoxins from the M-superfamily presented in this work and from the O-superfamily published previously suggest that conotoxin sequence diversity is also reflected in their folding properties, and that sequence information rather than a cysteine pattern determines the in vitro folding mechanisms of conotoxins.

16.
Isogai Y 《Biochemistry》2006,45(8):2488-2492
Hydrophobic core mutants of sperm whale apomyoglobin were constructed to investigate the amino acid sequence features that determine the folding properties. Replacements of all of the Ile residues with Leu and of all of the Ile and Val residues with Leu decreased the thermodynamic stability of the folded states against the unfolded states but increased the stability of the folding intermediates against the unfolded states, indicating that the amino acid composition of the protein core is important for the protein stability and folding cooperativity. To examine the effect of the arrangement of these hydrophobic residues, mutant proteins were further constructed: 12 sites out of the 18 Leu, 9 Ile, and 8 Val residues of the wild-type myoglobin were randomly replaced with each other so that the amino acid compositions were similar to that of the wild-type protein. Four mutant proteins were obtained without selection of the protein properties. These residue replacements similarly resulted in the stabilization of both the intermediate and folded states against the unfolded states, as compared to the wild-type protein. Thus, the arrangements of the hydrophobic residues in the native amino acid sequence are selected to destabilize the folding intermediate rather than to stabilize the folded state. The present results suggest that the two-state transition of protein folding or the transient formation of the unstable intermediate, which seems to be required for effective production of the functional proteins, has been a major driving force in the molecular evolution of natural globular proteins.

17.
If it is assumed that the primary sequence determines the three-dimensional folded structure of a protein, then the regular folding patterns, such as alpha-helix, beta-sheet, and other ordered patterns in the three-dimensional structure must correspond to the periodic distribution of the physical properties of the amino acids along the primary sequence. An AutoRegressive Moving Average (ARMA) model method of spectral analysis is applied to analyze protein sequences represented by the hydrophobicity of their amino acids. The results for several membrane proteins of known structures indicate that the periodic distribution of hydrophobicity of the primary sequence is closely related to the regular folding patterns in a protein's three-dimensional structure. We also applied the method to the transmembrane regions of acetylcholine receptor alpha subunit and Shaker potassium channel for which no atomic resolution structure is available. This work is an extension of our analysis of globular proteins by a similar method.
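
The idea of looking for periodicity in a hydrophobicity profile can be illustrated with a plain Fourier-style power estimate. Note that this swaps in a direct discrete Fourier sum for the ARMA spectral estimator used in the abstract, purely to keep the example short. The hydrophobicity profile is supplied by the caller as a list of numbers (the scale used in the paper is not stated here), and the function names are hypothetical.

```python
import cmath
import math

def power_at_period(profile, period):
    """Spectral power of a mean-centered profile at the given period (in residues)."""
    n = len(profile)
    mean = sum(profile) / n
    s = sum(
        (h - mean) * cmath.exp(-2j * math.pi * k / period)
        for k, h in enumerate(profile)
    )
    return abs(s) ** 2 / n

def dominant_period(profile, periods):
    """Candidate period with the highest power."""
    return max(periods, key=lambda p: power_at_period(profile, p))
```

A peak near 3.6 residues is the textbook signature of an amphipathic alpha-helix, and a peak near 2 residues is that of a beta-strand, so `dominant_period(profile, [2.0, 3.6])` gives a crude indication of which pattern a segment resembles.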

18.
An empirical relation between the amino acid composition and three-dimensional folding pattern of several classes of proteins has been determined. Computer simulated neural networks have been used to assign proteins to one of the following classes based on their amino acid composition and size: (1) 4α-helical bundles, (2) parallel (α/β)8 barrels, (3) nucleotide binding fold, (4) immunoglobulin fold, or (5) none of these. Networks trained on the known crystal structures as well as sequences of closely related proteins are shown to correctly predict folding classes of proteins not represented in the training set with an average accuracy of 87%. Other folding motifs can easily be added to the prediction scheme once larger databases become available. Analysis of the neural network weights reveals that amino acids favoring prediction of a folding class are usually overrepresented in that class and amino acids with unfavorable weights are underrepresented in composition. The neural networks utilize combinations of these multiple small variations in amino acid composition in order to make a prediction. The favorably weighted amino acids in a given class also form the most intramolecular interactions with other residues in proteins of that class. A detailed examination of the contacts of these amino acids reveals some general patterns that may help stabilize each folding class. © 1993 Wiley-Liss, Inc.

19.
Prediction of protein cellular attributes using pseudo-amino acid composition   Total citations: 28, self-citations: 0, citations by others: 28
Chou KC 《Proteins》2001,43(3):246-255
The cellular attributes of a protein, such as which compartment of a cell it belongs to and how it is associated with the lipid bilayer of an organelle, are closely correlated with its biological functions. The success of the human genome project and the rapid increase in the number of protein sequences entering data banks have stimulated a challenging frontier: How to develop a fast and accurate method to predict the cellular attributes of a protein based on its amino acid sequence? The existing algorithms for predicting these attributes were all based on the amino acid composition, in which no sequence order effect was taken into account. To improve the prediction quality, it is necessary to incorporate such an effect. However, the number of possible patterns for protein sequences is extremely large, which has posed a formidable difficulty for realizing this goal. To deal with such a difficulty, the pseudo-amino acid composition is introduced. It is a combination of a set of discrete sequence correlation factors and the 20 components of the conventional amino acid composition. A remarkable improvement in prediction quality has been observed by using the pseudo-amino acid composition. The success rates of prediction thus obtained are so far the highest for the same classification schemes and same data sets. It has not escaped from our notice that the concept of pseudo-amino acid composition as well as its mathematical framework and biochemical implication may also have a notable impact on improving the prediction quality of other protein features. Proteins 2001;43:246–255. © 2001 Wiley-Liss, Inc.
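
The combination described in this abstract, 20 composition components plus a set of sequence correlation factors, can be sketched as follows. This is a hedged illustration rather than the paper's exact formulation: the correlation factor at each lag is taken as the mean squared difference of residue properties between positions that far apart, averaged over whatever property scales the caller supplies, and the weight `weight` and the number of lags `lam` are free parameters chosen here only for illustration. The function name and argument names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def pseudo_aa_composition(seq, properties, lam=2, weight=0.05):
    """Return 20 + lam components: normalized composition plus weighted correlation factors.

    properties: list of dicts, each mapping a residue letter to a numeric property value.
    """
    n = len(seq)
    if lam >= n:
        raise ValueError("lam must be smaller than the sequence length")
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]
    thetas = []
    for lag in range(1, lam + 1):
        total = 0.0
        for i in range(n - lag):
            a, b = seq[i], seq[i + lag]
            # mean squared property difference between residues 'lag' positions apart
            total += sum((p[b] - p[a]) ** 2 for p in properties) / len(properties)
        thetas.append(total / (n - lag))
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

With sequences made only of the 20 standard residues, the returned components sum to 1; the first 20 reflect composition and the last `lam` carry the sequence order information that plain composition discards.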

20.

Background  

Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a reassessment of the structure-function paradigm. Establishing that a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or to an unfolded protein are of interest in fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomous classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score.
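
One way such a consensus score could work is sketched below. The rule used here is an assumption made for illustration only: each predictor casts a vote of +1 (folded) or -1 (unfolded), and a sequence is placed in the twilight zone whenever the predictors disagree. The abstract does not specify the predictors or the scoring rule, so `predictors` is simply a list of caller-supplied functions and all names are hypothetical.

```python
def consensus_score(seq, predictors):
    """Sum of votes: +1 for each predictor calling the sequence folded, -1 otherwise."""
    return sum(1 if predict(seq) else -1 for predict in predictors)

def classify(seq, predictors):
    """Label a sequence 'folded', 'unfolded', or 'twilight' (predictors disagree)."""
    score = consensus_score(seq, predictors)
    if score == len(predictors):
        return "folded"
    if score == -len(predictors):
        return "unfolded"
    return "twilight"
```

Each predictor is any callable that takes a sequence string and returns a truthy value for "folded"; requiring unanimity is the strictest possible rule, and a softer threshold on the score would shrink the twilight zone.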
