Similar articles
19 similar articles found (search time: 343 ms)
1.
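Nucleic acid sequences carry a certain amount of information about protein structure. Based on the number of hydrogen bonds normally formed when the middle base of a codon in the genetic code table pairs, the 20 amino acids were tentatively divided into two classes, and in-house software was used to statistically analyze the clustering of the two classes in a protein secondary structure database. The results show that, under this division, an amino acid residue has a relatively high probability of being adjacent to residues of the same class, and that such clusters show some preference for particular secondary structures. Finally, an amino acid sequence was designed according to this method, and the structure predicted for it by a prediction server is given.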
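The two-class division described above can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes that a G or C middle base counts as three hydrogen bonds and an A or U middle base as two, and the function names are hypothetical. The middle-base groupings follow the standard genetic code.

```python
# Amino acids of the standard genetic code, grouped by the middle codon base.
MIDDLE_BASE_TO_AMINO_ACIDS = {
    "U": "FLIMV",    # Phe, Leu, Ile, Met, Val
    "C": "SPTA",     # Ser (UCN), Pro, Thr, Ala
    "A": "YHQNKDE",  # Tyr, His, Gln, Asn, Lys, Asp, Glu
    "G": "CWRSG",    # Cys, Trp, Arg, Ser (AGU/AGC), Gly
}

# Assumption: hydrogen bonds formed when the middle base pairs (A-U: 2, G-C: 3).
HYDROGEN_BONDS = {"A": 2, "U": 2, "G": 3, "C": 3}


def classify_amino_acids():
    """Return {2: set_of_amino_acids, 3: set_of_amino_acids}."""
    classes = {2: set(), 3: set()}
    for base, amino_acids in MIDDLE_BASE_TO_AMINO_ACIDS.items():
        classes[HYDROGEN_BONDS[base]].update(amino_acids)
    return classes


def same_class_neighbor_fraction(sequence, classes):
    """Fraction of adjacent residue pairs whose members share a class."""
    label = {aa: bonds for bonds, members in classes.items() for aa in members}
    pairs = list(zip(sequence, sequence[1:]))
    return sum(label[a] == label[b] for a, b in pairs) / len(pairs)
```

Under these assumptions the three-bond class contains 8 amino acids and the two-bond class 12; `same_class_neighbor_fraction` is one simple way to quantify the clustering the abstract reports, and it expects a sequence of at least two standard residues.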

2.
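Recognition of protein supersecondary structures using the diversity measure method   Total citations: 1 (self-citations: 0; citations by others: 1)
Using the diversity measure method, four classes of supersecondary structure were recognized in 2208 high-precision protein structures with a resolution of 2.5 Å or better. Starting from the primary sequence, amino acids (the 20 amino acids plus a gap) and their nearest-neighbor correlations were used together as parameters. With the sequence pattern length fixed at 8 residues, the average prediction accuracy for the "822" sequence pattern reached 78.1% under 3-fold cross-validation and 76.7% under the jackknife test. With the length fixed at 10 residues, the average prediction accuracy for the "1041" sequence pattern reached 83.1% under 3-fold cross-validation and 79.8% under the jackknife test.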
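The abstract does not give its formulas, so the following is only a sketch of how a diversity-based classifier is commonly set up. It assumes the measure of diversity D(X) = N ln N − Σ nᵢ ln nᵢ over feature counts and the increment of diversity ID(X, Y) = D(X + Y) − D(X) − D(Y), with a query assigned to the class giving the smallest increment. The feature choice (residues plus adjacent pairs) and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def segment_features(sequence):
    """Count single residues and adjacent residue pairs in a segment."""
    counts = Counter(sequence)
    counts.update(a + b for a, b in zip(sequence, sequence[1:]))
    return counts


def diversity(counts):
    """Measure of diversity: N ln N - sum(n_i ln n_i), with 0 ln 0 taken as 0."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(
        n * math.log(n) for n in counts.values() if n > 0
    )


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y); small values mean similar sources."""
    merged = Counter(x) + Counter(y)
    return diversity(merged) - diversity(x) - diversity(y)


def classify(query, class_profiles):
    """Assign the query to the class with the smallest increment of diversity."""
    return min(
        class_profiles,
        key=lambda name: increment_of_diversity(query, class_profiles[name]),
    )
```

Two count vectors with identical proportions give an increment of zero, which is why the minimum is used as the decision rule in this sketch.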

3.
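Clustering of amino acid composition, protein structural types, and prediction of structural types   Total citations: 11 (self-citations: 0; citations by others: 11)
An information clustering method was applied to the amino acid compositions of proteins, revealing hierarchical clustering (large clusters splitting into smaller ones). The 645 proteins can be divided into 15 small clusters. Each small cluster correlates to some extent with the structural type determined by secondary structure content, but shows no obvious correlation with the five major protein structural types. Problems in schemes that predict structural type from amino acid composition and secondary structure content are pointed out. A new method for predicting protein structural type from the secondary structure sequence is proposed, together with concise prediction rules.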

4.
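By studying algorithms based on neural network weight matrices, this work mines the intrinsic relationships between protein secondary structure and amino acid sequence in order to improve the accuracy of predicting secondary structure from primary sequence. Neural networks perform well at feature classification, and the neuron connection weight matrix obtained after training captures the intrinsic features and regularities of the samples. The study uses the weight matrix for score-based prediction, applies an offset-alignment method to locate sensitive amino acid neighborhoods, and analyzes the common behavior of the test set under different window lengths. Experiments show that prediction performance changes markedly at a sliding-window length of L = 7, and that the residue at neighborhood position P = 4 strengthens prediction performance. The approach offers a new algorithm design for protein secondary structure prediction based on local sequence features.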

5.
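A new method for protein secondary structure prediction is proposed. The method extracts from amino acid sequences species-related protein secondary structure entries, analogous to "words" in natural language. These entries form a protein secondary structure dictionary that describes the relationship between amino acid sequence and secondary structure. Predicting secondary structure then resembles the joint process of word segmentation and part-of-speech tagging in natural language. The method treats the sequence of entries as a Markov chain and uses the Viterbi algorithm to search for the maximum probability with which each entry is tagged as a given secondary structure type; a word lattice describes the segmentation results, and a maximum entropy Markov model computes the secondary structure probabilities of the entries. The predicted secondary structure is the set of structure types corresponding to the optimal segmentation. The method was tested on protein sequences from 4 species and compared with the PHD method. The results show that its Q3 accuracy is 3.9% higher than that of PHD and its SOV accuracy 4.6% higher. Incorporating locally similar sequences found by BLAST search further improves prediction accuracy. On 50 CASP5 target protein sequences, the Q3 accuracy was 78.9% and the SOV accuracy 77.1%. A protein secondary structure prediction server based on this method has been set up and can be accessed at http://www.insun.hit.edu.cn:81/demos/biology/index.html.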
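To make the Viterbi step concrete, here is a minimal sketch of Viterbi decoding in log space. It deliberately uses a plain hidden Markov model over single observations as a simplified stand-in; the abstract's method works on a word lattice with a maximum entropy Markov model, which is not reproduced here. All parameter names are hypothetical, and the observation sequence is assumed to be non-empty.

```python
def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable state path for a non-empty observation sequence.

    log_start[s], log_trans[prev][s] and log_emit[s][obs] are log-probabilities.
    """
    # best[t][s]: best log-probability of any path ending in state s at step t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []  # back[t - 1][s]: best predecessor of state s at step t
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

Working in log space avoids numerical underflow on long sequences, which matters for whole-protein inputs.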

6.
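A study of homology among amylases   Total citations: 2 (self-citations: 0; citations by others: 2)
Comparison of the amino acid sequences of α-amylase, maltotetraose-forming amylase, and glucoamylase shows that the amino acid identity among them is quite low. Building on a hydrophobicity analysis of the residues of these three amylases, this paper uses the recently developed hydrophobic cluster analysis method to give a two-dimensional description of their amino acid sequences; the results clearly reveal the differences in homology among them, which is also confirmed by the experimentally determined three-dimensional structures. These findings indicate that protein secondary structure and its mutual arrangement, which are closely tied to the three-dimensional structure, are determined mainly by the arrangement of different types of amino acid residues in the sequence.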

7.
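Accurate prediction of protein folding rates is very important for understanding protein folding mechanisms. Starting from the pseudo amino acid composition approach, this paper proposes using oscillations in sequence hydrophobicity values to extract sequence-order information from a protein's amino acids, and builds a linear regression model to predict folding rates. The method requires no secondary structure, tertiary structure, or structural class information, and predicts folding rates directly from sequence. On a data set of 62 proteins, jackknife cross-validation gave a correlation coefficient of 0.804, indicating that predicted and experimental folding rates correlate well and that amino acid sequence information has an important influence on folding rate. Compared with other methods, this approach is computationally simple and needs few input parameters.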
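The jackknife (leave-one-out) evaluation of a linear regression can be sketched as below. This is an illustration only: it uses a single hypothetical feature per protein rather than the paper's hydrophobicity-oscillation descriptors, and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_predictions(xs, ys):
    """Predict each sample from a model fitted on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    return predictions


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5
```

The reported correlation coefficient would then correspond to `pearson(jackknife_predictions(xs, ys), ys)`, where `xs` and `ys` are Python lists of feature values and experimental folding rates.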

8.
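All higher-order structure of a protein molecule is determined by the information contained in its primary structure, that is, its sequence of amino acid residues. Over the years, more than a dozen methods have been developed for predicting secondary structure from a protein's amino acid sequence. Among them, the method of Chou and Fasman, proposed in 1974 and revised and refined by 1978, has given very good results and is receiving increasing attention. Its outstanding advantage is simplicity: the secondary structure of a protein can be predicted without complex computer analysis, with an accuracy of about 80%. At present, X-ray crystal diffraction is of course the most accurate way to determine protein secondary structure. The Chou and Fasman method is precisely a complete set of data derived statistically from the results of crystallographic analysis.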

9.
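Protein structure prediction is important for understanding the structural composition and biological function of proteins, and secondary structure prediction is a key step in it. Now that the position-specific scoring matrix (PSSM) is widely used to encode the protein primary sequence as input samples, each residue can be represented as a two-dimensional data plane, so this paper trains a convolutional neural network on that representation. The paper also designs a second convolutional neural network in which a long short-term memory network perceives the horizontal and vertical features of the CNN's final convolutional feature maps and, together with the CNN's fully connected layer, performs the classification. Finally, an ensemble method combines the two types of convolutional neural network; the final ensemble contains six models of the two types and achieves a Q3 score of 77.2 on the CB513 protein data set.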
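For reference, Q3 is the percentage of residues whose three-state label (helix, strand, coil) is predicted correctly. A minimal sketch, with a hypothetical function name and labels encoded as the characters H, E, and C:

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, `q3_accuracy("HHHH", "HHEE")` evaluates to 50.0, since two of the four positions agree.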

10.
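Proteins are highly diverse in sequence, structure, and function. Many studies have shown that a protein's structure is related to the order of its amino acid sequence, and that the local sequence environment has some influence on the structure. This paper proposes a new method for predicting protein structural type based on the statistical torsion-angle preferences of 5-mer amino acids; the method predicts structural type from the statistical torsion-angle preferences of the middle residue of 5-mers in the PDB database. The new method can, through computer simulation...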

11.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to secondary structural elements of proteins based on the dictionary of secondary structure prediction (DSSP)-generated secondary structure for 408 selected non-homologous proteins. Amino acid triplets not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements of proteins. The total number of parameters generated is 15,432, out of 25,260 possible parameters. The deviation parameter set is complete with respect to amino acid singlets and doublets, and partially complete with respect to amino acid triplets. These generated parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of single sequence (NNPREDICT) and multiple sequence (PHD) methods. The average prediction accuracy for α-helix by the SSPDP, NNPREDICT and PHD methods was found to be 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For β-strand the prediction accuracy was found to be 69%, 21% and 53%, respectively, by the SSPDP, NNPREDICT and PHD methods. This clearly indicates that secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.

12.
The amino acid sequence of the ubiquinone binding protein (QP-C) in the cytochrome bc1 region of the mitochondrial electron transfer chain was determined by analysis of peptides obtained by cyanogen bromide cleavage and staphylococcal protease digestion of succinylated derivatives. It was found to consist of 110 amino acid residues and its amino terminus to be blocked by an acetyl group, as determined by mass spectrometry of the amino-terminal peptide and a comparison with peptides chemically synthesized on high-performance liquid chromatography. The molecular weight of this ubiquinone binding protein including the acetyl group was calculated to be 13,389. The predicted secondary structure of QP-C has alpha-helical content of about 50% and QP-C was classified as an "all-alpha" or "alpha + beta" protein. This is the first report describing the amino acid sequence of the ubiquinone binding protein. A comparison of this sequence with that of the 14-kDa subunit of the yeast ubiquinol-cytochrome c reductase complex from the nucleotide sequence showed these two sequences to be quite similar.

13.
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to secondary structural elements of proteins based on the dictionary of secondary structure prediction (DSSP)-generated secondary structure for 408 selected nonhomologous proteins. Amino acid triplets not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements of proteins. The total number of parameters generated is 15,432, out of 25,260 possible parameters. The deviation parameter set is complete with respect to amino acid singlets and doublets, and partially complete with respect to amino acid triplets. These generated parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of single sequence (NNPREDICT) and multiple sequence (PHD) methods. The average prediction accuracy for α-helix by the SSPDP, NNPREDICT and PHD methods was found to be 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For β-strand the prediction accuracy was found to be 69%, 21% and 53%, respectively, by the SSPDP, NNPREDICT and PHD methods. This clearly indicates that secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.

14.
15.
The complete amino acid sequence (186 amino acid residues) of a basic cytosolic protein from bovine brain has been determined. It was previously described as a phosphatidylethanolamine binding protein. Computer analyses have been used to calculate its hydropathy profile and to predict its secondary structure. Comparison with other proteins did not detect any significant sequence similarity, except for a short region which presents 53% sequence homology with bovine phosphatidylcholine transfer protein.

16.
The cDNA sequence coding for the coat protein of cucumber mosaic virus (Japanese Y strain) was cloned, and its nucleotide sequence was determined. The sequence contains an open reading frame that encodes the coat protein composed of 218 amino acids. The nucleotide and deduced amino acid sequences of the coat protein of this strain were compared with those of the Q strain; the homologies of the sequences were 78% and 81%, respectively. Further study of the sequences gave an insight into the genome organization and the molecular features of the coat protein. The coding region can be divided into three characteristic regions. The N-terminal region has conserved features in the positively charged structure, the hydropathy pattern and the predicted secondary structure, although the amino acid sequence is varied mainly due to frameshift mutations. It is noteworthy that the positions of arginine residues in this region are highly conserved. Both the nucleotide and amino acid sequences of the central region are well conserved. The amino acid sequence of the C-terminal region is not conserved because of frameshift mutations; however, the total number of amino acids is conserved. The nucleotide sequence of the 3'-noncoding region is divergent, but it could form a tRNA-like structure similar to those reported for other viruses. Detailed investigation suggests that the Y and Q strains are evolutionarily distant.

17.
Lee S, Lee BC, Kim D. Proteins. 2006;62(4):1107-1114
Knowing protein structure and inferring its function from the structure is one of the main issues of computational structural biology, and often the first step is studying protein secondary structure. There have been many attempts to predict protein secondary structure contents. Previous attempts assumed that the content of protein secondary structure can be predicted successfully using the information on the amino acid composition of a protein. Recent methods achieved remarkable prediction accuracy by using the expanded composition information. The overall average error of the most successful method is 3.4%. Here, we demonstrate that even if we use only the simple amino acid composition information, it is possible to improve the prediction accuracy significantly if the evolutionary information is included. The idea is motivated by the observation that evolutionarily related proteins share a similar structure. After calculating the homolog-averaged amino acid composition of a protein, which can be easily obtained from the multiple sequence alignment by running PSI-BLAST, those 20 numbers are learned by a multiple linear regression, an artificial neural network and a support vector regression. The overall average error of the method using support vector regression is 3.3%. It is remarkable that we obtain comparable accuracy without utilizing the expanded composition information such as pair-coupled amino acid composition. This work again demonstrates that the amino acid composition is a fundamental characteristic of a protein. It is anticipated that our novel idea can be applied to many areas of protein bioinformatics where the amino acid composition information is utilized, such as subcellular localization prediction, enzyme subclass prediction, domain boundary prediction, signal sequence prediction, and prediction of unfolded segments in a protein sequence, to name a few.
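The homolog-averaged composition mentioned above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the homologous sequences have already been collected (for example from a PSI-BLAST alignment), ignores gap and non-standard characters, weights every homolog equally, and uses hypothetical function names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Fractions of the 20 standard amino acids, in AMINO_ACIDS order."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def homolog_averaged_composition(sequences):
    """Average the per-sequence compositions over a non-empty set of homologs."""
    compositions = [composition(s) for s in sequences]
    return [sum(column) / len(compositions) for column in zip(*compositions)]
```

The resulting 20 numbers are the kind of input vector the abstract describes feeding to a regression model; the regression step itself is not shown here.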

18.
Huang JT, Xing DJ, Huang W. Amino Acids. 2012;43(2):567-572
The successful prediction of protein-folding rates based on the sequence-predicted secondary structure suggests that the folding rates might be predicted from sequence alone. To pursue this question, we directly predict the folding rates from amino acid sequences, which does not require any information on secondary or tertiary structure. Our work achieves 88% correlation with folding rates determined experimentally for proteins of all folding types and peptides, suggesting that almost all of the information needed to specify a protein's folding kinetics and mechanism is contained within its amino acid sequence. The influence of a residue on folding rate is related to amino acid properties. The hydrophobic character of amino acids may be an important determinant of folding kinetics, whereas other properties of amino acids (size, flexibility, polarity and isoelectric point) contribute little to the folding rate constant.

19.