Similar literature
 19 similar documents found.
1.
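Predicting the structural class of non-homologous proteins from their primary sequences   Total citations: 8 (self-citations: 1, citations by others: 7)
Protein structural class prediction algorithms that extract features from amino acid composition, autocorrelation functions, and autocovariance functions are analyzed and compared, and two combined approaches are studied: amino acid composition with autocorrelation functions, and amino acid composition with autocovariance functions. For non-homologous proteins, the combination of amino acid composition and autocorrelation functions, using the hydrophobicity values of Miyazawa and Jernigan, gives an overall accuracy of 95.34% in the self-consistency test on the training set, 81.92% in the jackknife test, and 86.61% in the independent test on the test set. The combination of amino acid composition and autocovariance functions, using the hydrophobicity values of Wold et al., gives an overall accuracy of 96.71% in the self-consistency test on the training set, 82.18% in the jackknife test, and 86.88% in the independent test on the test set. Both combinations therefore improve the accuracy of structural class prediction, which indicates that extracting more informative sequence features is the key to improving classification accuracy.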
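The feature extraction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the autocorrelation and autocovariance formulas are common textbook forms that the abstract does not spell out, and the Kyte-Doolittle hydropathy scale is swapped in for the Miyazawa-Jernigan and Wold scales that the paper actually uses.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values, used here only as a stand-in scale
# (assumption: the paper uses Miyazawa-Jernigan or Wold values instead).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Fraction of each of the 20 amino acids in the sequence."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def autocorrelation(seq, scale, max_lag):
    """r_k = mean of h_i * h_(i+k) for k = 1..max_lag (requires max_lag < len(seq))."""
    h = [scale[a] for a in seq]
    n = len(h)
    return [
        sum(h[i] * h[i + k] for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def autocovariance(seq, scale, max_lag):
    """Same as autocorrelation, but on values centered at the sequence mean."""
    h = [scale[a] for a in seq]
    n = len(h)
    mean = sum(h) / n
    return [
        sum((h[i] - mean) * (h[i + k] - mean) for i in range(n - k)) / (n - k)
        for k in range(1, max_lag + 1)
    ]


def feature_vector(seq, scale=KYTE_DOOLITTLE, max_lag=5):
    """Concatenate the 20 composition values with max_lag autocorrelation terms."""
    return composition(seq) + autocorrelation(seq, scale, max_lag)
```

The resulting vector (20 composition values plus `max_lag` sequence-order terms) is what a classifier would receive; replacing `autocorrelation` with `autocovariance` in `feature_vector` gives the second combination discussed in the abstract.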

2.
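Taking as its starting point the fact that a protein folds into the stable structure type with minimum free energy, and aiming to reveal the dynamical nature of protein folding in space, this work uses the amino acid frequencies and autocovariance functions of protein sequences from a non-homologous protein database as feature vectors, and computes the eigenvalues of the covariance matrix that characterizes the coupling and cooperative interactions among the components of the feature vector. Compared with Chou's method, this approach reflects more fully the degeneracy, global character, and ambiguity of the protein folding code, and provides a dynamical-parameter analysis method for quantitatively characterizing proteins that fold into different structural classes.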

3.
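Correct prediction of protein folding rates is important for understanding the mechanism of protein folding. Starting from the pseudo-amino acid composition method, this paper proposes using the oscillation of sequence hydrophobicity values to extract the sequence-order information of a protein, and builds a linear regression model to predict folding rates. The method requires no secondary structure, tertiary structure, or structural class information, and predicts the folding rate directly from the sequence. On a data set of 62 proteins, jackknife cross-validation gives a correlation coefficient of 0.804, indicating good agreement between predicted and experimental folding rates and showing that amino acid sequence information has an important influence on folding rate. Compared with other methods, this method is computationally simple and needs few input parameters.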
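The jackknife validation of a linear regression model mentioned above can be illustrated with a short sketch. It assumes a single sequence-derived descriptor per protein for simplicity (the paper's model may use several), and all function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def jackknife_predictions(xs, ys):
    """Predict each sample from a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

Calling `pearson(jackknife_predictions(xs, ys), ys)` on a descriptor list `xs` and measured log folding rates `ys` gives the kind of cross-validated correlation coefficient the abstract reports.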

4.
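A study of protein fold type recognition methods   Total citations: 1 (self-citations: 0, citations by others: 1)
Fold type recognition is an important method for analyzing protein structure. Using 822 all-β proteins with sequence similarity below 25% as the study set, the secondary-structure segments of the core structure and the hydrogen-bond interactions between segments were extracted as fold-type feature parameters, and a template database of 74 fold types of all-β proteins was built. A secondary-structure matching function SS, a hydrogen-bond potential function BP, and a scoring function P between the query protein and a fold-type template were defined; the fold type of the template with the smallest P value is assigned to the query protein. Three groups of 50 all-β protein domains each, drawn at random from SCOP 1.69, were predicted with recognition accuracies of 56%, 56%, and 42%; on the test set provided by Ding et al., the overall recognition accuracy was 61.5%. The results and comparisons show that this is an effective fold type recognition method.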

5.
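Objective: To study the structure of porcine β2m. Methods: The SOPMA method on the EXPASY server (http://www.expasy.ch/tools) was used to predict the secondary structure of β2m, the amino acid and secondary-structure composition were compared with those of human β2m, and on this basis the tertiary structure of β2m was built by homology modeling. Results: The counts of α-helix, β-sheet, turn, and random coil in the secondary structure of β2m were 5, 36, 5, and 52, respectively, and the agreement with the corresponding secondary structure of human β2m reached 100% (α-helix), 92.3% (β-sheet), 71.7% (turn), and 92.3% (random coil); the secondary-structure homology was higher than the amino acid homology. Homology modeling showed that the porcine and human 3D structures are also very similar, differing only in a few amino acids. Conclusion: The spatial structures of porcine and human β2m are similar.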

6.
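A protein fold database covering 27 fold types was built from 1895 protein sequences with sequence similarity below 40%. Starting from the protein sequence, each protein was represented by a combined vector of motif frequency values, low-frequency power spectral density values, amino acid composition, predicted secondary structure information, and autocorrelation function values. Using a support vector machine with an overall classification strategy, the fold types of the 27 protein folds were predicted, and the accuracy in the independent test reached 66.67%. With the same feature parameters and algorithm, the 4 structural classes of the 27 folds were also predicted, and the accuracy in the independent test reached 89.24%. Applying the same method to a 27-fold database used by previous authors gave better results than those previously reported.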

7.
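A study of classification methods for the fold types of α/β proteins   Total citations: 1 (self-citations: 0, citations by others: 1)
Ma Shuai, Wang Qin, Li Xiaoqin. 《生物信息学》, 2014, 12(2): 123-132
Research on the rules of protein folding is one of the major frontier topics in the life sciences, and fold classification is the foundation of protein folding research. Based on the LIFCA database, 55 α/β protein fold types with a sample size greater than 2 were selected as the study set. Combining the definitions of the fold types with their conserved topological features, a template and its corresponding feature parameters were determined for each of the 55 fold types. A template-based scoring function, Mul-Fscore, was established and combined with secondary-structure parameters to give a multi-template classification method for the 55 α/β fold types. When the method was tested on the 931 samples in the LIFCA database, the average specificity, average sensitivity, and MCC of the classification were 99.58%, 79.47%, and 79.39%, respectively. Compared with classification by TM-score, Mul-Fscore gave better sensitivity and MCC, with similar average specificity.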
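The specificity, sensitivity, and MCC figures quoted above are standard one-vs-rest classification metrics. A minimal sketch of how they are typically computed per fold type is given below; the function names are hypothetical, and the per-class averaging used in the paper is not described in the abstract.

```python
import math


def one_vs_rest_counts(true_labels, pred_labels, cls):
    """Confusion counts (tp, fp, tn, fn) treating `cls` as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(true_labels, pred_labels):
        if t == cls and p == cls:
            tp += 1
        elif t != cls and p == cls:
            fp += 1
        elif t != cls and p != cls:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc
```

Averaging the three values over all fold types would give summary numbers of the kind reported in the abstract.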

8.
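In current studies of how synonymous codon usage bias affects protein folding, the sample proteins come from different species. Because synonymous codon usage bias differs between species, the nucleoproteins of Bacillus subtilis were chosen as the study set. First, each nucleoprotein was cut according to its secondary structure into α-helix segments, β-sheet segments, and random-coil (α-β mixed) segments, and the folding rate of each segment was calculated. Then the corresponding nucleic acid sequence of each segment was collected and its synonymous codon usage was calculated. On this basis, the correlation between the synonymous codon usage bias of Bacillus subtilis nucleoproteins and protein folding rate was analyzed. For peptide segments of every secondary structure type, the usage bias of some codons was significantly correlated with the folding rate of the corresponding segment. Further analysis showed that the great majority of codons significantly correlated with segment folding rate are the most frequently used codon of their synonymous group in the complete Bacillus subtilis sequences or in the nucleoprotein sequences. The results indicate that the synonymous codon usage bias of Bacillus subtilis plays an important role in the protein folding process.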
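The abstract does not say which codon usage measure was computed. One widely used choice is relative synonymous codon usage (RSCU), sketched below under that assumption; the function name `rscu` is hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """RSCU per codon: observed count divided by the mean count of its synonymous family."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    result = {}
    for aa, family in families.items():
        if aa == "*":
            continue  # stop codons are excluded
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this segment
        for c in family:
            result[c] = counts[c] * len(family) / total
    return result
```

An RSCU above 1 means the codon is used more often than expected under uniform use of its synonymous family; such values could then be correlated with segment folding rates as the abstract describes.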

9.
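Using pectinase from A. niger as the starting material, two-step purification on CM-Sephadex C-50 and Sephadex G-100 gave electrophoretically homogeneous endo-PG1 and endo-PG2, with subunit molecular weights of 35 kD and 37 kD, sugar contents of 11.22% and 8.3%, and maximum UV absorption peaks at 274 nm and 269 nm, respectively. Amino acid composition analysis showed a high Gly content, a low Met content, no Cys, and more acidic than basic amino acids. Circular dichroism showed that the secondary structure consists mainly of α-helix and β-sheet: endo-PG1 contains 45.1% α-helix and 24.9% β-sheet, and endo-PG2 contains 39.6% α-helix and 36.5% β-sheet.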

10.
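The sample data set was preprocessed by principal component analysis, and the resulting new data set was fed to a support vector machine; with the help of uniform design, a mathematical model relating the amino acid composition of chitinases to their optimum pH was built. With a penalty coefficient C of 10, an epsilon of 0.7, and a gamma of 0.5, the mean absolute percentage error of the model's fit to pH was 3.76%; the model also predicted well, with a mean absolute prediction error of 0.42 pH units. The method performs better than a BP neural network.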
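The two error measures quoted in this abstract can be computed as in the sketch below. The function names and the sample pH values in the usage note are illustrative only and are not taken from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted|, in the units of the target (here pH units)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percentage."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)
```

For made-up optimum pH values `[5.0, 4.0]` and model outputs `[5.5, 3.8]`, the first function gives 0.35 pH units and the second gives 7.5%.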

11.
An improved multiple linear regression method has been proposed to predict the content of alpha-helix and beta-strand of a globular protein based on its primary sequence and structural class. The amino acid composition and the auto-correlation functions derived from the hydrophobicity profile of the primary sequence have been taken into account. However, only the compositions of a part of the amino acids and a part of the auto-correlation functions are selected as the regression terms, which lead to the least prediction error. The resubstitution test shows that the average absolute errors are 0.052 and 0.047 with the standard deviations 0.050 and 0.047 for the prediction of helix/strand content, respectively. A rigorous cross-validation test, the jackknife test shows that the average absolute errors are 0.058 and 0.053 with the standard deviations 0.057 and 0.053 for the prediction of helix/strand content, respectively. Both tests indicate the self-consistency and the extrapolating effectiveness of the new method. The high prediction accuracy means that the method is suitable for practical applications.

12.
The prediction of the secondary structure content (α-helix and β-strand content) of a globular protein may play an important complementary role in the prediction of the protein's structure. We propose a new prediction algorithm based on Chou's database [Chou (1995), Proteins Struct. Funct. Genet. 21, 319]. The new algorithm is an improved multiple linear regression method, taking the nonlinear and coupling terms of the frequencies of different amino acids into account. The prediction is also based on the structural classes of proteins. A resubstitution examination for the algorithm shows that the average errors are 0.040 and 0.033 for the prediction of α-helix content and β-strand content, respectively. The examination of cross-validation, the jackknife analysis, shows that the average errors are 0.051 and 0.044 for the prediction of α-helix content and β-strand content, respectively. Both examinations indicate the self-consistency and the extrapolative effectiveness of the new algorithm. Compared with the other methods available currently, our method has the merits of simplicity and convenience for use, as well as a high prediction accuracy. By incorporating the prediction of the structural classes, the only input of our method is the amino acid composition of the protein to be predicted.

13.
Accurate prediction of protein secondary structural content   Total citations: 2 (self-citations: 0, citations by others: 2)
An improved multiple linear regression (MLR) method is proposed to predict a protein's secondary structural content based on its primary sequence. The amino acid composition, the autocorrelation function, and the interaction function of side-chain mass derived from the primary sequence are taken into account. The average absolute errors of prediction over 704 unrelated proteins with the jackknife test are 0.088, 0.081, and 0.059 with standard deviations 0.073, 0.066, and 0.055 for α-helix, β-sheet, and coil, respectively. That the sum of predicted secondary structure content should be close to 1.0 was introduced as a criterion to evaluate whether the prediction is acceptable. When only the predictions with the sum of predicted secondary structure content between 0.99 and 1.01 are accepted (about 11% of all proteins), the absolute errors are 0.058 for α-helix, 0.054 for β-sheet, and 0.045 for coil.
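The acceptance criterion in this abstract (predicted contents summing to between 0.99 and 1.01) is easy to express in code. The sketch below is an illustration with hypothetical names; the small slack term only guards against floating-point rounding at the boundaries and is not part of the published criterion.

```python
def accept_prediction(helix, sheet, coil, tolerance=0.01):
    """True if the three predicted contents sum to 1.0 within the tolerance."""
    return abs(helix + sheet + coil - 1.0) <= tolerance + 1e-12


def accepted_fraction(predictions, tolerance=0.01):
    """Fraction of (helix, sheet, coil) predictions that pass the sum criterion."""
    accepted = [p for p in predictions if accept_prediction(*p, tolerance=tolerance)]
    return len(accepted) / len(predictions)
```

Because the three contents are predicted by separate regressions, nothing forces them to sum to 1; the filter keeps only the self-consistent predictions, which the abstract reports as about 11% of all proteins.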

14.
Protein solubility plays a major role in understanding the crystal growth and crystallization process of proteins. How to predict the propensity of a protein to be soluble or to form inclusion bodies is a long-standing problem that has not been satisfactorily resolved. After choosing almost 10,000 protein sequences from the NCBI database and eliminating the sequences with 90% homologous similarity by CD-HIT, 5692 sequences remained. Using Chou's pseudo amino acid composition features, we predict protein solubility with three methods: support vector machine (SVM), back propagation neural network (BP Neural Network), and a hybrid method based on SVM and BP Neural Network. Each method is evaluated by a re-substitution test and a 10-fold cross-validation test. In the re-substitution test, the BP Neural Network performs best, with an accuracy of 0.9288 and a Matthews Correlation Coefficient (MCC) of 0.8513. Meanwhile, the other two methods are better than the BP Neural Network in the 10-fold cross-validation test. The hybrid method based on SVM and BP Neural Network is the best, with an average accuracy of 0.8678 and an average MCC of 0.7233. Although all three methods perform well, the hybrid method is deemed the best according to the performance comparison.

15.
Today there are several different experimental scales for the intrinsic α-helix as well as β-strand propensities of the 20 amino acids obtained from the thermodynamic analysis of various model systems. These scales do not compare well with those extracted from statistical analysis of three-dimensional structure databases. Possible explanations for this could be the limited size of the databases used, the definitions of intrinsic propensities, or the theoretical approach. Here we report a statistical determination of α-helix and β-strand propensities derived from the analysis of a database of 279 three-dimensional structures. Contrary to what has been generally done, we have considered a particular residue as in α-helix or β-strand conformation by looking only at its dihedral angles (φ–ψ matrices). Neither the identity nor the conformation of the surrounding residues in the amino acid sequence has been taken into consideration. Pseudoenergy empirical scales have been calculated from the statistical propensities. These scales agree very well with the experimental ones in relative and absolute terms. Moreover, their correlation with the average of the experimental scales for α-helix or β-strand is as good as the correlations of the individual experimental scales with the average. These results show that by using a large enough database and a proper definition for the secondary structure propensities, it is possible to obtain a scale as good as any of experimental origin. Interestingly, the φ–ψ analysis of the Ramachandran plot suggests that the amino acids could have different β-strand propensities in different subregions of the β-strand area. © 1994 Wiley-Liss, Inc.

16.
The prediction of the secondary structural contents (those of α-helix and β-strand) of a globular protein is of great use in the prediction of protein structure. In this paper, a new prediction algorithm has been proposed based on Chou's database [Chou (1995), Proteins 21, 319–344]. The new algorithm is an improved multiple linear regression method, taking into account the nonlinear and coupling terms of the frequencies of different amino acids and the length of the protein. The prediction is also based on the structural classes of proteins, but instead of four classes, only three classes are considered: the α class, the β class, and the mixed α+β and α/β class, or simply the αβ class. Thus the ambiguity that usually occurs between α+β proteins and α/β proteins is eliminated. A resubstitution examination for the algorithm shows that the average absolute errors are 0.040 and 0.035 for the prediction of α-helix content and β-strand content, respectively. An examination of cross-validation, the jackknife analysis, shows that the average absolute errors are 0.051 and 0.045 for the prediction of α-helix content and β-strand content, respectively. Both examinations indicate the self-consistency and the extrapolating effectiveness of the new algorithm. Compared with other methods, ours has the merits of simplicity and convenience for use, as well as high prediction accuracy. By incorporating the prediction of the structural classes, the only input of our method is the amino acid composition and the length of the protein to be predicted.

17.
The possible applicability of the new template CoMFA methodology to the prediction of unknown biological affinities was explored. For twelve selected targets, all ChEMBL binding affinities were used as training and/or prediction sets, making these 3D-QSAR models the most structurally diverse and among the largest ever. For six of the targets, X-ray crystallographic structures provided the aligned templates required as input (BACE, cdk1, chk2, carbonic anhydrase-II, factor Xa, PTP1B). For all targets including the other six (hERG, cyp3A4 binding, endocrine receptor, COX2, D2, and GABAa), six modeling protocols applied to only three familiar ligands provided six alternate sets of aligned templates. The statistical qualities of the six or seven models thus resulting for each individual target were remarkably similar. Also, perhaps unexpectedly, the standard deviations of the errors of cross-validation predictions accompanying model derivations were indistinguishable from the standard deviations of the errors of truly prospective predictions. These standard deviations of prediction ranged from 0.70 to 1.14 log units and averaged 0.89 (8x in concentration units) over the twelve targets, representing an average reduction of almost 50% in uncertainty, compared to the null hypothesis of “predicting” an unknown affinity to be the average of known affinities. These errors of prediction are similar to those from Tanimoto coefficients of fragment occurrence frequencies, the predominant approach to side effect prediction, which template CoMFA can augment by identifying additional active structural classes, by improving Tanimoto-only predictions, by yielding quantitative predictions of potency, and by providing interpretable guidance for avoiding or enhancing any specific target response.
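The "8x in concentration units" figure follows from converting a base-10 logarithmic error into a fold change. A one-line sketch of that arithmetic, with a hypothetical function name:

```python
def log_units_to_fold(sd_log10):
    """Convert an error expressed in log10 units into a multiplicative fold change."""
    return 10 ** sd_log10
```

An average error of 0.89 log units corresponds to a factor of about 7.8, which the abstract rounds to 8x; the reported range of 0.70 to 1.14 log units corresponds to roughly 5x to 14x.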

18.
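Studies show that many neurodegenerative diseases are related to the localization of proteins within the Golgi apparatus, so correctly identifying sub-Golgi proteins can help in developing drugs for the related diseases. This paper builds a data set of two classes of sub-Golgi proteins and extracts feature information including amino acid composition, conjoint triad information, averaged chemical shifts, and Gene Ontology annotation. Prediction with a support vector machine gives an overall success rate of 87.43% under 5-fold cross-validation.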
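The 5-fold cross-validation protocol mentioned above partitions the samples into five folds and uses each fold once as the test set. The sketch below shows only the index bookkeeping; the function name is hypothetical, and it assigns samples to folds in an interleaved order without shuffling or class stratification, either of which the paper may have used.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    # Sample i goes to fold i % k, so fold sizes differ by at most one.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test
```

A classifier would be trained on each `train` set and scored on the matching `test` set; the overall success rate is the fraction of all samples predicted correctly across the five test folds.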

19.
Nanni L, Lumini A. Amino Acids, 2008, 34(4): 653-660
Given a protein that is localized in the mitochondria, it is very important to know the submitochondria localization of that protein to understand its function. In this work, we propose a submitochondria localizer whose feature extraction method is based on Chou's pseudo-amino acid composition. The pseudo-amino acid based features are obtained by combining pseudo-amino acid compositions with hundreds of amino-acid indices and amino-acid substitution matrices; from this huge set of features, a small set of 15 "artificial" features is then created. The feature creation is performed by genetic programming, combining one or more "original" features by means of some mathematical operators. Finally, the set of combined features is used to train a radial basis function support vector machine. This method is named GP-Loc. Moreover, we also propose a method with very few parameters, named ALL-Loc, where all the "original" features are used to train a linear support vector machine. The overall prediction accuracy obtained by GP-Loc is 89% when jackknife cross-validation is used, which outperforms the result reported in the literature (85.2%) on the same dataset. The overall prediction accuracy obtained by ALL-Loc is 83.9%.
