首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 46 毫秒
1.
按照蛋白质序列中残基的相对可溶性,将其分为两类(表面/内部)和三类(表面/中间/内部)进行预测.选择不同窗宽和参数对数据进行训练和预测,以确保得到最好的分类效果,并同其他已有方法进行比较.对同一数据集不同分类阈值的预测结果显示,支持向量机方法对蛋白质可溶性的整体预测效果优于神经网络和信息论的方法.其中,对两类数据的最优分类结果达到79.0%,对三类数据的最优分类结果达到67.5%,表明支持向量机是蛋白质残基可溶性预测的一种有效方法.  相似文献   

2.
基于模糊支持向量机的膜蛋白折叠类型预测   总被引:1,自引:0,他引:1  
现有的基于支持向量机(support vector machine,SVM)来预测膜蛋白折叠类型的方法.利用的蛋白质序列特征并不充分.并且在处理多类蛋白质分类问题时存在不可分区域,针对这两类问题.提取蛋白质序列的氨基酸和二肽组成特征,并计算加权的多阶氨基酸残基指数相关系数特征,将3类特征融和作为分类器的输入特征矢量.并采用模糊SVM(fuzzy SVM,FSVM)算法解决对传统SVM不可分数据的分类.在无冗余的数据集上测试结果显示.改进的特征提取方法在相同分类算法下预测性能优于已有的特征提取方法:FSVM在相同特征提取方法下性能优于传统的SVM.二者相结合的分类策略在独立性数据集测试下的预测精度达到96.6%.优于现有的多种预测方法.能够作为预测膜蛋白和其它蛋白质折叠类型的有效工具.  相似文献   

3.
启动子预测是研究基因转录调控的重要环节,但现有算法的预测正确率偏低.在深入分析启动子生物特征的基础上,提出了一种基于支持向量机的枯草杆菌启动子预测算法,在启动子序列的组成特征、信号特征和结构特征中选取9种典型特征作为预测的依据,对于信号特征,除了利用保守模式的一致序列,还考虑了间隔距离的分布信息.首先通过特征描述模型分别计算每种特征在启动子序列和非启动子序列中的得分,将特征得分组合成9维特征向量,再利用支持向量机在特征向量集上进行训练和判别.对实际数据集进行的刀切法测试验证了算法的有效性.对σ启动予的预测,平均正确率达到了90.7%;对几种其它σ因子启动子的预测,平均正确率也超过了80%.算法不但有广泛的适用性,还有良好的可扩展性,能够方便的容纳新特征,使识别性能不断提高.  相似文献   

4.
外膜蛋白(Outer Membrane Proteins, OMPs)是一类具有重要生物功能的蛋白质, 通过生物信息学方法来预测OMPs能够为预测OMPs的二级和三级结构以及在基因组发现新的OMPs提供帮助。文中提出计算蛋白质序列的氨基酸含量特征、二肽含量特征和加权多阶氨基酸残基指数相关系数特征, 将三类特征组合, 采用支持向量机(Support Vector Machine, SVM)算法来识别OMPs。计算了包括四种残基指数的多种组合特征的识别结果, 并且讨论了相关系数的阶次和权值对预测性能的影响。在数据集上的十倍交叉验证测试和独立性测试结果显示, 组合特征识别方法对OMPs和非OMPs的识别精度最高分别达到96.96%和97.33%, 优于现有的多种方法。在五种细菌基因组内识别OMPs的结果显示, 组合特征方法具有很高的特异性, 并且对PDB数据库中已知结构的OMPs识别准确度超过99%。表明该方法能够作为基因组内筛选OMPs的有效工具。  相似文献   

5.
外膜蛋白(Outer Membrane Proteins, OMPs)是一类具有重要生物功能的蛋白质, 通过生物信息学方法来预测OMPs能够为预测OMPs的二级和三级结构以及在基因组发现新的OMPs提供帮助。文中提出计算蛋白质序列的氨基酸含量特征、二肽含量特征和加权多阶氨基酸残基指数相关系数特征, 将三类特征组合, 采用支持向量机(Support Vector Machine, SVM)算法来识别OMPs。计算了包括四种残基指数的多种组合特征的识别结果, 并且讨论了相关系数的阶次和权值对预测性能的影响。在数据集上的十倍交叉验证测试和独立性测试结果显示, 组合特征识别方法对OMPs和非OMPs的识别精度最高分别达到96.96%和97.33%, 优于现有的多种方法。在五种细菌基因组内识别OMPs的结果显示, 组合特征方法具有很高的特异性, 并且对PDB数据库中已知结构的OMPs识别准确度超过99%。表明该方法能够作为基因组内筛选OMPs的有效工具。  相似文献   

6.
在蛋白质结构预测的研究中,一个重要的问题就是正确预测二硫键的连接,二硫键的准确预测可以减少蛋白质构像的搜索空间,有利于蛋白质3D结构的预测,本文将预测二硫键的连接问题转化成对连接模式的分类问题,并成功地将支持向量机方法引入到预测工作中。通过对半胱氨酸局域序列连接模式的分类预测,可以由蛋白质的一级结构序列预测该蛋白质的二硫键的连接。结果表明蛋白质的二硫键的连接与半胱氨酸局域序列连接模式有重要联系,应用支持向量机方法对蛋白质结构的二硫键预测取得了良好的结果。  相似文献   

7.
许嘉 《生物信息学》2013,11(4):297-299
抗冻蛋白是一类具有提高生物抗冻能力的蛋白质。抗冻蛋白能够特异性的与冰晶相结合,进而阻止体液内冰核的形成与生长。因此,对抗冻蛋白的生物信息学研究对生物工程发展。提高作物抗冻性有重要的推动作用。本文采用由400条抗冻蛋白序列和400条非抗冻蛋白序列构成数据集,以伪氨基酸组分为特征,利用支持向量机分类算法预测抗冻蛋白,对训练集预测精度达到91.3%,对测试集预测精度达到78.8%。该结果证明伪氨基酸组分能够很好的反映抗冻蛋白特性,并能够用于预测抗冻蛋白。  相似文献   

8.
根据凋亡蛋白的亚细胞位置主要决定于它的氨基酸序列这一观点,基于局部氨基酸序列的n肽组分和序列的亲疏水性分布信息,采用离散增量结合支持向量机(ID_SVM)算法,对六类细胞凋亡蛋白的亚细胞位置进行预测。结果表明,在Re-substitution检验和Jackknife检验下,ID_SVM算法的总体预测成功率分别达到了94.6%和84.2%;在5-fold检验和10-fold检验下,其总体预测成功率也都达到了83%以上。通过比较ID和ID_SVM两种方法的预测能力发现,结合了支持向量机的离散增量算法能够改进预测成功率,结果表明ID_SVM是预测凋亡蛋白亚细胞位置的一种很有效的方法。  相似文献   

9.
膜蛋白是一类结构独特的蛋白质,是细胞执行各种功能的物质基础。根据其在细胞膜上的不同存在方式,主要分为六种类型。本文利用压缩的氨基酸对原始膜蛋白序列进行信息压缩,再对压缩序列进行氨基酸组成和顺序特征的提取,最后采用支持向量机构建分类模型。通过五叠交叉验证的结果表明,该方法对于六种膜蛋白的分类预测,准确度最高可达98%以上,平均预测准确度在85%以上,可有效实现膜蛋白六种类型的划分,为进一步分析膜蛋白的结构和功能奠定基础。  相似文献   

10.
研究表明,许多神经退行性疾病都与蛋白质在高尔基体中的定位有关,因此,正确识别亚高尔基体蛋白质对相关疾病药物的研制有一定帮助,本文建立了两类亚高尔基体蛋白质数据集,提取了氨基酸组分信息、联合三联体信息、平均化学位移、基因本体注释信息等特征信息,利用支持向量机算法进行预测,基于5-折交叉检验下总体预测成功率为87.43%。  相似文献   

11.
The enzymatic attributes of newly found protein sequences are usually determined either by biochemical analysis of eukaryotic and prokaryotic genomes or by microarray chips. These experimental methods are both time-consuming and costly. With the explosion of protein sequences registered in the databanks, it is highly desirable to develop an automated method to identify whether a given new sequence belongs to enzyme or non-enzyme. The discrete wavelet transform (DWT) and support vector machine (SVM) have been used in this study for distinguishing enzyme structures from non-enzymes. The networks have been trained and tested on two datasets of proteins with different wavelet basis functions, decomposition scales and hydrophobicity data types. Maximum accuracy has been obtained using SVM with a wavelet function of Bior2.4, a decomposition scale j=5, and Kyte-Doolittle hydrophobicity scales. The results obtained by the self-consistency test, jackknife test and independent dataset test are encouraging, which indicates that the proposed method can be employed as a useful assistant technique for distinguishing enzymes from non-enzymes.  相似文献   

12.
The thermostability of proteins is particularly relevant for enzyme engineering. Developing a computational method to identify mesophilic proteins would be helpful for protein engineering and design. In this work, we developed support vector machine based method to predict thermophilic proteins using the information of amino acid distribution and selected amino acid pairs. A reliable benchmark dataset including 915 thermophilic proteins and 793 non-thermophilic proteins was constructed for training and testing the proposed models. Results showed that 93.8% thermophilic proteins and 92.7% non-thermophilic proteins could be correctly predicted by using jackknife cross-validation. High predictive successful rate exhibits that this model can be applied for designing stable proteins.  相似文献   

13.
Intrinsically disordered proteins are an important class of proteins with unique functions and properties. Here, we have applied a support vector machine (SVM) trained on naturally occurring disordered and ordered proteins to examine the contribution of various parameters (vectors) to recognizing proteins that contain disordered regions. We find that a SVM that incorporates only amino acid composition has a recognition accuracy of 87+/-2%. This result suggests that composition alone is sufficient to accurately recognize disorder. Interestingly, SVMs using reduced sets of amino acids based on chemical similarity preserve high recognition accuracy. A set as small as four retains an accuracy of 84+/-2%; this suggests that general physicochemical properties rather than specific amino acids are important factors contributing to protein disorder.  相似文献   

14.
To further disclose the underlying mechanisms of protein β-sheet formation, studies were made on the rules of β-strands alignment forming β-sheet structure using statistical and machine learning approaches. Firstly, statistical analysis was performed on the sum of β-strands between each β-strand pairs in protein sequences. The results showed a propensity of near-neighbor pairing (or called “first come first pair”) in the β-strand pairs. Secondly, based on the same dataset, the pairwise cross-combinations of real β-strand pairs and four pseudo-β-strand contained pairs were classified by support vector machine (SVM). A novel feature extracting approach was designed for classification using the average amino acid pairing encoding matrix (APEM). Analytical results of the classification indicated that a segment of β-strand had the ability to distinguish β-strands from segments of α-helix and coil. However, the result also showed that a β-strand was not strongly conserved to choose its real partner from all the alternative β-strand partners, which was corresponding with the ordination results of the statistical analysis each other. Thus, the rules of “first come first pair” propensity and the non-conservative ability to choose real partner, were possible important factors affecting the β-strands alignment forming β-sheet structures.  相似文献   

15.
Chen C  Zhou X  Tian Y  Zou X  Cai P 《Analytical biochemistry》2006,357(1):116-121
Because a priori knowledge of a protein structural class can provide useful information about its overall structure, the determination of protein structural class is a quite meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering into databanks, it is both time-consuming and expensive to do so based solely on experimental techniques. Therefore, it is vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To deal with the challenge, this article presents a dual-layer support vector machine (SVM) fusion network that is featured by using a different pseudo-amino acid composition (PseAA). The PseAA here contains much information that is related to the sequence order of a protein and the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.  相似文献   

16.
In this paper, support vector machines (SVMs) are applied to predict the nucleic-acid-binding proteins. We constructed two classifiers to differentiate DNA/RNA-binding proteins from non-nucleic-acid-binding proteins by using a conjoint triad feature which extract information directly from amino acids sequence of protein. Both self-consistency and jackknife tests show promising results on the protein datasets in which the sequences identity is less than 25%. In the self-consistency test, the predictive accuracy is 90.37% for DNA-binding proteins and 89.70% for RNA-binding proteins. In the jackknife test, the predictive accuracies are 78.93% and 76.75%, respectively. Comparison results show that our method is very competitive by outperforming other previously published sequence-based prediction methods.  相似文献   

17.
Based on the 639 non-homologous proteins with 2910 cysteine-containing segments of well-resolved three-dimensional structures, a novel approach has been proposed to predict the disulfide-bonding state of cysteines in proteins by constructing a two-stage classifier combining a first global linear discriminator based on their amino acid composition and a second local support vector machine classifier. The overall prediction accuracy of this hybrid classifier for the disulfide-bonding state of cysteines in proteins has scored 84.1% and 80.1%, when measured on cysteine and protein basis using the rigorous jack-knife procedure, respectively. It shows that whether cysteines should form disulfide bonds depends not only on the global structural features of proteins but also on the local sequence environment of proteins. The result demonstrates the applicability of this novel method and provides comparable prediction performance compared with existing methods for the prediction of the oxidation states of cysteines in proteins.  相似文献   

18.
It is widely considered that it is not appropriate to treat β-pairs in isolation, since other secondary structural models (such as helices, coils), protein topology and protein tertiary structures would limit β-strand pairing. However, to understand the underlying mechanisms of β-sheet formation, studies ought to be performed separately on more concrete aspects. In this study, we focus on the parallel or antiparallel orientation of β-strands. First, statistical analysis was performed on the relative frequencies of the interstrand amino acid pairs within parallel and antiparallel β-strands. Consequently, features were extracted by singular value decomposition from the statistical results. By using the support vector machine to distinguish the features extracted from the two types of β-strands, high accuracy was achieved (up to 99.4%). This suggests that the interstrand amino acid pairs play a significant role in determining the parallel or antiparallel orientation of β-strands. These results may provide useful information for developing other useful algorithms to examine to the β-strand folding pathways, and could eventually lead to protein structure predictions.  相似文献   

19.
Prediction of protein classification is an important topic in molecular biology. This is because it is able to not only provide useful information from the viewpoint of structure itself, but also greatly stimulate the characterization of many other features of proteins that may be closely correlated with their biological functions. In this paper, the LogitBoost, one of the boosting algorithms developed recently, is introduced for predicting protein structural classes. It performs classification using a regression scheme as the base learner, which can handle multi-class problems and is particularly superior in coping with noisy data. It was demonstrated that the LogitBoost outperformed the support vector machines in predicting the structural classes for a given dataset, indicating that the new classifier is very promising. It is anticipated that the power in predicting protein structural classes as well as many other bio-macromolecular attributes will be further strengthened if the LogitBoost and some other existing algorithms can be effectively complemented with each other.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号