
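Random Forest Classification Model for Thermophilic and Psychrophilic Proteins Based on Amino Acid Composition Distribution
Cite this article: Guangya Zhang, Baishan Fang. Random forest classification model for thermophilic and psychrophilic proteins based on amino acid composition distribution [J]. Chinese Journal of Biotechnology, 2008, 24(2): 302-308.
Authors: Guangya Zhang, Baishan Fang
Affiliation: Fujian Provincial University Key Laboratory of Industrial Biotechnology, Huaqiao University, Quanzhou 362021
Funding: Supported by the "973 Program" (No. 2007CB707804) and the Natural Science Foundation of Fujian Province (No. 2007J0360).
Abstract: Feature extraction based on amino acid composition distribution (AACD) has been reported to improve classification accuracy. This study used that method to extract features and applied random forest, a relatively new ensemble classifier, to classify thermophilic and psychrophilic proteins from their primary structure. The models were evaluated by 10-fold cross-validation and by testing on an independent sample set. Accuracy was highest with a single segment, at 92.9% and 90.2%, respectively, suggesting that AACD-based feature extraction does not effectively improve recognition accuracy for this algorithm. This is inconsistent with the reported results, although the same extraction method did moderately improve recognition accuracy with SVM. After six new variables were introduced, accuracy rose to 93.2% and 92.2%, with areas under the ROC curve of 0.9771 and 0.9696, respectively, better than other ensemble classifiers.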

Keywords: random forest; amino acid composition distribution; thermophilic and psychrophilic proteins; ROC curve
Received: 2007-05-28
Revised: 2007-09-18
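
The segment-wise feature extraction mentioned in the abstract can be illustrated with a minimal sketch. It assumes that AACD means splitting a sequence into a chosen number of roughly equal segments and computing the frequency of each of the 20 standard amino acids within each segment; the record above does not give the paper's exact definition, and the function name `aacd_features` and its parameters are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aacd_features(sequence, n_segments=1):
    """Return per-segment amino acid frequencies, concatenated.

    Assumption: the sequence is cut into n_segments contiguous,
    near-equal pieces; this is an illustrative reading of AACD,
    not the authors' published procedure.
    """
    seq = sequence.upper()
    if n_segments < 1 or len(seq) < n_segments:
        raise ValueError("need at least one residue per segment")
    features = []
    for i in range(n_segments):
        # Integer boundaries spread any remainder across the segments.
        start = i * len(seq) // n_segments
        end = (i + 1) * len(seq) // n_segments
        segment = seq[start:end]
        counts = Counter(segment)
        # Non-standard residues (e.g. X) count toward the segment
        # length but get no feature of their own.
        features.extend(counts[aa] / len(segment) for aa in AMINO_ACIDS)
    return features
```

With `n_segments=1` this reduces to the plain amino acid composition (20 values), the setting the abstract reports as most accurate for random forest; each additional segment adds another 20 values to the feature vector.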

Random Forest for Classification of Thermophilic and Psychrophilic Proteins Based on Amino Acid Composition Distribution
Guangya Zhang and Baishan Fang. Random Forest for Classification of Thermophilic and Psychrophilic Proteins Based on Amino Acid Composition Distribution [J]. Chinese Journal of Biotechnology, 2008, 24(2): 302-308.
Authors: Guangya Zhang and Baishan Fang
Institution: Key Laboratory of Industrial Biotechnology, Huaqiao University, Quanzhou 362021, China
Abstract: We used amino acid composition distribution (AACD) features with a random forest classifier to discriminate thermophilic from psychrophilic proteins. The models were evaluated by 10-fold cross-validation and by independent testing on a separate dataset. With a single segment, the overall accuracy reached 92.9% and 90.2%, respectively. The AACD method improved prediction accuracy when a support vector machine (SVM) was used as the classifier. When six new features were introduced, the overall accuracy of the random forest improved to 93.2% and 92.2%, and the areas under the receiver operating characteristic (ROC) curve were 0.9771 and 0.9696, better than other ensemble classifiers and comparable with SVM.
Keywords: random forest; amino acid composition distribution; thermophilic and psychrophilic proteins; ROC curve
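
The area under the ROC curve reported in the abstract can be computed from classifier scores without plotting the curve. The sketch below uses the standard equivalence between AUC and the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive example scores higher, with ties counted as half. The function name `roc_auc`, the label convention (1 for one class, 0 for the other), and the example scores are all assumptions for illustration; none of them come from the paper.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (positive) or 0 (negative); which protein
    class is treated as positive is an arbitrary choice here.
    scores: classifier scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered positive/negative pairs; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four proteins, two from each class.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based version would be preferable for large test sets.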