
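Prediction of Protein Subcellular Localization Based on Position-Specific Profiles and an Input-Weighted Neural Network
Cite this article: Lingyun Zou, Zhengzhi Wang, Jiaomin Huang. Prediction of Protein Subcellular Localization Based on Position-Specific Profiles and an Input-Weighted Neural Network[J]. 遗传学报 (Journal of Genetics and Genomics), 2007, 34(12): 1080-1087. DOI: 10.1016/S1673-8527(07)60123-4
Authors: Lingyun Zou  Zhengzhi Wang  Jiaomin Huang
Affiliation: Institute of Automation, College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China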
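Abstract: A protein must be in the correct subcellular location to perform its function. In this work, PSI-BLAST is used to search protein sequences, and the position-specific scoring matrix is extracted from the resulting position-specific profile as one class of protein features. The amino acid compositions of the four equal-length segments of each sequence and the 1st- to 7th-order dipeptide compositions are computed as two further classes. Together, these three classes yield 12 feature vectors per protein sequence. A simple weighting function is applied to the feature vectors of each class before they are fed into a neural network predictor, and the Levenberg-Marquardt algorithm replaces the conventional error back-propagation (EBP) algorithm for adjusting the network weights and thresholds, which greatly speeds up training. A leave-one-out (jackknife) test on a dataset with 4 subcellular location classes and a five-fold cross-validation test on a dataset with 12 subcellular location classes gave overall prediction accuracies of 88.4% and 83.3%, respectively. On the 4-class dataset, the method outperforms an ordinary BP neural network, a hidden Markov model, and the fuzzy k-nearest neighbors method; on the 12-class dataset, it outperforms a support vector machine classifier. Finally, the effect of different weighting proportions among the three feature classes on prediction accuracy is discussed. Results for eight selected weighting proportions show that assigning suitable weight coefficients to the three feature classes can further improve prediction accuracy.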

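Keywords: subcellular localization  position-specific iterated BLAST (PSI-BLAST)  position-specific scoring matrix  weighting function  BP neural network
Received: 2007-03-12
Revised: 2007-06-14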
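
The two composition-based feature classes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a "k-th order dipeptide" means a pair of residues k positions apart, and that the four segments are cut at integer quarter points; the function names are hypothetical. The PSSM-derived vector, which requires a PSI-BLAST run, is not built here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(segment):
    """Return the 20-dimensional amino acid frequency vector of a segment."""
    counts = Counter(segment)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def four_part_composition(seq):
    """Split seq into four nearly equal parts; return one composition vector per part."""
    n = len(seq)
    bounds = [i * n // 4 for i in range(5)]
    return [composition(seq[bounds[i]:bounds[i + 1]]) for i in range(4)]


def gapped_dipeptide_composition(seq, k):
    """Return the 400-dimensional frequency vector of residue pairs (seq[i], seq[i + k]).

    Assumption: "k-th order dipeptide" means two residues k positions apart.
    """
    index = {a: i for i, a in enumerate(AMINO_ACIDS)}
    vec = [0.0] * 400
    total = 0
    for a, b in zip(seq, seq[k:]):
        if a in index and b in index:  # skip non-standard residue codes
            vec[index[a] * 20 + index[b]] += 1
            total += 1
    return [v / total for v in vec] if total else vec


def sequence_features(seq, max_order=7):
    """Return the 4 + max_order composition-based vectors (PSSM vector not included)."""
    return four_part_composition(seq) + [
        gapped_dipeptide_composition(seq, k) for k in range(1, max_order + 1)
    ]
```

With `max_order=7` this gives 4 + 7 = 11 vectors; adding one PSSM-derived vector would account for the 12 feature vectors mentioned in the abstract.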

Prediction of Subcellular Localization of Eukaryotic Proteins Using Position-Specific Profiles and Neural Network with Weighted Inputs
Lingyun Zou, Zhengzhi Wang, Jiaomin Huang. Prediction of Subcellular Localization of Eukaryotic Proteins Using Position-Specific Profiles and Neural Network with Weighted Inputs[J]. Journal of Genetics and Genomics, 2007, 34(12): 1080-1087. DOI: 10.1016/S1673-8527(07)60123-4
Authors: Lingyun Zou  Zhengzhi Wang  Jiaomin Huang
Affiliation: College of Mechatronics and Automation, National University of Defense Technology, Changsha 410073, China. zoulingyun@nudt.edu.cn
Abstract: Subcellular location is one of the key biological characteristics of proteins. This article introduces position-specific profiles (PSP) as important characteristics of proteins. To obtain the profiles, the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST) is used to search a database with each protein sequence. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st- to 7th-order dipeptide compositions are calculated as the other two classes. In total, twelve characteristic vectors are extracted from each protein sequence. The characteristic vectors are then weighted by a simple weighting function and fed into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). During network training, the Levenberg-Marquardt algorithm, rather than the error back-propagation algorithm, is used to adjust the weight matrices and thresholds. In a jackknife test on the RH2427 dataset, PSP-WNN achieved an overall prediction accuracy of 88.4%, higher than the results obtained on this dataset by a general BP neural network, a Markov model, and the fuzzy k-nearest neighbors algorithm. In addition, PSP-WNN was evaluated with a five-fold cross-validation test on the PK7579 dataset, and its results were consistently better than those of a previous method based on several support vector machines that used compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. Finally, the article discusses how different weighting proportions among the three categories of characteristic vectors influence prediction accuracy; choosing an appropriate proportion can further increase it.
Keywords: subcellular localization  PSI-BLAST  position-specific scoring matrices  weighting function  BP neural network
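
The weighting of feature classes and the jackknife test mentioned in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the PSP-WNN method: the abstract does not give the weighting function, so a hypothetical per-class scalar coefficient is used, and a 1-nearest-neighbor rule stands in for the Levenberg-Marquardt-trained neural network. All function names are invented for the example.

```python
import math


def weight_and_concatenate(groups, weights):
    """Scale each class of feature vectors by its coefficient, then flatten.

    groups  -- dict mapping a class name to a list of feature vectors
    weights -- dict mapping the same class names to a scalar coefficient
    Hypothetical scheme: the article's actual weighting function is not given here.
    """
    flat = []
    for name in sorted(groups):  # fixed class order keeps inputs aligned across proteins
        for vec in groups[name]:
            flat.extend(weights[name] * x for x in vec)
    return flat


def nearest_neighbor_label(train, query):
    """Return the label of the training vector closest to query.

    Stand-in classifier, used here in place of the neural network predictor.
    """
    return min(train, key=lambda item: math.dist(item[0], query))[1]


def jackknife_accuracy(samples):
    """Leave-one-out test: predict each sample from all the others."""
    correct = 0
    for i, (vec, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(rest, vec) == label:
            correct += 1
    return correct / len(samples)
```

Comparing weighting proportions would then amount to calling `weight_and_concatenate` with different coefficient sets and comparing the resulting `jackknife_accuracy` values.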
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), ScienceDirect, and other databases.