首页 | 本学科首页   官方微博 | 高级检索  
     


MSLoc-DT: A new method for predicting the protein subcellular location of multispecies based on decision templates
Authors:Shao-Wu Zhang  Yan-Fang LiuYong Yu  Ting-He ZhangXiao-Nan Fan
Affiliation:College of Automation, Northwestern Polytechnical University, Xi’an 710072, China
Abstract:Revealing the subcellular location of newly discovered protein sequences can bring insight to their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks has called for the development of automated analysis methods. Currently, most existing methods used to predict protein subcellular locations cover only one, or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches to further improve the performance of protein subcellular prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT can achieve 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method. As a user-friendly web server, MSLoc-DT is freely accessible at http://bioinfo.ibp.ac.cn/MSLOC_DT/index.html.
Keywords:Amino acid index distribution   Decision template   Gene ontology   Multispecies   Subcellular location
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号