首页 | 本学科首页   官方微博 | 高级检索  
   检索      


A Novel Method for Prediction of Protein Domain Using Distance-Based Maximal Entropy
Authors:Shu-xue Zou  Yan-xin Huang  Yan Wang  Chun-guang Zhou
Institution:Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, P. R. China
Abstract:Detecting the boundaries of protein domains is an important and challenging task in both experimental and computational structural biology. In this paper, a promising method for detecting the domain structure of a protein from sequence information alone is presented. The method is based on analyzing multiple sequence alignments derived from a database search. Multiple measures are defined to quantify the domain information content of each position along the sequence. Then they are combined into a single predictor using support vector machine. What is more important, the domain detection is first taken as an imbalanted data learning problem. A novel undersampling method is proposed on distance-based maximal entropy in the feature space of Support Vector Machine (SVM). The overall precision is about 80%. Simulation results demonstrate that the method can help not only in predicting the complete 3D structure of a protein but also in the machine learning system on general imbalanced datasets.
Keywords:protein domain boundary  SVM  imbalanced data learning  distance-based maximal entropy
本文献已被 CNKI 维普 万方数据 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号