

A New Supervised Over-Sampling Algorithm with Application to Protein-Nucleotide Binding Residue Prediction
Authors: Jun Hu, Xue He, Dong-Jun Yu, Xi-Bei Yang, Jing-Yu Yang, Hong-Bin Shen
Institution: 1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China; 2. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; 3. Changshu Institute, Nanjing University of Science and Technology, Changshu, Jiangsu, China; 4. School of Computer Science and Engineering, Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, China; University of Michigan, United States of America
Abstract: Protein-nucleotide interactions are ubiquitous in a wide variety of biological processes. Accurately identifying interaction residues solely from protein sequences is useful for both protein function annotation and drug design, especially in the post-genomic era, when large volumes of protein data have not yet been functionally annotated. Protein-nucleotide binding residue prediction is a typical imbalanced learning problem, in which binding residues are far fewer than non-binding residues. Alleviating the severity of class imbalance has been shown to be a promising means of improving the performance of machine-learning-based predictors on class-imbalance problems. However, little attention has been paid to the negative impact of class imbalance on protein-nucleotide binding residue prediction. In this study, we propose a new supervised over-sampling algorithm that synthesizes additional minority-class samples to address class imbalance. Experimental results on protein-nucleotide interaction datasets demonstrate that the proposed algorithm can relieve the severity of class imbalance and help improve prediction performance. Based on the proposed over-sampling algorithm, we implement a predictor, called TargetSOS, for protein-nucleotide binding residue prediction. Cross-validation tests and independent validation tests demonstrate the effectiveness of TargetSOS. The web server and datasets used in this study are freely available at http://www.csbio.sjtu.edu.cn/bioinf/TargetSOS/.
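The abstract does not describe how the supervised over-sampling algorithm generates its synthetic binding residues. Purely as a point of reference for the general idea of "synthesizing additional minority-class samples", the sketch below shows generic SMOTE-style interpolation: a new sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the TargetSOS algorithm; the function name `smote_like_oversample`, the Euclidean distance, and the toy feature vectors are illustrative assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Hypothetical helper: generic SMOTE-style over-sampling (NOT TargetSOS).

    Creates n_new synthetic minority samples. Each one interpolates between a
    randomly chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of base, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy example: 3 "binding" residues vs. 10 "non-binding" residues (made-up features).
binding = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
extra = smote_like_oversample(binding, n_new=7, k=2, seed=1)
print(len(binding) + len(extra))  # minority class now matches the 10 majority samples
```

With 3 binding and 10 non-binding residues, requesting `n_new=7` brings both classes to 10 samples. Because each synthetic point lies between two real minority samples, it stays inside the region the minority class already occupies. A supervised method such as the one proposed in the paper may additionally use label or classifier information to decide where to synthesize, which this unsupervised sketch does not attempt.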