Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods
Authors: Zhang Wen, Zhu Xiaopeng, Fu Yu, Tsuji Junko, Weng Zhiping
Institution: 1. School of Computer, Wuhan University, Wuhan, China; 2. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA; 3. Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, USA
Abstract:

Background

Alternative splicing is a critical process that allows a single gene to code for multiple products. Splicing removes introns and joins exons, and splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a large number of human splicing branchpoints, but many remain unknown. To guide wet-lab experiments, we develop computational methods to predict human splicing branchpoints.

Results

Because an intron may have multiple branchpoints, we formulate branchpoint prediction as a multi-label learning problem and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of features derived from intron sequences: the sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods (partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis) and use them as the basic classification engines. Third, we propose two ensemble learning schemes that integrate different features and different classifiers to build ensemble learning systems for branchpoint prediction. One is a genetic algorithm-based weighted average ensemble method; the other is a logistic regression-based ensemble method.
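
The multi-label formulation and the weighted average ensemble can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the sparse profile is a one-hot encoding of each nucleotide, that each sequence position carries one binary label, and that the ensemble weights are given by hand (in the method above they are found by a genetic algorithm). The function names `sparse_profile`, `label_vector` and `weighted_average`, the example sequence and the branchpoint position are all hypothetical.

```python
# Illustrative sketch only: all names and conventions below are assumptions,
# not taken from the paper's code.

NUCLEOTIDES = "ACGT"


def sparse_profile(seq):
    """One-hot encode a nucleotide sequence (4 binary values per position)."""
    features = []
    for base in seq.upper():
        # A symbol outside A/C/G/T (for example N) yields an all-zero block.
        features.extend(1 if base == nt else 0 for nt in NUCLEOTIDES)
    return features


def label_vector(length, branchpoint_positions):
    """Multi-label target: one binary label per position, 1 = branchpoint."""
    labels = [0] * length
    for pos in branchpoint_positions:
        labels[pos] = 1
    return labels


def weighted_average(score_lists, weights):
    """Combine per-position scores from several base predictors."""
    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(len(score_lists[0]))
    ]


# Hypothetical toy input: a 7-nucleotide window with one annotated branchpoint.
seq = "TACTAAC"
x = sparse_profile(seq)          # 28 binary features
y = label_vector(len(seq), [5])  # label 1 at 0-based position 5

# Two hypothetical base predictors scoring two positions, weighted 3:1.
combined = weighted_average([[0.0, 1.0], [1.0, 0.0]], [3.0, 1.0])
```

In this setup, a multi-label learner maps the feature vector `x` to a score for every position in `y` at once, so an intron with several branchpoints simply has several labels set to 1.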

Conclusions

In the computational experiments, the two ensemble learning methods outperform benchmark branchpoint prediction methods and produce high-accuracy results on the benchmark dataset.
This article is indexed in SpringerLink and other databases.