Handling class imbalance problem in miRNA dataset associated with cancer
Authors:Ram Kothandan
Affiliation:Department of Biological Sciences, BITS PILANI K K Birla Goa Campus, Zuarinagar, Vasco Da Gama, India
Abstract:MiRNAs are small (~22 nt long) non-coding RNA sequences that bind to complementary target sites in the 3' untranslated region (UTR) of mRNA sequences, although binding is not restricted to the 3' UTR and also occurs in other mRNA regions, viz. the 5' UTR and coding sequences (CDS). Complementary binding of a miRNA to its mRNA target sites either results in complete degradation of the mRNA or regulates the mRNA, with the miRNA acting as an oncogene or as a tumor suppressor gene. However, the exact mechanism that determines whether a miRNA is associated with cancer is still unclear. Further, as the number of miRNA sequences recorded in miRBase surges every year, the annotation gap keeps widening, mainly because the experimental procedures for functional annotation are laborious and costly. Motivated by this, we constructed a two-step support vector machine-based predictive model consisting of miRSEQ and miRINT. However, the major pitfall during the construction of the model is the class imbalance problem. Hence, to overcome the class imbalance problem, in the present study we empirically compare the effectiveness of two methods, viz. the Synthetic Minority Oversampling Technique (SMOTE) and cost-sensitive learning. Performance was evaluated in terms of precision and recall. Our results show that, for the highly class-imbalanced miRNA dataset used to predict association with cancer, the cost-sensitive method outperformed the oversampling method.
Keywords:Cost-sensitive   SMOTE   miRNA-mRNA interaction   Support Vector Machines
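
The two rebalancing strategies compared in the abstract, and the metrics used to judge them, can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the function names (`smote`, `balanced_class_weights`, `precision_recall`), the neighbour count `k=3`, the toy feature vectors, and the inverse-frequency weighting rule are all assumptions made for illustration, and no SVM is trained here. The first function interpolates between minority-class neighbours in the SMOTE style; the second derives per-class misclassification costs that a cost-sensitive learner could consume.

```python
import random
from collections import Counter


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority points (SMOTE-style interpolation).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balanced_class_weights(labels):
    """Inverse-frequency class costs: n_samples / (n_classes * class_count).

    This is one common heuristic (assumed here); the rarer class receives
    the larger misclassification cost.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data: 4 minority (label 1) vs 16 majority (label 0).
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [1] * 4 + [0] * 16

    new_points = smote(minority, n_new=12)
    print(len(minority) + len(new_points), "minority samples after oversampling")
    print("class costs:", balanced_class_weights(labels))
    print("precision, recall:", precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice, the oversampled set or the class costs would be passed to the classifier (for example, as per-class penalty weights in an SVM), and the precision and recall of the minority class would then be compared across the two settings.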