A method for handling metabonomics data from liquid chromatography/mass spectrometry: combinational use of support vector machine recursive feature elimination, genetic algorithm and random forest for feature selection
Authors: Xiaohui Lin, Quancai Wang, Peiyuan Yin, Liang Tang, Yexiong Tan, Hong Li, Kang Yan, Guowang Xu
Institution: (1) School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China; (2) CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, China; (3) International Cooperation Laboratory on Signal Transduction, Eastern Hepatobiliary Surgery Institute, The Second Military Medical University, Shanghai, China
Abstract: Metabolic markers are at the core of metabonomic studies, so the selection of differential metabolites is of great importance for both biological and clinical purposes. Here, a feature selection method was developed for complex metabonomic data sets. Support vector machine (SVM), an effective tool for metabonomics data analysis, was employed as the basic classifier. To find meaningful features efficiently, support vector machine recursive feature elimination (SVM-RFE) was applied first. Then genetic algorithm (GA) and random forest (RF), which consider the interactions among metabolites and the independent performance of each metabolite across all samples, respectively, were used to obtain more informative metabolic differences and to avoid the risk of false positives. The method was validated on a data set from a plasma metabonomics study of rat liver disease progressing from hepatitis through cirrhosis to hepatocellular carcinoma. In addition to good classification results for the three kinds of liver disease, 31 important metabolites, including lysophosphatidylethanolamine (LPE) C16:0, palmitoylcarnitine, and LPE C18:0, were selected for further study. The results show a good complementary effect among the three feature selection methods. The combined method also identified more differential metabolites and provided more metabolic information for a “global” understanding of the diseases than any single method. Furthermore, the method is also suitable for other complex biological data sets.
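The SVM-RFE step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a two-class problem with labels +1/-1 (the study itself has three disease classes), it uses a simple Pegasos-style subgradient trainer without a bias term as a stand-in for a full SVM solver, and the function names, hyperparameters, and toy data are all hypothetical. The GA and RF stages are not shown. The idea is the standard one: train a linear SVM, rank features by squared weight, drop the lowest-ranked feature, and repeat.

```python
import random


def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Fit a bias-free linear SVM by Pegasos-style subgradient descent.

    X is a list of feature rows, y a list of labels in {+1, -1}.
    Returns the weight vector w. Stand-in trainer, chosen for brevity.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularisation shrink, then a hinge-loss step if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep):
    """Return the indices of the n_keep features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        # Retrain on the surviving features only.
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        # Ranking criterion: squared weight. Drop the feature with the smallest one.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Toy data (hypothetical): feature 0 carries the class signal, features 1-4 are noise.
rng = random.Random(42)
y = [1 if i % 2 == 0 else -1 for i in range(40)]
X = [[label * 2.0 + rng.gauss(0, 0.5)] + [rng.gauss(0, 1.0) for _ in range(4)]
     for label in y]

selected = svm_rfe(X, y, n_keep=2)
print("features kept:", selected)
```

Removing one feature per round is the simplest variant. With the thousands of ion features typical of LC/MS data, implementations commonly drop a fraction of the remaining features at each round to keep the number of retraining passes manageable.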
Keywords:
This article is indexed in SpringerLink and other databases.