Integrating multi-platform genomic data using hierarchical Bayesian relevance vector machines |
| |
Authors: | Sanvesh Srivastava, Wenyi Wang, Ganiraju Manyam, Carlos Ordonez, Veerabhadran Baladandayuthapani |
| |
Affiliation: | 1. Department of Statistics, Purdue University, West Lafayette, USA; 2. Department of Bioinformatics and Computational Biology, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, USA; 3. Department of Computer Science, University of Houston, Houston, USA; 4. Department of Biostatistics, Division of Quantitative Sciences, The University of Texas MD Anderson Cancer Center, Houston, USA |
| |
Abstract: | Background: Recent advances in genome technologies and the subsequent collection of genomic information at various molecular resolutions hold promise to accelerate the discovery of new therapeutic targets. A critical step in achieving this goal is to develop efficient clinical prediction models that integrate these diverse sources of high-throughput data. This step is challenging because the data are high-dimensional and contain complex interactions. For predicting relevant clinical outcomes, we propose a flexible statistical machine learning approach that models the interactions between platform-specific measurements through nonlinear kernel machines and borrows information within and between platforms through a hierarchical Bayesian framework. The model parameters have direct interpretations in terms of the effects of platforms and of data interactions within and across platforms. Parameter estimation uses a computationally efficient variational Bayes approach that scales well to large high-throughput datasets.
Results: We apply our methods to the glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas (TCGA), integrating gene/mRNA expression and microRNA profiles to predict patient survival times. In terms of prediction accuracy, we show that our nonlinear, interaction-based integrative methods perform better than linear alternatives and than non-integrative methods that do not account for interactions between the platforms. We also find several prognostic mRNAs and microRNAs that are related to tumor invasion and are known to drive tumor metastasis and severe inflammatory response in GBM. In addition, our analysis reveals several mRNA and microRNA interactions that have known implications in the etiology of GBM.
Conclusions: Our approach gains its flexibility and power by modeling the nonlinear interaction structures between and within the platforms. Our framework is a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers. Our software is freely available at http://odin.mdacc.tmc.edu/~vbaladan. |
| |
Keywords: | Bayesian modeling; Multiple kernel learning; Genomics; High-dimensional data analysis; Prediction; Variational inference |
This article is indexed in SpringerLink and other databases. |
|