首页 | 本学科首页   官方微博 | 高级检索  
     


Sampling uncertainty versus method uncertainty: A general framework with applications to omics biomarker selection
Authors:Simon Klau  Marie-Laure Martin-Magniette  Anne-Laure Boulesteix  Sabine Hoffmann
Affiliation:1. Institute for Medical Information Processing, Biometry and Epidemiology (IBE), Munich, Germany;2. Institute of Plant Sciences Paris Saclay IPS2, CNRS, INRA, Université Paris-Sud, Université Evry, Université Paris-Saclay, Orsay, France

Institute of Plant Sciences Paris-Saclay IPS2, Paris Diderot, Sorbonne Paris-Cité, Orsay, France

UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, Paris, France

Abstract:Uncertainty is a crucial issue in statistics which can be considered from different points of view. One type of uncertainty, typically referred to as sampling uncertainty, arises through the variability of results obtained when the same analysis strategy is applied to different samples. Another type of uncertainty arises through the variability of results obtained when using the same sample but different analysis strategies addressing the same research question. We denote this latter type of uncertainty as method uncertainty. It results from all the choices to be made for an analysis, for example, decisions related to data preparation, method choice, or model selection. In medical sciences, a large part of omics research is focused on the identification of molecular biomarkers, which can either be performed through ranking or by selection from among a large number of candidates. In this paper, we introduce a general resampling-based framework to quantify and compare sampling and method uncertainty. For illustration, we apply this framework to different scenarios related to the selection and ranking of omics biomarkers in the context of acute myeloid leukemia: variable selection in multivariable regression using different types of omics markers, the ranking of biomarkers according to their predictive performance, and the identification of differentially expressed genes from RNA-seq data. For all three scenarios, our findings suggest highly unstable results when the same analysis strategy is applied to two independent samples, indicating high sampling uncertainty and a comparatively smaller, but non-negligible method uncertainty, which strongly depends on the methods being compared.
Keywords:high-dimensional data  resampling  stability  variable ranking  variable selection
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号