Estimating the null distribution to adjust observed confidence levels for genome-scale screening

Authors:	David R. Bickel

Institution:	Ottawa Institute of Systems Biology, Department of Biochemistry, Microbiology, and Immunology, Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada. dbickel@uottawa.ca

Abstract:	In a novel approach to the multiple testing problem, Efron (2004, Journal of the American Statistical Association 99, 96-104; 2007a, Journal of the American Statistical Association 102, 93-103; 2007b, Annals of Statistics 35, 1351-1377) formulated estimators of the distribution of test statistics or nominal p-values under a null distribution suitable for modeling the data of thousands of unaffected genes, nonassociated single-nucleotide polymorphisms, or other biological features. Estimators of the null distribution can improve not only the empirical Bayes procedure for which they were originally intended, but also many other multiple-comparison procedures. Such estimators in some cases improve the proposed multiple-comparison procedure (MCP) based on a recent non-Bayesian framework of minimizing expected loss with respect to a confidence posterior, a probability distribution of confidence levels. The flexibility of that MCP is illustrated with a nonadditive loss function designed for genomic screening rather than for validation. The merit of estimating the null distribution is examined from the vantage point of the confidence-posterior MCP (CPMCP). In a generic simulation study of genome-scale multiple testing, conditioning the observed confidence level on the estimated null distribution as an approximate ancillary statistic markedly improved conditional inference. Simulating gene expression data specifically, however, indicates that estimation of the null distribution tends to exacerbate the conservative bias that results from modeling heavy-tailed data distributions with the normal family. To enable researchers to determine whether to rely on a particular estimated null distribution for inference or decision making, an information-theoretic score is provided. As the sum of the degree of ancillarity and the degree of inferential relevance, the score reflects the balance conditioning would strike between the two conflicting terms. The CPMCP and other methods introduced are applied to gene expression microarray data.
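
The general idea of estimating a null distribution from the bulk of the test statistics, rather than assuming the theoretical N(0, 1) null, can be illustrated with a minimal Python sketch. It is not the estimator of Efron or the CPMCP of the article: the median and scaled MAD serve here as a simple robust stand-in, and the function names, the simulated mixture (95% null features drawn from N(0.1, 1.3^2), 5% affected features), and the 0.001 threshold are all assumptions made for illustration.

```python
import math
import numpy as np

def estimate_empirical_null(z):
    """Robust location/scale estimate of the null from the bulk of z-scores.

    Assumption: most features are null, so the median and the scaled MAD
    track the null component. This is a simple stand-in, not Efron's
    central-matching or maximum-likelihood estimator.
    """
    z = np.asarray(z, dtype=float)
    mu0 = float(np.median(z))
    mad = float(np.median(np.abs(z - mu0)))
    sigma0 = 1.4826 * mad  # scaling that makes the MAD consistent for a normal sd
    return mu0, sigma0

def two_sided_p(z, mu0=0.0, sigma0=1.0):
    """Two-sided p-values of z under a N(mu0, sigma0^2) null."""
    t = np.abs((np.asarray(z, dtype=float) - mu0) / sigma0)
    # 2 * (1 - Phi(t)) equals erfc(t / sqrt(2))
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in t])

# Hypothetical simulated screen: the true null is shifted and overdispersed.
rng = np.random.default_rng(0)
null_z = rng.normal(0.1, 1.3, size=9500)  # unaffected features
alt_z = rng.normal(4.0, 1.0, size=500)    # affected features
z = np.concatenate([null_z, alt_z])

mu0, sigma0 = estimate_empirical_null(z)
p_theoretical = two_sided_p(z)             # assumes N(0, 1)
p_empirical = two_sided_p(z, mu0, sigma0)  # uses the estimated null

n_theoretical = int(np.sum(p_theoretical < 0.001))
n_empirical = int(np.sum(p_empirical < 0.001))
print(f"estimated null: mean={mu0:.3f}, sd={sigma0:.3f}")
print(f"calls at p < 0.001: theoretical={n_theoretical}, empirical={n_empirical}")
```

In this simulated setting the estimated null is wider than N(0, 1), so fewer features pass the threshold under it. The abstract's caution still applies: with heavy-tailed data modeled by the normal family, an estimated null of this kind can make inference more conservative.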

Keywords:	Ancillarity; Composite hypothesis testing; Conditional inference; Confidence distribution; Empirical null distribution; Genome-wide association studies; Multiple-comparison procedures; Neuroimaging; Observed confidence level; Simultaneous inference; Simultaneous significance testing; SNP
This article is indexed in PubMed and other databases.
