Estimating the null distribution to adjust observed confidence levels for genome-scale screening
Author: David R. Bickel
Institution: Ottawa Institute of Systems Biology; Department of Biochemistry, Microbiology, and Immunology; and Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario K1H 8M5, Canada (dbickel@uottawa.ca)
Abstract: In a novel approach to the multiple testing problem, Efron (2004, Journal of the American Statistical Association 99, 96-104; 2007a, Journal of the American Statistical Association 102, 93-103; 2007b, Annals of Statistics 35, 1351-1377) formulated estimators of the distribution of test statistics or nominal p-values under a null distribution suitable for modeling the data of thousands of unaffected genes, nonassociated single-nucleotide polymorphisms, or other biological features. Estimators of the null distribution can improve not only the empirical Bayes procedure for which they were originally intended but also many other multiple-comparison procedures. In some cases, such estimators improve the proposed multiple-comparison procedure (MCP), which is based on a recent non-Bayesian framework of minimizing expected loss with respect to a confidence posterior, that is, a probability distribution of confidence levels. The flexibility of that MCP is illustrated with a nonadditive loss function designed for genomic screening rather than for validation. The merit of estimating the null distribution is examined from the vantage point of the confidence-posterior MCP (CPMCP). In a generic simulation study of genome-scale multiple testing, conditioning the observed confidence level on the estimated null distribution as an approximate ancillary statistic markedly improved conditional inference. Simulations specific to gene expression data, however, indicate that estimating the null distribution tends to exacerbate the conservative bias that results from modeling heavy-tailed data distributions with the normal family. To help researchers decide whether to rely on a particular estimated null distribution for inference or decision making, an information-theoretic score is provided. As the sum of the degree of ancillarity and the degree of inferential relevance, the score reflects the balance that conditioning would strike between these two conflicting terms.
The CPMCP and the other methods introduced here are applied to gene expression microarray data.
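As a rough illustration of what adjusting an observed confidence level by an estimated null distribution could look like, the following Python sketch assumes that each feature yields a z-score whose null distribution is N(mu0, sigma0^2) rather than the theoretical N(0, 1). It substitutes a simple median/interquartile-range estimate for Efron's estimators and takes the one-sided confidence level that an effect is positive to be the standard normal CDF of the rescaled z-score. The function names are hypothetical, and this is not the CPMCP or the information-theoretic score from the paper.

```python
from statistics import NormalDist, median, quantiles


def estimate_null(z_scores):
    """Robust stand-in for an empirical null N(mu0, sigma0^2).

    Assumes most z-scores come from unaffected features, so the median and
    interquartile range of all z-scores approximate the null's center and
    scale. This is a simplification, not Efron's central-matching or
    maximum-likelihood estimator.
    """
    mu0 = median(z_scores)
    q1, _, q3 = quantiles(z_scores, n=4)  # three quartile cut points
    # For a normal distribution, IQR = 2 * Phi^{-1}(0.75) * sigma.
    sigma0 = (q3 - q1) / (2 * NormalDist().inv_cdf(0.75))
    return mu0, sigma0


def observed_confidence(z, mu0=0.0, sigma0=1.0):
    """Hypothetical one-sided observed confidence level that the effect is positive.

    Rescales z by the (theoretical or estimated) null and maps it through
    the standard normal CDF.
    """
    return NormalDist().cdf((z - mu0) / sigma0)
```

For example, under the theoretical null a z-score of 1.5 receives a confidence level of about 0.93 (`observed_confidence(1.5)`), whereas if the estimated null were N(0.5, 2^2), the same z-score would receive about 0.69 (`observed_confidence(1.5, 0.5, 2.0)`). Median and IQR were chosen only because they are robust to a small fraction of affected features and need nothing beyond the standard library.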
Keywords: Ancillarity; Composite hypothesis testing; Conditional inference; Confidence distribution; Empirical null distribution; Genome-wide association studies; Multiple-comparison procedures; Neuroimaging; Observed confidence level; Simultaneous inference; Simultaneous significance testing; SNP
This article is indexed in PubMed and other databases.