Exploring the information in p-values for the analysis and planning of multiple-test experiments期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Exploring the information in p-values for the analysis and planning of multiple-test experiments

Authors:	Ruppert David Nettleton Dan Hwang J T Gene

Affiliation:	School of Operations Research and Industrial Engineering, Cornell University, Rhodes Hall, Ithaca, NY 14853, USA. dr24@cornell.edu

Abstract:	Summary A new methodology is proposed for estimating the proportion of true null hypotheses in a large collection of tests. Each test concerns a single parameter δ whose value is specified by the null hypothesis. We combine a parametric model for the conditional cumulative distribution function (CDF) of the p‐value given δ with a nonparametric spline model for the density g(δ) of δ under the alternative hypothesis. The proportion of true null hypotheses and the coefficients in the spline model are estimated by penalized least squares subject to constraints that guarantee that the spline is a density. The estimator is computed efficiently using quadratic programming. Our methodology produces an estimate of the density of δ when the null is false and can address such questions as “when the null is false, is the parameter usually close to the null or far away?” This leads us to define a falsely interesting discovery rate (FIDR), a generalization of the false discovery rate. We contrast the FIDR approach to Efron's (2004, Journal of the American Statistical Association 99, 96–104) empirical null hypothesis technique. We discuss the use of in sample size calculations based on the expected discovery rate (EDR). Our recommended estimator of the proportion of true nulls has less bias compared to estimators based upon the marginal density of the p‐values at 1. In a simulation study, we compare our estimators to the convex, decreasing estimator of Langaas, Lindqvist, and Ferkingstad (2005, Journal of the Royal Statistical Society, Series B 67, 555–572). The most biased of our estimators is very similar in performance to the convex, decreasing estimator. As an illustration, we analyze differences in gene expression between resistant and susceptible strains of barley.

Keywords:	Expected discovery rate False discovery rate Inverse problem Microarray Penalty Power and sample size Quadratic programming Simultaneous tests Splines
本文献已被 PubMed 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏