Similar literature
 Found 20 similar articles (search time: 62 ms)
1.
We consider multiple testing with false discovery rate (FDR) control when p values have discrete and heterogeneous null distributions. We propose a new estimator of the proportion of true null hypotheses and demonstrate that it is less upwardly biased than Storey's estimator and two other estimators. The new estimator induces two adaptive procedures, that is, an adaptive Benjamini–Hochberg (BH) procedure and an adaptive Benjamini–Hochberg–Heyse (BHH) procedure. We prove that the adaptive BH (aBH) procedure is conservative nonasymptotically. Through simulation studies, we show that these procedures are usually more powerful than their nonadaptive counterparts and that the adaptive BHH procedure is usually more powerful than the aBH procedure and a procedure based on randomized p‐values. The adaptive procedures are applied to a study of HIV vaccine efficacy, where they identify more differentially polymorphic positions than the BH procedure at the same FDR level.
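To make the idea of an adaptive procedure concrete, here is a minimal sketch of the generic plug-in construction: estimate the proportion of true nulls, then run BH at the inflated level alpha divided by that estimate. It uses Storey's estimator (the baseline named in the abstract), not the new estimator the authors propose, and it ignores the discreteness of the p values. The function names, the tuning value `lam=0.5`, and the example p values are assumptions made for illustration.

```python
def bh_reject(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p value is below its threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls (with the +1 correction)."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, (n_above + 1) / (m * (1.0 - lam)))


def adaptive_bh_reject(pvalues, alpha, lam=0.5):
    """Plug-in adaptive BH: run BH at level alpha / pi0_hat."""
    return bh_reject(pvalues, alpha / storey_pi0(pvalues, lam))


# Hypothetical p values, for illustration only.
example_p = [0.001, 0.004, 0.012, 0.018, 0.028, 0.035, 0.04, 0.046, 0.3, 0.8]
print(len(bh_reject(example_p, 0.05)), len(adaptive_bh_reject(example_p, 0.05)))
```

When few p values lie above `lam`, the estimated null proportion is small and the adaptive version rejects more hypotheses than plain BH, which is the power gain the abstract refers to.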

2.
Multiple hypothesis testing procedures have recently received a lot of attention from statisticians working in DNA microarray analysis. The traditional FWER controlling procedures are not very useful in this situation since the experiments are exploratory by nature and researchers are more interested in controlling the rate of false positives than in controlling the probability of making a single erroneous decision. This has led to increased use of FDR (False Discovery Rate) controlling procedures. Genovese and Wasserman proposed a single-step FDR procedure that is an asymptotic approximation to the original Benjamini and Hochberg stepwise procedure. In this paper, we modify the Genovese-Wasserman procedure to force the FDR control closer to the level alpha in the independence setting. Assuming that the data come from a mixture of two normals, we also propose to make this procedure adaptive by first estimating the parameters using the EM algorithm and then plugging these estimated parameters into the above modification of the Genovese-Wasserman procedure. We compare this procedure with the original Benjamini-Hochberg and the SAM thresholding procedures. The FDR control and other properties of this adaptive procedure are verified numerically.

3.
Sabatti C  Service S  Freimer N 《Genetics》2003,164(2):829-833
We explore the implications of the false discovery rate (FDR) controlling procedure in disease gene mapping. With the aid of simulations, we show how, under models commonly used, the simple step-down procedure introduced by Benjamini and Hochberg controls the FDR for the dependent tests on which linkage and association genome screens are based. This adaptive multiple comparison procedure may offer an important tool for mapping susceptibility genes for complex diseases.

4.
Summary: Microarray gene expression studies over ordered categories are routinely conducted to gain insights into biological functions of genes and the underlying biological processes. Some common experiments are time‐course/dose‐response experiments where a tissue or cell line is exposed to different doses and/or durations of time to a chemical. A goal of such studies is to identify gene expression patterns/profiles over the ordered categories. This problem can be formulated as a multiple testing problem where for each gene the null hypothesis of no difference between the successive mean gene expressions is tested and further directional decisions are made if it is rejected. Many of the existing multiple testing procedures are devised for controlling the usual false discovery rate (FDR) rather than the mixed directional FDR (mdFDR), the expected proportion of Type I and directional errors among all rejections. Benjamini and Yekutieli (2005, Journal of the American Statistical Association 100, 71–93) proved that an augmentation of the usual Benjamini–Hochberg (BH) procedure can control the mdFDR while testing simple null hypotheses against two‐sided alternatives in terms of one‐dimensional parameters. In this article, we consider the problem of controlling the mdFDR involving multidimensional parameters. To deal with this problem, we develop a procedure extending that of Benjamini and Yekutieli based on the Bonferroni test for each gene. A proof is given for its mdFDR control when the underlying test statistics are independent across the genes. The results of a simulation study evaluating its performance under independence as well as under dependence of the underlying test statistics across the genes relative to other relevant procedures are reported. Finally, the proposed methodology is applied to time‐course microarray data obtained by Lobenhofer et al. (2002, Molecular Endocrinology 16, 1215–1229). We identified several important cell‐cycle genes, such as DNA replication/repair gene MCM4 and replication factor subunit C2, which were not identified by the previous analyses of the same data by Lobenhofer et al. (2002) and Peddada et al. (2003, Bioinformatics 19, 834–841). Although some of our findings overlap with previous findings, we identify several other genes that complement the results of Lobenhofer et al. (2002).

5.
Haibing Zhao  Xinping Cui 《Biometrics》2020,76(4):1098-1108
In large-scale problems, it is common practice to select important parameters by a procedure such as the Benjamini and Hochberg procedure and construct confidence intervals (CIs) for further investigation while the false coverage-statement rate (FCR) for the CIs is controlled at a desired level. Although the well-known BY CIs control the FCR, they are uniformly inflated. In this paper, we propose two methods to construct shorter selective CIs. The first method produces shorter CIs by allowing a reduced number of selective CIs. The second method produces shorter CIs by allowing a prefixed proportion of CIs containing the values of uninteresting parameters. We theoretically prove that the proposed CIs are uniformly shorter than BY CIs and control the FCR asymptotically for independent data. Numerical results confirm our theoretical results and show that the proposed CIs still work for correlated data. We illustrate the advantage of the proposed procedures by analyzing the microarray data from an HIV study.
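For orientation, the baseline BY CIs mentioned in the abstract can be sketched as follows: after selecting R of m parameters, each selected parameter gets a marginal interval at confidence level 1 - R*q/m, which is why the intervals are wider than unadjusted ones. This is a minimal sketch of that baseline only, not of the two shorter constructions the paper proposes. It assumes normally distributed estimators with known standard errors; the function and argument names are hypothetical.

```python
from statistics import NormalDist


def by_selective_cis(estimates, std_errors, selected, q):
    """FCR-adjusted intervals for the selected indices (normal estimators assumed)."""
    m = len(estimates)
    r = len(selected)
    if r == 0:
        return {}
    adjusted_alpha = r * q / m  # marginal error level after the selection adjustment
    z = NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)
    return {
        i: (estimates[i] - z * std_errors[i], estimates[i] + z * std_errors[i])
        for i in selected
    }


# Hypothetical data: 10 parameters, of which indices 0 and 3 were selected.
est = [3.1, 0.2, -0.4, 2.8, 0.1, 0.0, -0.3, 0.5, 0.2, -0.1]
se = [1.0] * 10
print(by_selective_cis(est, se, [0, 3], q=0.05))
```

With 2 of 10 parameters selected and q = 0.05, each interval is built at the 99% level rather than the 95% level, which illustrates the inflation the authors aim to reduce.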

6.
False discovery rates are routinely controlled by application of the Benjamini–Hochberg step-up procedure to a set of p-values. A method is demonstrated for representing the values so obtained (the BH-FDRs) on a quantile–quantile (Q-Q) plot of the p-values transformed to the negative-logarithmic scale. Recognition of this connection between the BH-FDR and the Q-Q plot facilitates both understanding of the meaning of the BH-FDR and interpretation of the BH-FDR in a particular data set.
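One way to see the connection described above is sketched here. The BH-adjusted value for the p-value of rank i among m is the running minimum of p * m / i taken from the largest rank downward. If the Q-Q plot uses i/m as the expected quantile (a plotting-position assumption that may differ from the paper's construction), then the vertical distance of a point above the identity line equals the negative logarithm of the unmonotonized ratio p * m / i. The function names and example p-values are hypothetical.

```python
import math


def bh_adjusted(pvalues):
    """BH-adjusted p-values (BH-FDRs), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def qq_points(pvalues):
    """(expected, observed) pairs on the -log10 scale, using rank/m as expected quantile."""
    m = len(pvalues)
    return [
        (-math.log10(rank / m), -math.log10(p))
        for rank, p in enumerate(sorted(pvalues), start=1)
    ]


example_p = [0.01, 0.04, 0.03, 0.005]  # hypothetical values
print(bh_adjusted(example_p))
for expected, observed in qq_points(example_p):
    # Height above the identity line is -log10(p * m / rank).
    print(round(expected, 3), round(observed, 3), round(observed - expected, 3))
```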

7.
The paper is concerned with expected type I errors of some stepwise multiple test procedures based on independent p‐values controlling the so‐called false discovery rate (FDR). We derive an asymptotic result for the supremum of the expected type I error rate (EER) when the number of hypotheses tends to infinity. Among other results, it is shown that when the original Benjamini‐Hochberg step‐up procedure controls the FDR at level α, its EER may approach a value slightly larger than α/4 as the number of hypotheses increases. Moreover, we derive some least favourable parameter configuration results, some bounds for the FDR and the EER as well as easily computable formulae for the familywise error rate (FWER) of two FDR‐controlling procedures. Finally, we discuss some undesirable properties of the FDR concept, especially the problem of cheating.

8.
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP(q, g) = Pr(g(V_n, S_n) > q), and generalized expected value (gEV) error rates, gEV(g) = E[g(V_n, S_n)], for arbitrary functions g(V_n, S_n) of the numbers of false positives V_n and true positives S_n. Of particular interest are error rates based on the proportion g(V_n, S_n) = V_n/(V_n + S_n) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E[V_n/(V_n + S_n)]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
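The two families of error rates can be illustrated with empirical averages over repeated experiments. The sketch below is not the resampling-based empirical Bayes procedure itself; it only evaluates gTP and gEV from a list of (false positives, true positives) counts. The counts are made up, and the convention that the proportion is 0 when nothing is rejected is an assumption stated in the code.

```python
def false_discovery_proportion(v, s):
    """V / (V + S), taken to be 0 when there are no rejections (assumed convention)."""
    rejections = v + s
    return v / rejections if rejections > 0 else 0.0


def empirical_gev(pairs, g):
    """Monte Carlo estimate of gEV(g) = E[g(V, S)]."""
    return sum(g(v, s) for v, s in pairs) / len(pairs)


def empirical_gtp(pairs, g, q):
    """Monte Carlo estimate of gTP(q, g) = Pr(g(V, S) > q)."""
    return sum(1 for v, s in pairs if g(v, s) > q) / len(pairs)


# Hypothetical (V, S) outcomes from four simulated experiments.
outcomes = [(0, 5), (1, 4), (0, 0), (2, 2)]

fdr = empirical_gev(outcomes, false_discovery_proportion)
fwer = empirical_gtp(outcomes, lambda v, s: v, q=0)  # Pr(V > 0)
tail = empirical_gtp(outcomes, false_discovery_proportion, q=0.1)
print(fdr, fwer, tail)
```

Taking g(V, S) = V with q = 0 recovers the familywise error rate, while the proportion function gives the FDR under gEV and a tail probability for the proportion of false positives under gTP.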

9.
We develop an approach for microarray differential expression analysis, i.e. identifying genes whose expression levels differ between two or more groups. Current approaches to inference rely either on full parametric assumptions or on permutation-based techniques for sampling under the null distribution. In some situations, however, a full parametric model cannot be justified, or the sample size per group is too small for permutation methods to be valid. We propose a semi-parametric framework based on partial mixture estimation which only requires a parametric assumption for the null (equally expressed) distribution and can handle small sample sizes where permutation methods break down. We develop two novel improvements of Scott's minimum integrated square error criterion for partial mixture estimation [Scott, 2004a,b]. As a side benefit, we obtain interpretable and closed-form estimates for the proportion of EE genes. Pseudo-Bayesian and frequentist procedures for controlling the false discovery rate are given. Results from simulations and real datasets indicate that our approach can provide substantial advantages for small sample sizes over the SAM method of Tusher et al. [2001], the empirical Bayes procedure of Efron and Tibshirani [2002], the mixture of normals of Pan et al. [2003] and a t-test with p-value adjustment [Dudoit et al., 2003] to control the FDR [Benjamini and Hochberg, 1995].

10.
MOTIVATION: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the test statistics distribution, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. RESULTS: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power than the family-wise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistic substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control. AVAILABILITY: An R program that adjusts p-values using FDR controlling procedures is freely available over the Internet at www.math.tau.ac.il/~ybenja.

11.
One of the multiple testing problems in drug finding experiments is the comparison of several treatments with one control. In this paper we discuss a particular situation of such an experiment, i.e., a microarray setting, where the many-to-one comparisons need to be addressed for thousands of genes simultaneously. For a gene-specific analysis, Dunnett's single step procedure is considered for the within-gene tests, while FDR controlling procedures such as Significance Analysis of Microarrays (SAM) and the Benjamini and Hochberg (BH) False Discovery Rate (FDR) adjustment are applied to control the error rate across genes. The method is applied to a microarray experiment with four treatment groups (three microarrays in each group) and 16,998 genes. Simulation studies are conducted to investigate the performance of the SAM method and the BH-FDR procedure with regard to controlling the FDR, and to investigate the effect of small-variance genes on the FDR in the SAM procedure.

12.
13.
Exact analytic expressions are developed for the average power of the Benjamini and Hochberg false discovery control procedure. The result is based on explicit computation of the joint probability distribution of the total number of rejections and the number of false rejections, and expressed in terms of the cumulative distribution functions of the p-values of the hypotheses. An example of analytic evaluation of the average power is given. The result is confirmed by numerical experiments and applied to a meta-analysis of three clinical studies in mammography.

14.
Human microarrays are readily available, and it would be advantageous if they could be used to study gene expression in other species, such as pigs. The objectives of this research were to validate the use of human microarrays in the analysis of porcine gene expression, to assess the variability of the data generated, and to compare gene expression in boars with different levels of steroidogenesis. Cytochrome b5 (CYB5) expression was used to assess array detection sensitivity. Samples having high or low CYB5 RNA levels were hybridized to microarrays to determine if the known expression difference could be detected. Six hybridizations were conducted using human microarrays containing 3840 total spots representing 1718 characterized human ESTs. To analyze gene expression in boars with different levels of steroidogenesis, testis RNA from four boars with high levels of plasma estrone sulphate was hybridized to testis RNA from four boars with lower levels. Eight microarray hybridizations were conducted including fluor-flips. Self-self hybridizations were also conducted to assess the variability of array experiments. The Cy5 and Cy3 intensity values for each array were normalized using a locally weighted linear regression (LOESS). Statistical significance was assessed using a Student's t-test followed by the Benjamini and Hochberg multiple testing correction procedure. Quantitative real-time PCR (Q-RT-PCR) was used to verify select gene expression differences. The results show that CYB5 was significantly overexpressed in the high CYB5 sample by 1.8 fold (P < 0.05), verifying the known expression difference. The average log2 ratio of the majority of genes (1643) falls within one standard deviation of the mean, indicating the data were reproducible. In the high versus low steroidogenesis experiment, seven genes were significantly overexpressed in the high group (P < 0.05). Quantitative real-time PCR was used to validate five genes with the highest fold change, and the results corroborated those found by the microarray experiments. The results of the self-self hybridizations showed that no genes were significantly differentially expressed following the application of the Benjamini and Hochberg multiple testing correction procedure. The results presented in this report show that human arrays can be used for gene expression analysis in pigs.

15.
Benjamini Y  Heller R 《Biometrics》2008,64(4):1215-1222
SUMMARY: We consider the problem of testing for partial conjunction of hypotheses, which argues that at least u out of n tested hypotheses are false. It offers an in-between approach to the testing of the conjunction of null hypotheses against the alternative that at least one is not, and the testing of the disjunction of null hypotheses against the alternative that all hypotheses are not null. We suggest powerful test statistics for testing such a partial conjunction hypothesis that are valid under dependence between the test statistics as well as under independence. We then address the problem of testing many partial conjunction hypotheses simultaneously using the false discovery rate (FDR) approach. We prove that if the FDR controlling procedure in Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) is used for this purpose the FDR is controlled under various dependency structures. Moreover, we can screen at all levels simultaneously in order to display the findings on a superimposed map and still control an appropriate FDR measure. We apply the method to examples from microarray analysis and functional magnetic resonance imaging (fMRI), two application areas where the need for partial conjunction analysis has been identified.
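As an illustration of what a partial conjunction p-value can look like, the sketch below combines the n - u + 1 largest p-values with Bonferroni-type and Simes-type rules. These are commonly cited forms for this kind of test; the abstract does not spell out its statistics, so treat the formulas and function names here as assumptions rather than the authors' exact definitions.

```python
def pc_pvalue_bonferroni(pvalues, u):
    """Bonferroni-type p-value for 'at least u of n hypotheses are false'."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    # Uses the u-th smallest p-value, scaled by the n - u + 1 remaining hypotheses.
    return min(1.0, (n - u + 1) * ordered[u - 1])


def pc_pvalue_simes(pvalues, u):
    """Simes-type p-value over the n - u + 1 largest p-values."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    candidates = [
        (n - u + 1) / (i - u + 1) * ordered[i - 1]
        for i in range(u, n + 1)
    ]
    return min(1.0, min(candidates))


example_p = [0.001, 0.01, 0.2, 0.5]  # hypothetical p-values from n = 4 tests
for u in (1, 2, 3):
    print(u, pc_pvalue_bonferroni(example_p, u), pc_pvalue_simes(example_p, u))
```

With u = 1 both rules reduce to the usual global-null tests, and with u = n they return the largest p-value, matching the two extremes between which the partial conjunction sits.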

17.
Under the model of independent test statistics, we propose a two-parameter family of Bayes multiple testing procedures. The two parameters can be viewed as tuning parameters. Using the Benjamini–Hochberg step-up procedure for controlling false discovery rate as a baseline for conservativeness, we choose the tuning parameters to compromise between the operating characteristics of that procedure and a less conservative procedure that focuses on alternatives that a priori might be considered likely or meaningful. The Bayes procedures do not have the theoretical and practical shortcomings of the popular stepwise procedures. In terms of the number of mistakes, simulations for two examples indicate that over a large segment of the parameter space, the Bayes procedure is preferable to the step-up procedure. Another desirable feature of the procedures is that they are computationally feasible for any number of hypotheses.

18.
The multiple testing problem attributed to gene expression analysis is challenging not only by its size, but also by possible dependence between the expression levels of different genes resulting from coregulations of the genes. Furthermore, the measurement errors of these expression levels may be dependent as well since they are subjected to several technical factors. Multiple testing of such data faces the challenge of correlated test statistics. In such a case, the control of the False Discovery Rate (FDR) is not straightforward, and thus demands new approaches and solutions that will address multiplicity while accounting for this dependency. This paper investigates the effects of dependency between normal test statistics on FDR control in two-sided testing, using the linear step-up procedure (BH) of Benjamini and Hochberg (1995). The case of two multiple hypotheses is examined first. A simulation study offers primary insight into the behavior of the FDR subjected to different levels of correlation and distance between null and alternative means. A theoretical analysis follows in order to obtain explicit upper bounds to the FDR. These results are then extended to more than two multiple tests, thereby offering a better perspective on the effect of the proportion of false null hypotheses, as well as the structure of the test statistics correlation matrix. An example from gene expression data analysis is presented.

19.
Summary: As biological studies become more expensive to conduct, statistical methods that take advantage of existing auxiliary information about an expensive exposure variable are desirable in practice. Such methods should improve the study efficiency and increase the statistical power for a given number of assays. In this article, we consider an inference procedure for multivariate failure time with auxiliary covariate information. We propose an estimated pseudopartial likelihood estimator under the marginal hazard model framework and develop the asymptotic properties for the proposed estimator. We conduct simulation studies to evaluate the performance of the proposed method in practical situations and demonstrate the proposed method with a data set from the studies of left ventricular dysfunction (SOLVD Investigators, 1991, New England Journal of Medicine 325, 293–302).

20.
This paper focuses on the development and study of confidence interval procedures for the mean difference between two treatments in the analysis of over‐dispersed count data in order to measure the efficacy of the experimental treatment over the standard treatment in clinical trials. In this study, two simple methods are proposed. One is based on a sandwich estimator of the variance of the regression estimator using the generalized estimating equations (GEEs) approach of Zeger and Liang (1986) and the other is based on an estimator of the variance of a ratio estimator (1977). We also develop three other procedures following the procedures studied by Newcombe (1998) and the procedure studied by Beal (1987). As assessed by Monte Carlo simulations, all the procedures have reasonably good coverage properties. Moreover, the interval procedure based on GEEs outperforms the other interval procedures in the sense that it maintains the coverage very close to the nominal coverage level and that it has the shortest interval length, a satisfactory location property, and a very simple form, which can be easily implemented in the applied fields. Illustrative applications in the biological studies for these confidence interval procedures are also presented.
