Similar Articles
1.
Müller BU, Stich B, Piepho HP. Heredity. 2011;106(5):825-831.
Control of the genome-wide type I error rate (GWER) is an important issue in association mapping and linkage mapping experiments. For the latter, different approaches, such as permutation procedures or Bonferroni correction, have been proposed. The permutation test, however, cannot account for the population structure present in most association mapping populations, which can lead to false positive associations. The Bonferroni correction is applicable, but usually conservative, because the correlation among tests cannot be exploited. Therefore, a new approach is proposed that controls the genome-wide error rate while accounting for population structure. This approach is based on a simulation procedure that is equally applicable in a linkage and an association-mapping context. Using the parameter settings of three real data sets, it is shown that the procedure controls the GWER and the generalized genome-wide type I error rate (GWER(k)).
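A minimal sketch of the simulation idea, under assumptions not taken from the paper (jointly normal marker-wise statistics whose null correlation matrix encodes relatedness and structure): the genome-wide threshold is the (1 − α) quantile of the maximum absolute statistic across simulated null scans.

```python
import numpy as np

def simulated_gwer_threshold(corr, alpha=0.05, n_sim=20_000, rng=None):
    """Genome-wide |z| threshold controlling the GWER at alpha, given the
    null correlation matrix of the marker-wise tests (e.g. derived from
    kinship / population structure)."""
    if rng is None:
        rng = np.random.default_rng(1)
    L = np.linalg.cholesky(corr)
    z = rng.standard_normal((n_sim, corr.shape[0])) @ L.T  # null genome scans
    return np.quantile(np.abs(z).max(axis=1), 1 - alpha)

# toy example: 200 equicorrelated tests (rho = 0.3)
m, rho = 200, 0.3
corr = np.full((m, m), rho) + (1 - rho) * np.eye(m)
print(simulated_gwer_threshold(corr))  # below the two-sided Bonferroni cutoff of ~3.66
```

Because correlated tests make the effective number of tests smaller than m, the simulated threshold sits below the Bonferroni one, which is exactly the power advantage such simulation procedures offer.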

2.
MOTIVATION: Sample size calculation is important in experimental design and is even more so in microarray or proteomic experiments, since only a few repetitions can be afforded. In the multiple testing problems involving these experiments, it is more powerful and more reasonable to control the false discovery rate (FDR) or positive FDR (pFDR) instead of a type I error rate such as the family-wise error rate (FWER). When controlling the FDR, the traditional approach of estimating sample size by controlling type I error is no longer applicable. RESULTS: Our proposed method applies to controlling the FDR. The sample size calculation is straightforward and requires minimal computation, as illustrated with two-sample t-tests and F-tests. Based on simulation with the resultant sample size, the power is shown to be achievable by the q-value procedure. AVAILABILITY: Matlab code implementing the described methods is available upon request.
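The flavor of such a calculation can be sketched with the standard pFDR-power relation (a generic sketch with assumed design numbers, not this paper's derivation): solve for the per-test α implied by the target FDR, then size the two-sample t-test at that α via the normal approximation.

```python
import numpy as np
from scipy.stats import norm

def per_test_alpha(fdr, m, m1, power):
    """Per-test two-sided alpha implied by a target FDR, using the
    expected-counts approximation FDR ~= m0*alpha / (m0*alpha + m1*power)."""
    m0 = m - m1
    return fdr * m1 * power / (m0 * (1 - fdr))

def n_per_group(alpha, power, delta):
    """Normal-approximation sample size per group for a two-sample
    t-test, with effect size delta in units of the common SD."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * (z / delta) ** 2))

# hypothetical microarray design: 10,000 genes, 200 truly changed
alpha = per_test_alpha(fdr=0.05, m=10_000, m1=200, power=0.8)
print(alpha, n_per_group(alpha, power=0.8, delta=1.0))
```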

3.
Implementing false discovery rate control: increasing your power
Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.
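For concreteness, here is a small sketch of the Benjamini-Hochberg step-up procedure next to a plain Bonferroni cutoff (illustrative p-values only):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up: reject the k smallest p-values,
    where k is the largest i with p_(i) <= (i/m) * q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

pvals = [0.0001, 0.004, 0.014, 0.095, 0.201, 0.278, 0.344, 0.459, 0.61, 0.9]
print(benjamini_hochberg(pvals).sum())               # BH rejects 3
print((np.array(pvals) <= 0.05 / len(pvals)).sum())  # Bonferroni rejects 2
```

On the same ten p-values, BH rejects three hypotheses where Bonferroni rejects two; the gap widens as the number of tests grows.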

4.
In two‐stage group sequential trials with a primary and a secondary endpoint, the overall type I error rate for the primary endpoint is often controlled by an α‐level boundary, such as an O'Brien‐Fleming or Pocock boundary. Following a hierarchical testing sequence, the secondary endpoint is tested only if the primary endpoint achieves statistical significance either at an interim analysis or at the final analysis. To control its type I error rate, the secondary endpoint is then tested using a Bonferroni procedure or any α‐level group sequential method. In comparison with marginal testing, there is an overall power loss for the test of the secondary endpoint, since a claim of a positive result depends on the significance of the primary endpoint in the hierarchical testing sequence. We propose two group sequential testing procedures with improved secondary power: the improved Bonferroni procedure and the improved Pocock procedure. The proposed procedures use the correlation between the interim and final statistics for the secondary endpoint while applying graphical approaches to transfer the significance level from the primary endpoint to the secondary endpoint. The procedures control the familywise error rate (FWER) strongly by construction, and this is confirmed via simulation. We also compare the proposed procedures with other commonly used group sequential procedures in terms of control of the FWER and the power of rejecting the secondary hypothesis. An example is provided to illustrate the procedures.
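The key ingredient the improved procedures exploit is the known correlation sqrt(t) between interim and final z-statistics at information fraction t. A sketch of standard group-sequential machinery (not the authors' specific boundaries): compute a common two-look critical value from the joint null distribution.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import multivariate_normal, norm

def two_look_bound(alpha=0.025, t=0.5):
    """Common critical value c with P(Z1 > c or Z2 > c) = alpha under H0,
    using corr(Z1, Z2) = sqrt(t) for information fraction t."""
    dist = multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, np.sqrt(t)], [np.sqrt(t), 1.0]])
    return brentq(lambda c: (1.0 - dist.cdf([c, c])) - alpha, 1.0, 4.0)

c = two_look_bound()
print(c, norm.ppf(1 - 0.025 / 2))   # ~2.18 vs Bonferroni's ~2.24
```

The joint calculation gives a lower boundary than Bonferroni, and the same bivariate-normal computation underlies transferring significance level from the primary to the secondary endpoint.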

5.
The Newman-Keuls (NK) procedure for testing all pairwise comparisons among a set of treatment means, introduced by Newman (1939) and in a slightly different form by Keuls (1952), was proposed as a reasonable way to alleviate the inflation of error rates when a large number of means are compared. It was proposed before the concepts of different types of multiple error rates were introduced by Tukey (1952a, b; 1953). Although it was popular in the 1950s and 1960s, once control of the familywise error rate (FWER) was accepted generally as an appropriate criterion in multiple testing, and it was realized that the NK procedure does not control the FWER at the nominal level at which it is performed, the procedure gradually fell out of favor. Recently, a more liberal criterion, control of the false discovery rate (FDR), has been proposed as more appropriate in some situations than FWER control. This paper notes that the NK procedure and a nonparametric extension control the FWER within any set of homogeneous treatments. It proves that the extended procedure controls the FDR when there are well-separated clusters of homogeneous means and between-cluster test statistics are independent, and extensive simulation provides strong evidence that the original procedure controls the FDR under the same conditions and some dependent conditions when the clusters are not well separated. Thus, the test has two desirable error-controlling properties, providing a compromise between FDR control with no subgroup FWER control and global FWER control. Yekutieli (2002) developed an FDR-controlling procedure for testing all pairwise differences among means, without any FWER-controlling criteria when there is more than one cluster. The empirical example in Yekutieli's paper was used to compare the Benjamini-Hochberg (1995) method with apparent FDR control in this context, Yekutieli's proposed method with proven FDR control, the Newman-Keuls method that controls FWER within equal clusters with apparent FDR control, and several methods that control FWER globally. The Newman-Keuls procedure is shown to be intermediate in number of rejections between the FWER-controlling methods and the FDR-controlling methods in this example, although it is not always more conservative than the other FDR-controlling methods.

6.
Zou G, Zuo Y. Genetics. 2006;172(1):687-691.
With respect to the multiple-tests problem, an increasing amount of attention has recently been paid to controlling the false discovery rate (FDR), the positive false discovery rate (pFDR), and the proportion of false positives (PFP). The new approaches are generally believed to be more powerful than the classical Bonferroni one. This article focuses on the PFP approach. It demonstrates, via examples from genetic association studies, that the Bonferroni procedure can be more powerful than PFP control, and it also shows the intrinsic connection between controlling the PFP and controlling the overall type I error rate. Since controlling the PFP does not necessarily lead to a desired power level, this article addresses the design issue and recommends sample sizes that attain the desired power levels when the PFP is controlled. The results also provide rough guidance on the sample sizes needed to achieve desired power levels when the FDR, and especially the pFDR, are controlled.
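A back-of-envelope version of that connection (an expected-counts approximation with made-up design numbers, not the paper's exact analysis): the per-test α implied by a PFP target can fall below the Bonferroni cutoff, in which case Bonferroni is the more powerful procedure.

```python
# PFP ~= m0*alpha / (m0*alpha + m1*power)  =>  solve for the per-test alpha
def pfp_alpha(pfp, m0, m1, power):
    return pfp * m1 * power / (m0 * (1 - pfp))

m0, m1 = 999, 1                      # one true association among 1000 markers
bonferroni = 0.05 / (m0 + m1)        # 5.0e-05
for power in (0.8, 0.2):
    print(power, pfp_alpha(0.05, m0, m1, power))
# 0.8 -> ~4.2e-05 and 0.2 -> ~1.1e-05: both below the Bonferroni cutoff,
# so controlling the PFP at 5% is the stricter (less powerful) criterion here
```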

7.
Cheng C, Pounds S. Bioinformation. 2007;1(10):436-446.
Microarray gene expression applications have greatly stimulated statistical research on the massive multiple hypothesis testing problem. There is now a large body of literature in this area and essentially five paradigms of massive multiple testing: control of the false discovery rate (FDR), estimation of the FDR, significance threshold criteria, control of the family-wise error rate (FWER) or generalized FWER (gFWER), and empirical Bayes approaches. This paper contains a technical survey of the developments of the FDR-related paradigms, emphasizing precise formulation of the problem, concepts of error measurement, and considerations in applications. The goal is not an exhaustive literature survey, but rather a review of the current state of the field.

8.
The data from genome-wide association studies (GWAS) in humans are still predominantly analyzed using single-marker association methods. As an alternative to single-marker analysis (SMA), all or subsets of markers can be tested simultaneously. This approach requires a form of penalized regression (PR), as the number of SNPs is much larger than the sample size. Here we review PR methods in the context of GWAS, extend them to perform penalty parameter and SNP selection by false discovery rate (FDR) control, and assess their performance in comparison with SMA. PR methods were compared with SMA using realistically simulated GWAS data with a continuous phenotype and real data. Based on these comparisons, our analytic FDR criterion may currently be the best approach to SNP selection using PR for GWAS. We found that PR with FDR control provides substantially more power than SMA with genome-wide type I error control, but somewhat less power than SMA with Benjamini–Hochberg FDR control (SMA-BH). PR with FDR-based penalty parameter selection controlled the FDR somewhat conservatively, while SMA-BH may not achieve FDR control in all situations. Differences among PR methods seem quite small when the focus is on SNP selection with FDR control. Incorporating linkage disequilibrium into the penalization, by adapting penalties developed for covariates measured on graphs, can improve power but also generates more false positives or wider regions for follow-up. We recommend the elastic net with a mixing weight for the Lasso penalty near 0.5 as the best method.
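A rough sketch of FDR-guided penalty selection for an elastic net. The function name and the permutation-based FDR proxy below are illustrative inventions, not the paper's analytic criterion:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(5)

def select_snps_fdr(X, y, alphas, target_fdr=0.1, l1_ratio=0.5, n_perm=20):
    """Choose the elastic-net penalty by an FDR-style rule: the weakest
    penalty whose estimated FDR (mean selections on permuted phenotypes
    divided by observed selections) stays below target_fdr."""
    for a in sorted(alphas):                      # weakest penalty first
        enet = ElasticNet(alpha=a, l1_ratio=l1_ratio, max_iter=10_000)
        selected = np.flatnonzero(enet.fit(X, y).coef_)
        if selected.size == 0:
            continue
        null_hits = np.mean([np.count_nonzero(enet.fit(X, rng.permutation(y)).coef_)
                             for _ in range(n_perm)])
        if null_hits / selected.size <= target_fdr:
            return a, selected
    return None, np.array([], dtype=int)

# toy continuous-phenotype GWAS: 200 samples, 500 SNPs, 5 causal
n, m = 200, 500
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)
y = X[:, :5] @ np.full(5, 0.5) + rng.standard_normal(n)
alpha, snps = select_snps_fdr(X, y, alphas=np.logspace(-2, 0, 8))
print(alpha, snps[:10])
```

Permuting the phenotype estimates how many SNPs a given penalty selects by chance, giving a crude FDR for each candidate penalty.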

9.
The paper is concerned with expected type I errors of some stepwise multiple test procedures, based on independent p‐values, that control the so‐called false discovery rate (FDR). We derive an asymptotic result for the supremum of the expected type I error rate (EER) when the number of hypotheses tends to infinity. Among other things, it is shown that when the original Benjamini‐Hochberg step‐up procedure controls the FDR at level α, its EER may approach a value slightly larger than α/4 as the number of hypotheses increases. Moreover, we derive some least favourable parameter configuration results, some bounds for the FDR and the EER, as well as easily computable formulae for the familywise error rate (FWER) of two FDR‐controlling procedures. Finally, we discuss some undesirable properties of the FDR concept, especially the problem of cheating.
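The EER claim is easy to probe by simulation. A sketch, reading "EER" as E[V]/m with V the number of falsely rejected true nulls (an assumption on my part about the paper's exact definition) and using a normal shift alternative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def bh_reject(p, q):
    """Benjamini-Hochberg step-up rejection indicator."""
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, bool)
    reject[order[:k]] = True
    return reject

def eer_bh(m=1000, m0=500, mu=3.0, q=0.05, n_sim=2000):
    """Monte Carlo estimate of E[V]/m for the BH procedure."""
    v = 0
    for _ in range(n_sim):
        z = rng.standard_normal(m)
        z[m0:] += mu                 # m - m0 false null hypotheses
        v += bh_reject(norm.sf(z), q)[:m0].sum()
    return v / (n_sim * m)

print(eer_bh())   # compare with the asymptotic bound of roughly q/4 = 0.0125
```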

10.
Controlling for the multiplicity effect is an essential part of determining statistical significance in large-scale single-locus association genome scans on Single Nucleotide Polymorphisms (SNPs). Bonferroni adjustment is a commonly used approach due to its simplicity, but it is conservative and has low power for large-scale tests. The permutation test, a powerful and popular tool, is computationally expensive and may mislead in the presence of family structure. We propose a computationally efficient and powerful multiple testing correction approach for Linkage Disequilibrium (LD) based Quantitative Trait Loci (QTL) mapping on the basis of graphical weighted-Bonferroni methods. The proposed multiplicity adjustment method synthesizes weighted Bonferroni-based closed testing procedures into a powerful and versatile graphical approach. By tailoring different priorities for the two hypothesis tests involved in LD based QTL mapping, we are able to increase power while maintaining computational efficiency and conceptual simplicity. The proposed approach enables strong control of the familywise error rate (FWER). The performance of the proposed approach as compared to the standard Bonferroni correction is illustrated by simulation and real data. We observe a consistent and moderate increase in power under all simulated circumstances, across different sample sizes, heritabilities, and numbers of SNPs. We also applied the proposed method to a real outbred mouse HDL cholesterol QTL mapping project, where we detected the significant QTLs highlighted in the literature while still ensuring strong control of the FWER.
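A generic sketch of the graphical weighted-Bonferroni machinery the method builds on (the standard update rules of the graphical approach; the weights, graph, and two-hypothesis example are illustrative, not the paper's LD/QTL tailoring):

```python
import numpy as np

def graphical_bonferroni(p, alpha, w, G):
    """Weighted-Bonferroni graphical procedure.
    w[i]: initial weight of H_i (weights sum to at most 1);
    G[i, j]: fraction of H_i's level passed to H_j once H_i is rejected."""
    p = np.asarray(p, float)
    w = np.asarray(w, float).copy()
    G = np.asarray(G, float).copy()
    n = len(p)
    active = np.ones(n, bool)
    rejected = np.zeros(n, bool)
    while True:
        hits = [i for i in range(n) if active[i] and p[i] <= w[i] * alpha]
        if not hits:
            return rejected
        i = hits[0]
        rejected[i], active[i] = True, False
        # redistribute the level and rewire the graph
        for j in range(n):
            if active[j]:
                w[j] += w[i] * G[i, j]
        G_new = G.copy()
        for j in range(n):
            for k in range(n):
                if active[j] and active[k] and j != k:
                    d = 1.0 - G[j, i] * G[i, j]
                    G_new[j, k] = (G[j, k] + G[j, i] * G[i, k]) / d if d > 1e-12 else 0.0
        G = G_new
        w[i] = 0.0

# hierarchical primary -> secondary testing at overall alpha = 0.05
print(graphical_bonferroni([0.012, 0.030], 0.05, [1.0, 0.0],
                           [[0.0, 1.0], [0.0, 0.0]]))   # -> [True, True]
```

Here all of H1's level flows to H2 once H1 is rejected; tailoring the weights and transfer fractions is how such a method prioritizes the two tests involved in LD-based QTL mapping.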

11.
Lin WY, Lee WC. PLoS ONE. 2012;7(4):e33716.
The issue of large-scale testing has caught much attention with the advent of high-throughput technologies. In genomic studies, researchers are often confronted with a large number of tests. To make simultaneous inference for the many tests, false discovery rate (FDR) control provides a practical balance between the number of true positives and the number of false positives. However, when few hypotheses are truly non-null, controlling the FDR may not provide additional advantages over controlling the family-wise error rate (e.g., the Bonferroni correction). To facilitate discoveries from a study, weighting tests according to prior information is a promising strategy. Two such schemes, 'weighted FDR control' (WEI) and 'prioritized subset analysis' (PSA), have caught much attention. In this work, we compare the two weighting schemes with systematic simulation studies and demonstrate their use with a genome-wide association study (GWAS) on type 1 diabetes provided by the Wellcome Trust Case Control Consortium. The PSA and the WEI can both increase power when the prior is informative. With accurate and precise prioritization, the PSA in particular can create substantial power improvements over the commonly used whole-genome single-step FDR adjustment (i.e., the traditional unweighted FDR control). When the prior is uninformative (true disease susceptibility regions are not prioritized), the power loss of the PSA and the WEI is almost negligible. One caution, however, is that the overall FDR of the PSA can be slightly inflated if the prioritization is not accurate and precise. Our study highlights the merits of using information from mounting genetic studies, and provides insight into choosing an appropriate weighting scheme for FDR control in GWAS.
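A sketch of the weighted-FDR idea in the Genovese-Roeder-Wasserman style (the p-values and weights below are invented for illustration): run BH on p_i/w_i with weights normalized to mean 1, so the FDR guarantee carries over.

```python
import numpy as np

def weighted_bh(p, weights, q=0.05):
    """Weighted FDR control: BH applied to p_i / w_i, weights averaging 1."""
    p = np.asarray(p, float)
    w = np.asarray(weights, float)
    w = w * len(w) / w.sum()          # normalize to mean 1
    pw = p / w
    m = len(pw)
    order = np.argsort(pw)
    below = pw[order] <= q * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, bool)
    reject[order[:k]] = True
    return reject

# up-weight a prioritized subset of SNPs (hypothetical prior)
p = np.array([0.02, 0.30, 0.004, 0.11, 0.0003, 0.62])
w = np.array([4.0, 4.0, 1.0, 1.0, 1.0, 1.0])  # first two prioritized
print(weighted_bh(p, w))
```

An informative prior concentrates weight on the true signals and lowers their working p-values; an uninformative prior merely dilutes the weights slightly, which is why the abstract reports almost negligible power loss in that case.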

12.
MOTIVATION: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics, attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels, further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the distribution of the test statistics, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. RESULTS: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level and retain substantially more power than the family-wise error rate controlling procedures. In terms of power, resampling the marginal distribution of each test statistic substantially improves performance over the naïve approach. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control. AVAILABILITY: An R program that adjusts p-values using FDR controlling procedures is freely available over the Internet at www.math.tau.ac.il/~ybenja.
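A minimal sketch of the resampling flavour (a marginal permutation null; the toy data and cutoff are mine, not the paper's algorithms): estimate the FDR at a p-value cutoff as average null rejections over observed rejections.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def perm_fdr(x, labels, t=0.01, n_perm=200):
    """Permutation-based FDR estimate at p-value cutoff t:
    mean null rejections / observed rejections (marginal resampling)."""
    p_obs = stats.ttest_ind(x[:, labels == 0], x[:, labels == 1], axis=1).pvalue
    r_obs = max((p_obs <= t).sum(), 1)
    null = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        p_null = stats.ttest_ind(x[:, perm == 0], x[:, perm == 1], axis=1).pvalue
        null += (p_null <= t).sum()
    return (null / n_perm) / r_obs

# toy data: 1000 genes x 12 arrays, first 50 genes shifted in group 1
genes, arrays = 1000, 12
labels = np.repeat([0, 1], arrays // 2)
x = rng.standard_normal((genes, arrays))
x[:50, labels == 1] += 2.0
print(perm_fdr(x, labels))
```

This marginal version implicitly takes the proportion of true nulls to be 1, so it errs on the conservative side; resampling the joint distribution, as in the paper's stronger procedures, additionally captures inter-gene correlation.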

13.
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence, and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rate (FWER), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is usable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.

14.
We present a meta-analysis procedure for genome-wide linkage studies (MAGS). The MAGS procedure combines genome-wide linkage results across studies with possibly distinct marker maps. We applied the MAGS procedure to the simulated data from the Genetic Analysis Workshop 14 in order to investigate power to detect linkage to disease genes and power to detect linkage to disease modifier genes while controlling for type I error. We analyzed all 100 replicates of the four simulated studies for chromosomes 1 (disease gene), 2 (modifier gene), 3 (disease gene), 4 (no disease gene), 5 (disease gene), and 10 (modifier gene) with knowledge of the simulated disease gene locations. We found that the procedure correctly identified the disease loci on chromosomes 1, 3, and 5 and did not erroneously identify a linkage signal on chromosome 4. The MAGS procedure provided little to no evidence of linkage to the disease modifier genes on chromosomes 2 and 10.

15.
Beyond Bonferroni: less conservative analyses for conservation genetics
Studies in conservation genetics often attempt to determine genetic differentiation between two or more temporally or geographically distinct sample collections. Pairwise p-values from Fisher’s exact tests or contingency chi-square tests are commonly reported with a Bonferroni correction for multiple tests. While the Bonferroni correction controls the experiment-wise α, it is very conservative and results in greatly diminished power to detect differentiation among pairs of sample collections. An alternative is to control the false discovery rate (FDR), which provides increased power, but this method only maintains the experiment-wise α when none of the pairwise comparisons are significant. Recent modifications to the FDR method provide a moderate approach to determining significance levels. Simulations reveal that critical values of multiple comparison tests under both the Bonferroni method and a modified FDR method approach a minimum asymptote very near zero as the number of tests gets large, but the Bonferroni method approaches zero much more rapidly than the modified FDR method. I compared pairwise significance from three published studies using three critical values corresponding to the Bonferroni, FDR, and modified FDR methods. Results suggest that the modified FDR method may provide the most biologically important critical value for evaluating the significance of population differentiation in conservation genetics. Ultimately, more thorough reporting of statistical significance is needed to allow interpretation of the biological significance of genetic differentiation among populations. An erratum to this article is available.
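The three critical values are easy to compare side by side (a sketch; I am reading the "modified FDR" as the Benjamini-Yekutieli adjustment commonly used in this literature):

```python
import numpy as np

def critical_values(m, alpha=0.05):
    """Per-rank critical values for the k-th smallest p-value under
    Bonferroni, BH (FDR), and the Benjamini-Yekutieli 'modified FDR'
    with its harmonic-sum scaling alpha * k / (m * sum(1/i))."""
    k = np.arange(1, m + 1)
    bonferroni = np.full(m, alpha / m)
    bh = alpha * k / m
    by = alpha * k / (m * np.sum(1.0 / k))
    return bonferroni, bh, by

bon, bh, by = critical_values(45)   # e.g. 45 pairwise tests of 10 populations
print(bon[0], bh[0], by[0])         # critical values for the smallest p-value
```

Bonferroni stays flat at α/m, BH grows linearly in the rank k, and the B-Y variant grows linearly but is damped by the harmonic sum, so for all but the smallest ranks it sits between the two.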

16.

Background

High-throughput technologies, such as DNA microarrays, have significantly advanced biological and biomedical research by enabling researchers to carry out genome-wide screens. One critical task in analyzing genome-wide datasets is to control the false discovery rate (FDR) so that the proportion of false positive features among those called significant is restrained. Recently, a number of FDR control methods have been proposed and widely practiced, such as the Benjamini-Hochberg approach, the Storey approach, and Significance Analysis of Microarrays (SAM).

Methods

This paper presents a straightforward yet powerful FDR control method termed miFDR, which aims to minimize the FDR when calling a fixed number of significant features. We prove theoretically that the strategy used by miFDR finds the optimal number of significant features when the desired FDR is fixed.

Results

We compared miFDR with the BH approach, the Storey approach, and SAM on both simulated datasets and public DNA microarray datasets. The results demonstrate that miFDR outperforms the others by identifying more significant features under the same FDR cut-offs. A literature search showed that many genes called only by miFDR are indeed relevant to the underlying biology of interest.

Conclusions

FDR control has been widely applied to the analysis of high-throughput datasets, enabling rapid discoveries. Under the same FDR threshold, miFDR is capable of identifying more significant features than its competitors at a comparable level of complexity. Therefore, it can potentially have a great impact on biological and biomedical research.

Availability

Please contact the authors to obtain miFDR.

17.

Background  

The evaluation of statistical significance has become a critical process in identifying differentially expressed genes in microarray studies. Classical p-value adjustment methods for multiple comparisons, such as those controlling the family-wise error rate (FWER), have been found to be too conservative in the analysis of large-scale microarray screening data, and the False Discovery Rate (FDR), the expected proportion of false positives among all positives, has recently been suggested as an alternative for controlling false positives. Several statistical approaches have been used to estimate and control the FDR, but these may not provide reliable FDR estimation when applied to microarray data sets with a small number of replicates.

18.
OBJECTIVE: In affected sib pair studies without genotyped parents, the effect of genotyping error is generally to reduce the type I error rate and power of tests for linkage. The effect of genotyping error when parents have been genotyped is unknown. We investigated the type I error rate of the single-point Mean test for studies in which genotypes of both parents are available. METHODS: Datasets were simulated assuming no linkage and one of five models for genotyping error. In each dataset, Mendelian-inconsistent families were either excluded or regenotyped, and then the Mean test was applied. RESULTS: We found that genotyping errors lead to an inflated type I error rate when inconsistent families are excluded. Depending on the genotyping-error model assumed, regenotyping inconsistent families has one of several effects. It may produce the same type I error rate as if inconsistent families are excluded; it may reduce the type I error, but still leave an anti-conservative test; or it may give a conservative test. Departures of the type I error rate from its nominal level increase with both the genotyping error rate and sample size. CONCLUSION: We recommend that markers with high error rates either be excluded from the analysis or be regenotyped in all families.

19.
Haplotypes--that is, linear arrangements of alleles on the same chromosome that were inherited as a unit--are expected to carry important information in the context of association fine mapping of complex diseases. In consideration of a set of tightly linked markers, there is an enormous number of different marker combinations that can be analyzed. Therefore, a severe multiple-testing problem is introduced. One method to deal with this problem is Bonferroni correction by the number of combinations that are considered. Bonferroni correction is appropriate for independent tests but will result in a loss of power in the presence of linkage disequilibrium in the region. A second method is to perform simulations. It is unfortunate that most methods of haplotype analysis already require simulations to obtain an uncorrected P value for a specific marker combination. Thus, it seems that nested simulations are necessary to obtain P values that are corrected for multiple testing, which, apparently, limits the applicability of this approach because of computer running-time restrictions. Here, an algorithm is described that avoids such nested simulations. We check the validity of our approach under two disease models for haplotype analysis of family data. The true type I error rate of our algorithm corresponds to the nominal significance level. Furthermore, we observe a strong gain in power with our method to obtain the global P value, compared with the Bonferroni procedure to calculate the global P value. The method described here has been implemented in the latest update of our program FAMHAP.
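The trick for avoiding nested simulation can be sketched generically (a standard single-layer min-P scheme in the Westfall-Young spirit; FAMHAP's actual algorithm is its own): the same set of simulated null replicates yields both the marginal p-values and the distribution of the best p-value across marker combinations.

```python
import numpy as np

rng = np.random.default_rng(11)

def global_p_minp(stat_obs, stat_null):
    """Single-layer min-P correction.
    stat_obs:  observed statistic per marker combination, shape (m,)
    stat_null: simulated null statistics, shape (B, m); ONE layer of
               simulation reused for both marginal and global p-values."""
    B, m = stat_null.shape
    # marginal p-values: rank each observed statistic in its null column
    p_obs = (1 + (stat_null >= stat_obs).sum(axis=0)) / (B + 1)
    # marginal p-value of every simulated replicate against the same columns
    ranks = np.argsort(np.argsort(-stat_null, axis=0), axis=0) + 1
    p_null = ranks / B
    # global p: how often a whole null scan beats the best observed p-value
    return ((p_null.min(axis=1) <= p_obs.min()).sum() + 1) / (B + 1)

# toy example: 50 marker combinations, one true signal
B, m = 2000, 50
stat_null = rng.standard_normal((B, m))
stat_obs = rng.standard_normal(m)
stat_obs[0] += 4.0
print(global_p_minp(stat_obs, stat_null))
```

Each simulated scan is ranked within the same simulation pool that defines the marginal p-values, so no second layer of simulation is needed, and the global p-value automatically honours the dependence among overlapping marker combinations.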

20.
Escaping the Bonferroni iron claw in ecological studies
García LV. Oikos. 2004;105(3):657-663.
I analyze some criticisms of the application of alpha-inflation correction procedures to repeated-test tables in ecological studies. Common pitfalls during application, the statistical properties of many ecological datasets, and the strong control of the tablewise error rate exerted by the widely used sequential Bonferroni procedures seem to be responsible for some 'illogical' results when such corrections are applied. Sharpened Bonferroni-type procedures may alleviate the decrease in power associated with standard methods as the number of tests increases.
More powerful methods, based on controlling the false discovery rate (FDR), deserve more frequent use in ecological studies, especially those involving large repeated-test tables in which several or many individual null hypotheses have been rejected and the most significant p-value is relatively large.
I conclude that some reasonable control of alpha inflation is required of authors as a safeguard against striking but spurious findings, which may strongly affect the credibility of ecological research.
