Similar Articles
 20 similar articles found (search time: 31 ms)
1.
In candidate gene association studies, usually several elementary hypotheses are tested simultaneously using one particular set of data. The data normally consist of partly correlated SNP information. Every SNP can be tested for association with the disease, e.g., using the Cochran-Armitage test for trend. To account for the multiplicity of the test situation, different types of multiple testing procedures have been proposed. The question arises whether procedures taking into account the discreteness of the situation show a benefit especially in case of correlated data. We empirically evaluate several different multiple testing procedures via simulation studies using simulated correlated SNP data. We analyze FDR and FWER controlling procedures, special procedures for discrete situations, and the minP-resampling-based procedure. Within the simulation study, we examine a broad range of different gene data scenarios. We show that the main difference in the varying performance of the procedures is due to sample size. In small sample size scenarios, the minP resampling procedure, though controlling the stricter FWER, even had more power than the classical FDR controlling procedures. In contrast, FDR controlling procedures led to more rejections in larger sample size scenarios.
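The Cochran-Armitage trend test mentioned above can be sketched in a few lines. This is a minimal illustration, not code from the paper; the genotype counts are hypothetical.

```python
import math

def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2x3 case-control genotype table.

    cases/controls: counts per genotype (aa, Aa, AA).
    Returns the Z statistic and a two-sided asymptotic p-value.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    R = sum(cases)
    # Score: weighted case count minus its expectation under H0
    U = (sum(w * r for w, r in zip(weights, cases))
         - (R / n) * sum(w * t for w, t in zip(weights, totals)))
    var = (R * (n - R) / n**2) * (
        sum(w * w * t for w, t in zip(weights, totals))
        - sum(w * t for w, t in zip(weights, totals)) ** 2 / n)
    z = U / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical SNP: genotype counts for 60 cases and 60 controls
z, p = cochran_armitage_trend(cases=(10, 20, 30), controls=(30, 20, 10))
```

In a genome scan, one such p-value per SNP would then be fed into the multiplicity adjustments the abstract compares.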

2.
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is usable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.

3.
In many applications where it is necessary to test multiple hypotheses simultaneously, the data encountered are discrete. In such cases, it is important for multiplicity adjustment to take into account the discreteness of the distributions of the p‐values, to assure that the procedure is not overly conservative. In this paper, we review some known multiple testing procedures for discrete data that control the familywise error rate, the probability of making any false rejection. Taking advantage of the fact that the exact permutation or exact pairwise permutation distributions of the p‐values can often be determined when the sample size is small, we investigate procedures that incorporate the dependence structure through the exact permutation distribution and propose two new procedures that incorporate the exact pairwise permutation distributions. A step‐up procedure is also proposed that accounts for the discreteness of the data. The performance of the proposed procedures is investigated through simulation studies and two applications. The results show that by incorporating both discreteness and dependency of p‐value distributions, gains in power can be achieved.
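One classical FWER procedure that exploits discreteness is Tarone's modified Bonferroni correction, which drops from the multiplicity burden those tests whose minimum attainable p-value is too large ever to be rejected. The sketch below is a generic illustration of that idea, not one of the procedures proposed in the paper; the p-values are hypothetical.

```python
def tarone_bonferroni(p_obs, p_min_attainable, alpha=0.05):
    """Tarone-style modified Bonferroni correction for discrete tests.

    p_obs: observed p-values; p_min_attainable: smallest p-value each
    discrete test could possibly produce (p_min <= p_obs always holds).
    Returns indices of rejected hypotheses.
    """
    m = len(p_obs)
    # Smallest K such that at most K tests can even reach alpha / K
    for K in range(1, m + 1):
        eligible = [i for i, pm in enumerate(p_min_attainable)
                    if pm <= alpha / K]
        if len(eligible) <= K:
            break
    # Tests with p_min > alpha/K can never be rejected, so only the
    # eligible ones are tested, each at level alpha / K.
    return [i for i in eligible if p_obs[i] <= alpha / K]

# Four hypothetical discrete tests: only two can ever reach small p-values
rejected = tarone_bonferroni(
    p_obs=[0.02, 0.5, 0.3, 0.3],
    p_min_attainable=[0.01, 0.01, 0.2, 0.2])
```

Here K = 2, so the threshold is α/2 = 0.025 and the first test is rejected, whereas plain Bonferroni (α/4 = 0.0125) would reject nothing.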

4.
The paper is concerned with expected type I errors of some stepwise multiple test procedures based on independent p‐values controlling the so‐called false discovery rate (FDR). We derive an asymptotic result for the supremum of the expected type I error rate (EER) when the number of hypotheses tends to infinity. Among others, it will be shown that when the original Benjamini‐Hochberg step‐up procedure controls the FDR at level α, its EER may approach a value being slightly larger than α/4 when the number of hypotheses increases. Moreover, we derive some least favourable parameter configuration results, some bounds for the FDR and the EER as well as easily computable formulae for the familywise error rate (FWER) of two FDR‐controlling procedures. Finally, we discuss some undesirable properties of the FDR concept, especially the problem of cheating.
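For reference, the Benjamini-Hochberg step-up procedure analyzed here can be sketched as follows; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha.
    Returns reject/accept decisions in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):   # largest k such that
        if pvals[i] <= rank * alpha / m:        # p_(k) <= k * alpha / m
            k_max = rank
    threshold = k_max * alpha / m
    return [p <= threshold for p in pvals] if k_max else [False] * m

# 15 hypothetical p-values, sorted for readability
pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
decisions = benjamini_hochberg(pvals, alpha=0.05)
```

With these values the largest rank k satisfying p_(k) ≤ kα/m is k = 4 (0.0095 ≤ 4 · 0.05/15 ≈ 0.0133), so the four smallest p-values are rejected.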

5.
Controlling for the multiplicity effect is an essential part of determining statistical significance in large-scale single-locus association genome scans on Single Nucleotide Polymorphisms (SNPs). Bonferroni adjustment is a commonly used approach due to its simplicity, but is conservative and has low power for large-scale tests. The permutation test, which is a powerful and popular tool, is computationally expensive and may mislead in the presence of family structure. We propose a computationally efficient and powerful multiple testing correction approach for Linkage Disequilibrium (LD) based Quantitative Trait Loci (QTL) mapping on the basis of graphical weighted-Bonferroni methods. The proposed multiplicity adjustment method synthesizes weighted Bonferroni-based closed testing procedures into a powerful and versatile graphical approach. By tailoring different priorities for the two hypothesis tests involved in LD based QTL mapping, we are able to increase power and maintain computational efficiency and conceptual simplicity. The proposed approach enables strong control of the familywise error rate (FWER). The performance of the proposed approach as compared to the standard Bonferroni correction is illustrated by simulation and real data. We observe a consistent and moderate increase in power under all simulated circumstances, among different sample sizes, heritabilities, and numbers of SNPs. We also applied the proposed method to a real outbred mouse HDL cholesterol QTL mapping project where we detected the significant QTLs that were highlighted in the literature, while still ensuring strong control of the FWER.
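The graphical weighted-Bonferroni machinery underlying this approach can be sketched with the standard reject-and-propagate update of Bretz et al.: reject any hypothesis whose p-value falls below its weighted level, then redistribute its weight along the graph. This is a generic illustration, not the authors' tailored procedure; the two-hypothesis graph below (equal split, full transfer, which reduces to Holm's procedure) is hypothetical.

```python
def graphical_test(p, w, g, alpha=0.05):
    """Graphical weighted-Bonferroni closed testing procedure.

    p: p-values; w: initial weights (summing to at most 1); g: transition
    matrix, g[i][j] = fraction of H_i's level passed to H_j on rejection.
    Returns the sorted indices of rejected hypotheses.
    """
    active = set(range(len(p)))
    rejected = []
    while True:
        cand = [i for i in active if w[i] > 0 and p[i] <= w[i] * alpha]
        if not cand:
            return sorted(rejected)
        i = cand[0]          # the final rejection set is order-invariant
        rejected.append(i)
        active.discard(i)
        for j in active:     # propagate the freed significance level
            w[j] += w[i] * g[i][j]
        new_g = [row[:] for row in g]
        for j in active:     # rewire the graph among remaining hypotheses
            for k in active:
                if j != k:
                    denom = 1.0 - g[j][i] * g[i][j]
                    new_g[j][k] = ((g[j][k] + g[j][i] * g[i][k]) / denom
                                   if denom > 0 else 0.0)
        g = new_g
        w[i] = 0.0

# Hypothetical two-hypothesis graph: equal split, full weight transfer
rejected = graphical_test([0.01, 0.04], w=[0.5, 0.5],
                          g=[[0.0, 1.0], [1.0, 0.0]])
```

After H1 is rejected at level 0.5α, its weight flows to H2, which is then tested at the full level α; tailoring the initial weights is how the QTL-mapping procedure prioritizes its two hypothesis tests.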

6.
Spatial extent inference (SEI) is widely used across neuroimaging modalities to adjust for multiple comparisons when studying brain‐phenotype associations that inform our understanding of disease. Recent studies have shown that Gaussian random field (GRF)‐based tools can have inflated family‐wise error rates (FWERs). This has led to substantial controversy as to which processing choices are necessary to control the FWER using GRF‐based SEI. The failure of GRF‐based methods is due to unrealistic assumptions about the spatial covariance function of the imaging data. A permutation procedure is the most robust SEI tool because it estimates the spatial covariance function from the imaging data. However, the permutation procedure can fail because its assumption of exchangeability is violated in many imaging modalities. Here, we propose the (semi‐) parametric bootstrap joint (PBJ; sPBJ) testing procedures that are designed for SEI of multilevel imaging data. The sPBJ procedure uses a robust estimate of the spatial covariance function, which yields consistent estimates of standard errors, even if the covariance model is misspecified. We use the methods to study the association between performance and executive functioning in a working memory functional magnetic resonance imaging study. The sPBJ has similar or greater power to the PBJ and permutation procedures while maintaining the nominal type I error rate in reasonable sample sizes. We provide an R package to perform inference using the PBJ and sPBJ procedures.

7.
In two‐stage group sequential trials with a primary and a secondary endpoint, the overall type I error rate for the primary endpoint is often controlled by an α‐level boundary, such as an O'Brien‐Fleming or Pocock boundary. Following a hierarchical testing sequence, the secondary endpoint is tested only if the primary endpoint achieves statistical significance either at an interim analysis or at the final analysis. To control the type I error rate for the secondary endpoint, this is tested using a Bonferroni procedure or any α‐level group sequential method. In comparison with marginal testing, there is an overall power loss for the test of the secondary endpoint since a claim of a positive result depends on the significance of the primary endpoint in the hierarchical testing sequence. We propose two group sequential testing procedures with improved secondary power: the improved Bonferroni procedure and the improved Pocock procedure. The proposed procedures use the correlation between the interim and final statistics for the secondary endpoint while applying graphical approaches to transfer the significance level from the primary endpoint to the secondary endpoint. The procedures control the familywise error rate (FWER) strongly by construction and this is confirmed via simulation. We also compare the proposed procedures with other commonly used group sequential procedures in terms of control of the FWER and the power of rejecting the secondary hypothesis. An example is provided to illustrate the procedures.
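A small Monte Carlo sketch of the kind of scheme described above: under the global null, the primary endpoint is tested at the standard two-look Pocock boundary (critical value ≈ 2.178 for two-sided α = 0.05), with the final statistic built from independent increments so that it has the canonical correlation 1/√2 with the interim statistic. All settings are illustrative, not the paper's improved procedures.

```python
import math
import random

def pocock_two_look_type1(n_sims=100_000, crit=2.178, seed=7):
    """Monte Carlo type I error of a two-look Pocock test under H0.

    The final statistic is the standardized sum of two independent
    increments, giving correlation 1/sqrt(2) with the interim statistic.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        z_interim = rng.gauss(0.0, 1.0)
        z_final = (z_interim + rng.gauss(0.0, 1.0)) / math.sqrt(2.0)
        if abs(z_interim) >= crit or abs(z_final) >= crit:
            hits += 1
    return hits / n_sims

rate = pocock_two_look_type1()
```

In the hierarchical scheme, the secondary endpoint is tested only inside the event counted by `hits`, so the familywise error rate for the secondary endpoint can never exceed this rate, roughly 0.05 here.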

8.
Tao Sun, Yu Cheng, Ying Ding. Biometrics, 2023, 79(3): 1713–1725
Copula is a popular method for modeling the dependence among marginal distributions in multivariate censored data. As many copula models are available, it is essential to check if the chosen copula model fits the data well for analysis. Existing approaches to testing the fitness of copula models are mainly for complete or right-censored data. No formal goodness-of-fit (GOF) test exists for interval-censored or recurrent events data. We develop a general GOF test for copula-based survival models using the information ratio (IR) to address this research gap. It can be applied to any copula family with a parametric form, such as the frequently used Archimedean, Gaussian, and D-vine families. The test statistic is easy to calculate, and the test procedure is straightforward to implement. We establish the asymptotic properties of the test statistic. The simulation results show that the proposed test controls the type-I error well and achieves adequate power when the dependence strength is moderate to high. Finally, we apply our method to test various copula models in analyzing multiple real datasets. Our method consistently separates different copula models for all these datasets in terms of model fitness.

9.
Population geneticists often use multiple independent hypothesis tests of Hardy–Weinberg Equilibrium (HWE), Linkage Disequilibrium (LD), and population differentiation to make broad inferences about their systems of choice. However, correcting for Family‐Wise Error Rates (FWER) that are inflated due to multiple comparisons is sparingly reported in our current literature. In this issue of Molecular Ecology Resources, a meta‐analysis of 215 population genetics studies published between 2011 and 2013 shows (i) scarce use of FWER corrections across all three classes of tests, and (ii) when used, inconsistent application of correction methods with a clear bias towards less‐conservative corrections for tests of population differentiation than for tests of HWE and LD. Here we replicate this meta‐analysis using 205 population genetics studies published between 2013 and 2018, to show the same continued disuse and inconsistencies. We hope that both studies serve as a wake‐up call to population geneticists, reviewers, and editors to be rigorous about consistently correcting for FWER inflation.

10.
L. Finos, A. Farcomeni. Biometrics, 2011, 67(1): 174–181
We show a novel approach for k‐FWER control which does not involve any correction, but only testing the hypotheses along a (possibly data‐driven) order until a suitable number of p‐values are found above the uncorrected α level. The p‐values can arise from any linear model in a parametric or nonparametric setting. The approach is not only very simple and computationally undemanding, but the data‐driven order also enhances power when the sample size is small (and also when k and/or the number of tests is large). We illustrate the method on an original study about gene discovery in multiple sclerosis, which involved a small number of twin pairs discordant for the disease. The methods are implemented in an R package (someKfwer), freely available on CRAN.
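As the abstract describes it, the core idea can be sketched as follows: walk the (possibly data-driven) order, reject each hypothesis with p ≤ α at the uncorrected level, and stop once k p-values above α have been seen. This is my reading of the abstract, not the authors' exact rules (consult the someKfwer package for those); the ordered p-values are hypothetical.

```python
def ordered_kfwer(pvals, alpha=0.05, k=2):
    """Scan hypotheses in the given (data-driven) order, rejecting each
    p <= alpha, and stop once k p-values above alpha have been seen."""
    rejected, exceedances = [], 0
    for i, p in enumerate(pvals):
        if p <= alpha:
            rejected.append(i)
        else:
            exceedances += 1
            if exceedances == k:
                break   # the k-th exceedance ends the procedure
    return rejected

# Hypothetical ordered p-values (most promising hypotheses first)
rejected = ordered_kfwer([0.01, 0.20, 0.03, 0.30, 0.02], alpha=0.05, k=2)
```

Here the scan rejects the first and third hypotheses and stops at the second exceedance (0.30), so the small p-value after it is never tested.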

11.
This article discusses specific assumptions necessary for permutation multiple tests to control the Familywise Error Rate (FWER). At issue is that, in comparing parameters of the marginal distributions of two sets of multivariate observations, validity of permutation testing is affected by all the parameters in the joint distributions of the observations. We show the surprising fact that, in the case of a linear model with i.i.d. errors such as in the analysis of Quantitative Trait Loci (QTL), this issue has no impact on control of FWER, if the test statistic is of a particular form. On the other hand, in the analysis of gene expression levels or multiple safety endpoints, unless some assumption connecting the marginal distributions of the observations to their joint distributions is made, permutation multiple tests may not control FWER.

12.
The use of multiple hypothesis testing procedures has been receiving a lot of attention recently by statisticians in DNA microarray analysis. The traditional FWER controlling procedures are not very useful in this situation since the experiments are exploratory by nature and researchers are more interested in controlling the rate of false positives rather than controlling the probability of making a single erroneous decision. This has led to increased use of FDR (False Discovery Rate) controlling procedures. Genovese and Wasserman proposed a single-step FDR procedure that is an asymptotic approximation to the original Benjamini and Hochberg stepwise procedure. In this paper, we modify the Genovese-Wasserman procedure to force the FDR control closer to the level alpha in the independence setting. Assuming that the data comes from a mixture of two normals, we also propose to make this procedure adaptive by first estimating the parameters using the EM algorithm and then plugging these estimated parameters into the above modification of the Genovese-Wasserman procedure. We compare this procedure with the original Benjamini-Hochberg and the SAM thresholding procedures. The FDR control and other properties of this adaptive procedure are verified numerically.

13.
This paper discusses two‐sample comparison in the case of interval‐censored failure time data. For the problem, one common approach is to employ some nonparametric test procedures, which usually give some p‐values but not a direct or exact quantitative measure of the survival or treatment difference of interest. In particular, these procedures cannot provide a hazard ratio estimate, which is commonly used to measure the difference between the two treatments or samples. For interval‐censored data, a few nonparametric test procedures have been developed, but no procedure seems to exist for hazard ratio estimation. To address this, we present two procedures for nonparametric estimation of the hazard ratio of the two samples for interval‐censored data situations. They are generalizations of the corresponding procedures for right‐censored failure time data. An extensive simulation study is conducted to evaluate the performance of the two procedures and indicates that they work reasonably well in practice. For illustration, they are applied to a set of interval‐censored data arising from a breast cancer study.

14.
Functional magnetic resonance imaging (fMRI) data sets are large and characterized by complex dependence structures driven by highly sophisticated neurophysiology and aspects of the experimental designs. Typical analyses investigating task‐related changes in measured brain activity use a two‐stage procedure in which the first stage involves subject‐specific models and the second‐stage specifies group (or population) level parameters. Customarily, the first‐level accounts for temporal correlations between the serial scans acquired during one scanning session. Despite accounting for these correlations, fMRI studies often include multiple sessions and temporal dependencies may persist between the corresponding estimates of mean neural activity. Further, spatial correlations between brain activity measurements in different locations are often unaccounted for in statistical modeling and estimation. We propose a two‐stage, spatio‐temporal, autoregressive model that simultaneously accounts for spatial dependencies between voxels within the same anatomical region and for temporal dependencies between a subject's estimates from multiple sessions. We develop an algorithm that leverages the special structure of our covariance model, enabling relatively fast and efficient estimation. Using our proposed method, we analyze fMRI data from a study of inhibitory control in cocaine addicts.

15.
The proportion ratio (PR) of responses between an experimental treatment and a control treatment is one of the most commonly used indices to measure the relative treatment effect in a randomized clinical trial. We develop asymptotic and permutation‐based procedures for testing equality of treatment effects as well as derive confidence intervals of PRs for multivariate binary matched‐pair data under a mixed‐effects exponential risk model. To evaluate and compare the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. When the number of matched pairs is large, we find that all test procedures presented here can perform well with respect to Type I error. When the number of matched pairs is small, the permutation‐based test procedures developed in this paper are of use. Furthermore, using test procedures (or interval estimators) based on a weighted linear average estimator of treatment effects can improve power (or gain precision) when the treatment effects on all response variables of interest are known to fall in the same direction. Finally, we apply the data taken from a crossover clinical trial that monitored several adverse events of an antidepressive drug to illustrate the practical use of test procedures and interval estimators considered here.
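A minimal sketch of an exact within-pair permutation test for matched-pair binary data, the small-sample situation where the abstract recommends permutation procedures. The difference-in-proportions statistic and the data are hypothetical illustrations, not the paper's PR-based procedures.

```python
from itertools import product

def paired_permutation_test(pairs):
    """Exact two-sided permutation test of equal marginal proportions
    for matched-pair binary data: under H0 the two outcomes in a pair
    are exchangeable, so all 2^n within-pair swaps are equally likely.
    Only practical for small n, since it enumerates 2^n swaps."""
    n = len(pairs)
    def stat(ps):
        return abs(sum(x for x, _ in ps) - sum(y for _, y in ps)) / n
    t_obs = stat(pairs)
    hits = 0
    for flips in product((False, True), repeat=n):
        flipped = [(y, x) if f else (x, y)
                   for f, (x, y) in zip(flips, pairs)]
        if stat(flipped) >= t_obs - 1e-12:   # count ties as extreme
            hits += 1
    return hits / 2 ** n

# Hypothetical matched pairs of (treatment, control) binary responses
pairs = [(1, 0)] * 6 + [(0, 1), (1, 1)]
p_value = paired_permutation_test(pairs)
```

Only the seven discordant pairs move the statistic, so the result matches an exact McNemar-type calculation (p = 0.125 here).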

16.
We describe a new pathway for multivariate analysis of data consisting of counts of species abundances that includes two key components: copulas, to provide a flexible joint model of individual species, and dissimilarity‐based methods, to integrate information across species and provide a holistic view of the community. Individual species are characterized using suitable (marginal) statistical distributions, with the mean, the degree of over‐dispersion, and/or zero‐inflation being allowed to vary among a priori groups of sampling units. Associations among species are then modeled using copulas, which allow any pair of disparate types of variables to be coupled through their cumulative distribution function, while maintaining entirely the separate individual marginal distributions appropriate for each species. A Gaussian copula smoothly captures changes in an index of association that excludes joint absences in the space of the original species variables. A permutation‐based filter with exact family‐wise error can optionally be used a priori to reduce the dimensionality of the copula estimation problem. We describe in detail a Monte Carlo expectation maximization algorithm for efficient estimation of the copula correlation matrix with discrete marginal distributions (counts). The resulting fully parameterized copula models can be used to simulate realistic ecological community data under fully specified null or alternative hypotheses. Distributions of community centroids derived from simulated data can then be visualized in ordinations of ecologically meaningful dissimilarity spaces. Multinomial mixtures of data drawn from copula models also yield smooth power curves in dissimilarity‐based settings. Our proposed analysis pathway provides new opportunities to combine model‐based approaches with dissimilarity‐based methods to enhance understanding of ecological systems. 
We demonstrate implementation of the pathway through an ecological example, where associations among fish species were found to increase after the establishment of a marine reserve.

17.
The Newman-Keuls (NK) procedure for testing all pairwise comparisons among a set of treatment means, introduced by Newman (1939) and in a slightly different form by Keuls (1952), was proposed as a reasonable way to alleviate the inflation of error rates when a large number of means are compared. It was proposed before the concepts of different types of multiple error rates were introduced by Tukey (1952a, b; 1953). Although it was popular in the 1950s and 1960s, once control of the familywise error rate (FWER) was accepted generally as an appropriate criterion in multiple testing, and it was realized that the NK procedure does not control the FWER at the nominal level at which it is performed, the procedure gradually fell out of favor. Recently, a more liberal criterion, control of the false discovery rate (FDR), has been proposed as more appropriate in some situations than FWER control. This paper notes that the NK procedure and a nonparametric extension control the FWER within any set of homogeneous treatments. It proves that the extended procedure controls the FDR when there are well-separated clusters of homogeneous means and between-cluster test statistics are independent, and extensive simulation provides strong evidence that the original procedure controls the FDR under the same conditions and some dependent conditions when the clusters are not well-separated. Thus, the test has two desirable error-controlling properties, providing a compromise between FDR control with no subgroup FWER control and global FWER control. Yekutieli (2002) developed an FDR-controlling procedure for testing all pairwise differences among means, without any FWER-controlling criteria when there is more than one cluster. 
The empirical example in Yekutieli's paper was used to compare the Benjamini-Hochberg (1995) method with apparent FDR control in this context, Yekutieli's proposed method with proven FDR control, the Newman-Keuls method that controls FWER within equal clusters with apparent FDR control, and several methods that control FWER globally. The Newman-Keuls method is shown to be intermediate in number of rejections between the FWER-controlling methods and the FDR-controlling methods in this example, although it is not always more conservative than the other FDR-controlling methods.

18.
Predicting protein structure from primary sequence is one of the ultimate challenges in computational biology. Given the large amount of available sequence data, the analysis of co-evolution, i.e., statistical dependency, between columns in multiple alignments of protein domain sequences remains one of the most promising avenues for predicting residues that are contacting in the structure. A key impediment to this approach is that strong statistical dependencies are also observed for many residue pairs that are distal in the structure. Using a comprehensive analysis of protein domains with available three-dimensional structures we show that co-evolving contacts very commonly form chains that percolate through the protein structure, inducing indirect statistical dependencies between many distal pairs of residues. We characterize the distributions of length and spatial distance traveled by these co-evolving contact chains and show that they explain a large fraction of observed statistical dependencies between structurally distal pairs. We adapt a recently developed Bayesian network model into a rigorous procedure for disentangling direct from indirect statistical dependencies, and we demonstrate that this method not only successfully accomplishes this task, but also allows contacts with weak statistical dependency to be detected. To illustrate how additional information can be incorporated into our method, we incorporate a phylogenetic correction, and we develop an informative prior that takes into account that the probability for a pair of residues to contact depends strongly on their primary-sequence distance and the amount of conservation that the corresponding columns in the multiple alignment exhibit. We show that our model including these extensions dramatically improves the accuracy of contact prediction from multiple sequence alignments.

19.
20.
MOTIVATION: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the distribution of the test statistics, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. RESULTS: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power than the family-wise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistic substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control. AVAILABILITY: An R program that adjusts p-values using FDR controlling procedures is freely available over the Internet at www.math.tau.ac.il/~ybenja.

