Similar Articles
20 similar articles found (search time: 281 ms)
1.
The confirmatory analysis of pre-specified multiple hypotheses has become common in pivotal clinical trials. In the recent past multiple test procedures have been developed that reflect the relative importance of different study objectives, such as fixed sequence, fallback, and gatekeeping procedures. In addition, graphical approaches have been proposed that facilitate the visualization and communication of Bonferroni-based closed test procedures for common multiple test problems, such as comparing several treatments with a control, assessing the benefit of a new drug for more than one endpoint, combined non-inferiority and superiority testing, or testing a treatment at different dose levels in an overall and a subpopulation. In this paper, we focus on extended graphical approaches by dissociating the underlying weighting strategy from the employed test procedure. This allows one to first derive suitable weighting strategies that reflect the given study objectives and subsequently apply appropriate test procedures, such as weighted Bonferroni tests, weighted parametric tests accounting for the correlation between the test statistics, or weighted Simes tests. We illustrate the extended graphical approaches with several examples. In addition, we describe briefly the gMCP package in R, which implements some of the methods described in this paper.
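The graphical approach described above can be made concrete with a small sketch of a Bonferroni-based graphical procedure: each hypothesis holds a local weight, and when one is rejected its weight is propagated along the graph's transition matrix. (This is an illustrative standalone Python version in the spirit of the gMCP package, not the package's API; function and parameter names are mine.)

```python
def graphical_bonferroni(p, w, G, alpha=0.025):
    """Bonferroni-based graphical test procedure (sketch).

    p: list of p-values; w: initial weights summing to at most 1;
    G: transition matrix (G[i][i] == 0, rows sum to at most 1).
    Returns the set of rejected hypothesis indices.
    """
    m = len(p)
    w = list(w)
    G = [row[:] for row in G]
    active = set(range(m))
    rejected = set()
    while True:
        # find a rejectable hypothesis among the active ones
        cand = [i for i in active if p[i] <= w[i] * alpha]
        if not cand:
            return rejected
        i = cand[0]
        rejected.add(i)
        active.remove(i)
        # propagate i's weight along the graph, then rewire its edges
        new_w = w[:]
        new_G = [row[:] for row in G]
        for j in active:
            new_w[j] = w[j] + w[i] * G[i][j]
            for k in active:
                if j == k:
                    continue
                denom = 1.0 - G[j][i] * G[i][j]
                new_G[j][k] = ((G[j][k] + G[j][i] * G[i][k]) / denom
                               if denom > 0 else 0.0)
        w, G = new_w, new_G
```

For a two-hypothesis fixed-sequence graph (all of H1's weight passes to H2 and back), p-values (0.01, 0.02) at one-sided alpha = 0.025 lead to both rejections, because H1's weight is inherited by H2 after the first rejection.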

2.
Kwong KS, Cheung SH, Chan WS. Biometrics 2004, 60(2):491-498
In clinical studies, multiple superiority/equivalence testing procedures can be applied to classify a new treatment as superior, equivalent (same therapeutic effect), or inferior to each set of standard treatments. Previous stepwise approaches (Dunnett and Tamhane, 1997, Statistics in Medicine 16, 2489-2506; Kwong, 2001, Journal of Statistical Planning and Inference 97, 359-366) are only appropriate for balanced designs. Unfortunately, the construction of similar tests for unbalanced designs is far more complex, with two major difficulties: (i) the ordering of test statistics for superiority may not be the same as the ordering of test statistics for equivalence; and (ii) the correlation structure of the test statistics is not equi-correlated but product-correlated. In this article, we seek to develop a two-stage testing procedure for unbalanced designs, which are very popular in clinical experiments. This procedure is a combination of step-up and single-step testing procedures, while the familywise error rate is proved to be controlled at a designated level. Furthermore, a simulation study is conducted to compare the average powers of the proposed procedure to those of the single-step procedure. In addition, a clinical example is provided to illustrate the application of the new procedure.

3.
A major obstacle in applying various hypothesis testing procedures to datasets in bioinformatics is the computation of ensuing p-values. In this paper, we define a generic branch-and-bound approach to efficient exact p-value computation and enumerate the required conditions for successful application. Explicit procedures are developed for the entire Cressie-Read family of statistics, which includes the widely used Pearson and likelihood ratio statistics in a one-way frequency table goodness-of-fit test. This new formulation constitutes a first practical exact improvement over the exhaustive enumeration performed by existing statistical software. The general techniques we develop to exploit the convexity of many statistics are also shown to carry over to contingency table tests, suggesting that they are readily extendible to other tests and test statistics of interest. Our empirical results demonstrate a speed-up of orders of magnitude over the exhaustive computation, significantly extending the practical range for performing exact tests. We also show that the relative speed-up gain increases as the null hypothesis becomes sparser, that computation precision increases with increase in speed-up, and that computation time is very moderately affected by the magnitude of the computed p-value. These qualities make our algorithm especially appealing in the regimes of small samples, sparse null distributions, and rare events, compared to the alternative asymptotic approximations and Monte Carlo samplers. We discuss several established bioinformatics applications, where small sample size, small expected counts in one or more categories (sparseness), and very small p-values do occur. Our computational framework could be applied in these, and similar cases, to improve performance.
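The exhaustive-enumeration baseline that the branch-and-bound method improves upon is easy to state for the Pearson statistic in a one-way table: enumerate every table with the same total count and sum the multinomial null probabilities of tables whose statistic is at least as extreme. (A rough stdlib-only sketch with hypothetical function names; it is only feasible for small n and k, which is exactly the regime the paper's pruning extends.)

```python
from math import factorial

def compositions(n, k):
    """Yield all k-tuples of non-negative ints summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Multinomial probability of a vector of counts under cell probs."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for c, pi in zip(counts, probs):
        prob *= pi ** c
    return prob

def pearson_stat(counts, probs):
    """Pearson X^2 = sum (O - E)^2 / E with E = n * p_i."""
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, probs))

def exact_pvalue(observed, probs):
    """P(X^2 >= observed X^2) under H0, by exhaustive enumeration."""
    t_obs = pearson_stat(observed, probs)
    n, k = sum(observed), len(observed)
    eps = 1e-12  # tolerate floating-point ties
    return sum(multinomial_prob(c, probs)
               for c in compositions(n, k)
               if pearson_stat(c, probs) >= t_obs - eps)
```

For observed counts (3, 1) under a fair two-cell null, the exact p-value is 10/16 = 0.625, which the asymptotic chi-square approximation with such a tiny n would miss badly.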

4.
Ryman N, Jorde PE. Molecular Ecology 2001, 10(10):2361-2373
A variety of statistical procedures are commonly employed when testing for genetic differentiation. In a typical situation two or more samples of individuals have been genotyped at several gene loci by molecular or biochemical means, and in a first step a statistical test for allele frequency homogeneity is performed at each locus separately, using, e.g. the contingency chi-square test, Fisher's exact test, or some modification thereof. In a second step the results from the separate tests are combined for evaluation of the joint null hypothesis that there is no allele frequency difference at any locus, corresponding to the important case where the samples would be regarded as drawn from the same statistical and, hence, biological population. Presently, there are two conceptually different strategies in use for testing the joint null hypothesis of no difference at any locus. One approach is based on the summation of chi-square statistics over loci. Another method is employed by investigators applying the Bonferroni technique (adjusting the P-value required for rejection to account for the elevated alpha errors when performing multiple tests simultaneously) to test if the heterogeneity observed at any particular locus can be regarded significant when considered separately. Under this approach the joint null hypothesis is rejected if one or more of the component single locus tests is considered significant under the Bonferroni criterion. We used computer simulations to evaluate the statistical power and realized alpha errors of these strategies when evaluating the joint hypothesis after scoring multiple loci. We find that the 'extended' Bonferroni approach generally is associated with low statistical power and should not be applied in the current setting. Further, and contrary to what might be expected, we find that 'exact' tests typically behave poorly when combined in existing procedures for joint hypothesis testing. 
Thus, while exact tests are generally to be preferred over approximate ones when testing each particular locus, approximate tests such as the traditional chi-square seem preferable when addressing the joint hypothesis.
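The two competing strategies can be illustrated with a close relative of the summation approach, Fisher's combination of the per-locus p-values (the study above sums the per-locus chi-square statistics directly; Fisher's variant sums -2 log p, which is chi-square with 2L degrees of freedom and has a closed-form tail for even df, so plain-stdlib Python suffices). The "extended Bonferroni" competitor is the one-line rule below. Function names are mine, not the paper's.

```python
from math import exp, log

def chi2_sf_even_df(x, df):
    """Survival function of chi-square with even df (closed form):
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    assert df % 2 == 0 and df > 0
    m = df // 2
    term, s = 1.0, 1.0
    for i in range(1, m):
        term *= (x / 2.0) / i
        s += term
    return exp(-x / 2.0) * s

def fisher_combined_p(pvals):
    """Fisher's combination: -2 * sum(log p) ~ chi2(2L) under the joint null."""
    stat = -2.0 * sum(log(p) for p in pvals)
    return chi2_sf_even_df(stat, 2 * len(pvals))

def bonferroni_joint(pvals, alpha=0.05):
    """'Extended Bonferroni': reject the joint null iff any p <= alpha / L."""
    return min(pvals) <= alpha / len(pvals)
```

The Bonferroni rule reacts only to the single smallest p-value, which is why it loses power when heterogeneity is spread moderately over many loci; the combination statistic accumulates it.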

5.
It is natural to want to relax the assumption of homoscedasticity and Gaussian error in ANOVA models. For a two-way ANOVA model with 2 x k cells, one can derive tests of main effect for the factor with two levels (referred to as group) without assuming homoscedasticity or Gaussian error. Empirical likelihood can be used to derive testing procedures. An approximate empirical likelihood ratio test (AELRT) is derived for the test of group main effect. To approximate the distributions of the test statistics under the null hypothesis, simulation from the approximate empirical maximum likelihood estimate (AEMLE) restricted by the null hypothesis is used. The homoscedastic ANOVA F-test and a Box-type approximation to the distribution of the heteroscedastic ANOVA F-test are compared to the AELRT in level and power. The AELRT procedure is shown by simulation to have appropriate type I error control (although possibly conservative) when the distributions of the test statistics are approximated by simulation from the constrained AEMLE. The methodology is motivated and illustrated by an analysis of folate levels in the blood in two alcohol intake groups while accounting for gender.

6.
Widely used in testing statistical hypotheses, the Bonferroni multiple test has rather low power, which entails a high risk of falsely accepting the overall null hypothesis and therefore failing to detect effects that really exist. We suggest that when the partial test statistics are statistically independent, this risk can be reduced by using binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of n partial null hypotheses is rejected at a very stringent significance level (say, 0.005 in the case of n = 10), as prescribed by the Bonferroni test, the binomial tests reject the null hypothesis when at least k partial null hypotheses (say, k = [n/2]) are rejected at a much more liberal level (up to 30-50%). We show that the power of such binomial tests is substantially higher than that of the original Bonferroni and some modified Bonferroni tests. In addition, this approach allows us to combine tests for which the results are known only for a fixed significance level. The paper contains tables and a computer program which allow one to determine (retrieve from a table or compute) the necessary binomial test parameters, i.e. either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).
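Under independence, the calibration described above reduces to a binomial tail computation: if each of n partial tests is run at level p0 and the overall null is rejected when at least k of them reject, the overall size is P(Bin(n, p0) >= k). A stdlib sketch (hypothetical function names) that inverts this relation by bisection:

```python
from math import comb

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def partial_level(n, k, alpha=0.05, tol=1e-10):
    """Per-test level p0 so that the rule 'reject overall if at least k of
    n independent partial tests reject at level p0' has overall size alpha.
    binom_sf is increasing in p, so simple bisection suffices."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_sf(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With n = 10 and k = 5 the per-test level comes out near 0.22, illustrating the abstract's point that the partial levels can be far more liberal than the Bonferroni 0.005; k = 1 recovers the usual single-rejection rule with p0 = 1 - (1 - alpha)^(1/n).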

7.
The use of score tests for inference on variance components
Whenever inference for variance components is required, the choice between one-sided and two-sided tests is crucial. This choice is usually driven by whether or not negative variance components are permitted. For two-sided tests, classical inferential procedures can be followed, based on likelihood ratios, score statistics, or Wald statistics. For one-sided tests, however, one-sided test statistics need to be developed, and their null distribution derived. While this has received considerable attention in the context of the likelihood ratio test, there appears to be much confusion about the related problem for the score test. The aim of this paper is to illustrate that classical (two-sided) score test statistics, frequently advocated in practice, cannot be used in this context, but that well-chosen one-sided counterparts could be used instead. The relation with likelihood ratio tests will be established, and all results are illustrated in an analysis of continuous longitudinal data using linear mixed models.

8.
Clinical trials are often concerned with the comparison of two treatment groups with multiple endpoints. As alternatives to the commonly used methods, the T2 test and the Bonferroni method, O'Brien (1984, Biometrics 40, 1079-1087) proposes tests based on statistics that are simple or weighted sums of the single endpoints. This approach turns out to be powerful if all treatment differences are in the same direction [compare Pocock, Geller, and Tsiatis (1987, Biometrics 43, 487-498)]. The disadvantage of these multivariate methods is that they are suitable only for demonstrating a global difference, whereas the clinician is further interested in which specific endpoints or sets of endpoints actually caused this difference. It is shown here that all tests are suitable for the construction of a closed multiple test procedure where, after the rejection of the global hypothesis, all lower-dimensional marginal hypotheses and finally the single hypotheses are tested step by step. This procedure controls the experimentwise error rate. It is just as powerful as the multivariate test and, in addition, it is possible to detect significant differences between the endpoints or sets of endpoints.

9.
Tests for linkage and association in nuclear families.
The transmission/disequilibrium test (TDT) originally was introduced to test for linkage between a genetic marker and a disease-susceptibility locus, in the presence of association. Recently, the TDT has been used to test for association in the presence of linkage. The motivation for this is that linkage analysis typically identifies large candidate regions, and further refinement is necessary before a search for the disease gene is begun, on the molecular level. Evidence of association and linkage may indicate which markers in the region are closest to a disease locus. As a test of linkage, transmissions from heterozygous parents to all of their affected children can be included in the TDT; however, the TDT is a valid chi2 test of association only if transmissions to unrelated affected children are used in the analysis. If the sample contains independent nuclear families with multiple affected children, then one procedure that has been used to test for association is to select randomly a single affected child from each sibship and to apply the TDT to those data. As an alternative, we propose two statistics that use data from all of the affected children. The statistics give valid chi2 tests of the null hypothesis of no association or no linkage and generally are more powerful than the TDT with a single, randomly chosen, affected child from each family.
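The basic TDT statistic referred to above is a one-line McNemar-type computation: with b transmissions of the candidate allele and c non-transmissions from heterozygous parents, chi2 = (b - c)^2 / (b + c) on 1 df. A stdlib sketch (the chi-square(1) tail follows from the normal tail via erfc; function names are mine):

```python
from math import erfc, sqrt

def chi2_sf_1df(x):
    """P(chi-square with 1 df >= x), via the normal tail: erfc(sqrt(x/2))."""
    return erfc(sqrt(x / 2.0))

def tdt(b, c):
    """McNemar-type TDT for b transmissions vs c non-transmissions of an
    allele from heterozygous parents; returns (statistic, p-value)."""
    stat = (b - c) ** 2 / (b + c)
    return stat, chi2_sf_1df(stat)
```

For example, 62 transmissions against 38 non-transmissions gives a statistic of 5.76 and a p-value of about 0.016; the article's point is that this chi-square reference distribution is only valid for association when the transmissions counted are independent.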

10.
A modified Bonferroni method for discrete data
R E Tarone. Biometrics 1990, 46(2):515-522
The Bonferroni adjustment for multiple comparisons is a simple and useful method of controlling the overall false positive error rate when several significance tests are performed in the evaluation of an experiment. In situations with categorical data, the test statistics have discrete distributions. The discreteness of the null distributions can be exploited to reduce the number of significance tests taken into account in the Bonferroni procedure. This reduction is accomplished by using only the information contained in the marginal totals.
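The reduction Tarone describes can be sketched in a few lines: each discrete test has a minimum attainable p-value, and tests that cannot possibly reach the adjusted threshold are dropped from the Bonferroni count. (An illustrative reconstruction in my own notation, not the paper's code: the smallest K such that at most K hypotheses are "testable" at level alpha/K is used.)

```python
def tarone_testable(min_attainable, alpha=0.05):
    """Tarone-style reduction sketch for discrete tests.

    min_attainable[j] is the smallest p-value test j can possibly attain
    given its discrete null distribution. Returns (K, testable indices);
    each testable hypothesis is then compared against alpha / K instead
    of alpha / m.
    """
    m = len(min_attainable)
    for K in range(1, m + 1):
        idx = [j for j, a in enumerate(min_attainable) if a <= alpha / K]
        if len(idx) <= K:  # guaranteed to hold by K = m
            return K, idx
```

With four tests whose minimum attainable p-values are [0.001, 0.2, 0.3, 0.004], only two are testable, so each is compared against 0.05/2 = 0.025 rather than the plain Bonferroni 0.05/4 = 0.0125, a strict gain in power with no loss of error control.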

11.
Asymptotically correct 90 and 95 percentage points are given for multiple comparisons with a control and for all pairwise comparisons of several independent samples of equal size from multinomial distributions. The test statistics are the maxima of the χ2-statistics for the single comparisons. For only two categories the asymptotic distributions of these test statistics follow from Dunnett's many-one tests and Tukey's range test (cf. Miller, 1981). The percentage points for comparisons with a control are computed from the limit distribution of the test statistic under the overall hypothesis H0. To some extent the applicability of these bounds is investigated by simulation. The bounds can also be used to improve Holm's sequentially rejective Bonferroni test procedure (cf. Holm, 1979). The percentage points for all pairwise comparisons are obtained by large simulations. Especially for 3×3 tables the limit distribution of the test statistic under H0 is derived also for samples of unequal size. These bounds too can improve the corresponding Bonferroni-Holm procedure. Finally, from Šidák's probability inequality for normal random vectors (cf. Šidák, 1967) a similar inequality is derived for dependent χ2-variables, applicable to simultaneous χ2-tests.

12.
Statistical procedures underpin the process of scientific discovery. As researchers, one way we use these procedures is to test the validity of a null hypothesis. Often, we test the validity of more than one null hypothesis. If we fail to use an appropriate procedure to account for this multiplicity, then we are more likely to reach a wrong scientific conclusion: we are more likely to make a mistake. In physiology, experiments that involve multiple comparisons are common: of the original articles published in 1997 by the American Physiological Society, approximately 40% cite a multiple comparison procedure. In this review, I demonstrate the statistical issue embedded in multiple comparisons, and I summarize the philosophies of handling this issue. I also illustrate the three procedures cited most often in my literature review (Newman-Keuls, Bonferroni, and least significant difference); each of these procedures is of limited practical value. Last, I demonstrate the false discovery rate procedure, a promising development in multiple comparisons. The false discovery rate procedure may be the best practical solution to the problems of multiple comparisons that exist within physiology and other scientific disciplines.
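The false discovery rate procedure recommended above is the Benjamini-Hochberg step-up rule, which is short enough to state directly (a minimal sketch; it assumes the p-values come from independent or positively dependent tests, and the function name is mine):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure.

    Sort the m p-values; find the largest rank k with p_(k) <= k*q/m;
    reject every hypothesis whose p-value is among the k smallest.
    Returns the sorted indices of rejected hypotheses.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])
```

Note the step-up character: a p-value that fails its own threshold can still be rejected if a larger p-value further down the sorted list passes its threshold, which is one reason the procedure is more powerful than Bonferroni.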

13.
Problems of establishing equivalence or noninferiority between two medical diagnostic procedures involve comparisons of the response rates between correlated proportions. When the sample size is small, the asymptotic tests may not be reliable. This article proposes an unconditional exact test procedure to assess equivalence or noninferiority. Two statistics, a sample-based test statistic and a restricted maximum likelihood estimation (RMLE)-based test statistic, to define the rejection region of the exact test are considered. We show that the p-value of the proposed unconditional exact tests can be attained at the boundary point of the null hypothesis. Assessment of equivalence is often based on a comparison of the confidence limits with the equivalence limits. We also derive the unconditional exact confidence intervals on the difference of the two proportion means for the two test statistics. A typical data set of comparing two diagnostic procedures is analyzed using the proposed unconditional exact and asymptotic methods. The p-value from the unconditional exact tests is generally larger than the p-value from the asymptotic tests. Correspondingly, an exact confidence interval is generally wider than the confidence interval obtained from an asymptotic test.

14.
The transmission/disequilibrium test (TDT) developed by Spielman et al. can be a powerful family-based test of linkage and, in some cases, a test of association as well as linkage. It has recently been extended in several ways; these include allowance for implementation with quantitative traits, allowance for multiple alleles, and, in the case of dichotomous traits, allowance for testing in the absence of parental data. In this article, these three extensions are combined, and two procedures are developed that offer valid joint tests of linkage and (in the case of certain sibling configurations) association with quantitative traits, with use of data from siblings only, and that can accommodate biallelic or multiallelic loci. The first procedure uses a mixed-effects (i.e., random and fixed effects) analysis of variance in which sibship is the random factor, marker genotype is the fixed factor, and the continuous phenotype is the dependent variable. Covariates can easily be accommodated, and the procedure can be implemented in commonly available statistical software. The second procedure is a permutation-based procedure. Selected power studies are conducted to illustrate the relative power of each test under a variety of circumstances.

15.
Testing hypotheses about interclass correlations from familial data
S Konishi. Biometrics 1985, 41(1):167-176
Testing problems concerning interclass correlations from familial data are considered in the case where the number of siblings varies among families. Under the assumption of multivariate normality, two test procedures are proposed for testing the hypothesis that an interclass correlation is equal to a specified value. To compare the properties of the tests, including a likelihood ratio test, Monte Carlo experiments are performed. Several test statistics are derived for testing whether two variables about a parent and child are uncorrelated. The proposed tests are compared with previous test procedures, using Monte Carlo simulation. A general procedure for finding confidence intervals for interclass correlations is also derived.

16.
MOTIVATION: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. RESULTS: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated.
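The benefit of working with the joint distribution of the statistics can be illustrated with the single-step maxT adjustment: given Monte Carlo draws of the whole statistic vector under the null, the FWER-adjusted p-value of each observed statistic is the fraction of draws whose maximum exceeds it. (A bare-bones sketch with a hypothetical function name; the paper's contribution is an efficient way to generate such joint draws, which is not reproduced here.)

```python
def maxT_adjusted_p(observed, null_draws):
    """Single-step maxT adjustment.

    observed: list of m observed test statistics (larger = more extreme).
    null_draws: list of Monte Carlo draws, each a vector of m statistics
    simulated under the joint null. adj p_j = fraction of draws whose
    maximum statistic is >= observed[j].
    """
    maxima = [max(draw) for draw in null_draws]
    n = len(null_draws)
    return [sum(mx >= t for mx in maxima) / n for t in observed]
```

Because the maximum is taken over the joint draws, strong correlation between statistics automatically tightens the adjustment, which is exactly where Bonferroni and Holm are most conservative.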

17.
We propose a joint hypothesis test for simultaneous confirmatory inference in the overall population and a pre-defined marker-positive subgroup under the assumption that the treatment effect in the marker-positive subgroup is larger than that in the overall population. The proposed confirmatory overall-subgroup simultaneous test (COSST) is based on partitioning the sample space of the test statistics in the marker-positive and marker-negative subgroups. We define two rejection regions in the joint sample space of the two test statistics: (1) efficacy in the marker-positive subgroup only; (2) efficacy in the overall population. COSST achieves higher statistical power to detect the overall and subgroup efficacy than most sequential procedures while controlling the family-wise type I error rate. COSST also takes into account the potentially harmful effect in the subgroups in the decision. The optimal rejection regions depend on the specific alternative hypothesis and the sample size. COSST can be useful for Phase III clinical trials with tailoring objectives.

18.
Meta-analysis of genetic data must account for differences among studies, including study designs, markers genotyped, and covariates. The effects of genetic variants may also differ from population to population (heterogeneity). Meta-analysis combining data from multiple studies is therefore difficult, and novel statistical methods for it are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT, and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies.

19.
Detection of positive Darwinian selection has become ever more important with the rapid growth of genomic data sets. Recent branch-site models of codon substitution account for variation of selective pressure over branches on the tree and across sites in the sequence and provide a means to detect short episodes of molecular adaptation affecting just a few sites. In likelihood ratio tests based on such models, the branches to be tested for positive selection have to be specified a priori. In the absence of a biological hypothesis to designate so-called foreground branches, one may test many branches, but a correction for multiple testing becomes necessary. In this paper, we employ computer simulation to evaluate the performance of 6 multiple test correction procedures when the branch-site models are used to test every branch on the phylogeny for positive selection. Four of the methods control the familywise error rates (FWERs), whereas the other 2 control the false discovery rate (FDR). We found that all correction procedures achieved acceptable FWER except for extremely divergent sequences and serious model violations, when the test may become unreliable. The power of the test to detect positive selection is influenced by the strength of selection and the sequence divergence, with the highest power observed at intermediate divergences. The 4 correction procedures that control the FWER had similar power. We recommend Rom's procedure for its slightly higher power, but the simple Bonferroni correction is useable as well. The 2 correction procedures that control the FDR had slightly more power and also higher FWER. We demonstrate the multiple test procedures by analyzing gene sequences from the extracellular domain of the cluster of differentiation 2 (CD2) gene from 10 mammalian species. Both our simulation and real data analysis suggest that the multiple test procedures are useful when multiple branches have to be tested on the same data set.

20.
A central goal in designing clinical trials is to find the test that maximizes power (or equivalently minimizes required sample size) for finding a false null hypothesis subject to the constraint of type I error. When there is more than one test, such as in clinical trials with multiple endpoints, the issues of optimal design and optimal procedures become more complex. In this paper, we address the question of how such optimal tests should be defined and how they can be found. We review different notions of power and how they relate to study goals, and also consider the requirements of type I error control and the nature of the procedures. This leads us to an explicit optimization problem with objective and constraints that describe its specific desiderata. We present a complete solution for deriving optimal procedures for two hypotheses, which have desired monotonicity properties, and are computationally simple. For some of the optimization formulations this yields optimal procedures that are identical to existing procedures, such as Hommel's procedure or the procedure of Bittman et al. (2009), while for other cases it yields completely novel and more powerful procedures than existing ones. We demonstrate the nature of our novel procedures and their improved power extensively in a simulation and on the APEX study (Cohen et al., 2016).


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号