Similar articles
 20 similar articles found.
1.
The permutation test is a popular technique for testing a hypothesis of no effect when the distribution of the test statistic is unknown. To test the equality of two means, a permutation test might use the difference of the two sample means as its test statistic in the univariate case, and the maximum of the univariate test statistics in the multivariate case. The test then estimates the null distribution of the statistic by permuting the observations between the two samples. We show that, if the two distributions are not identical (for example, when they have unequal variances, correlations, or skewness), a permutation test for equality of means based on the difference of sample means can have an inflated Type I error rate even when the means are equal. Our results illustrate that permutation testing should be confined to testing for non-identical distributions. CONTACT: calian@raunvis.hi.is.
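The univariate test described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `perm_test_mean_diff`, the number of random permutations, and the seed are assumptions, and the p-value counts the observed labelling as one of the permutations.

```python
import random
from statistics import mean

def perm_test_mean_diff(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in sample means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        # Reassign observations to the two samples at random.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            hits += 1
    # Count the observed labelling as one permutation.
    return (hits + 1) / (n_perm + 1)
```

Because the shuffle treats all observations as exchangeable, the reference distribution corresponds to identical distributions in the two groups, which is the reason the abstract gives for the inflated error rate when only the means are equal.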

2.
Valid inference in random effects meta-analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
The standard approach to inference for random effects meta-analysis relies on approximating the null distribution of a test statistic by a standard normal distribution. This approximation is asymptotic in k, the number of studies, and can be substantially in error in medical meta-analyses, which often have only a few studies. This paper proposes permutation and ad hoc methods for testing with the random effects model. Under the group permutation method, we randomly switch the treatment and control group labels in each trial. This idea is similar to using a permutation distribution for a community intervention trial where communities are randomized in pairs. The permutation method theoretically controls the type I error rate for typical meta-analysis scenarios. We also suggest two ad hoc procedures. Our first suggestion is to use a t-reference distribution with k-1 degrees of freedom rather than a standard normal distribution for the usual random effects test statistic. We also investigate the use of a simple t-statistic on the reported treatment effects.
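Switching the treatment and control labels within a trial flips the sign of that trial's estimated effect, so the group permutation idea can be sketched by enumerating all 2^k sign patterns. The sketch below is an assumption-laden simplification: it uses the plain mean of the reported effects as the statistic rather than the random effects test statistic, and `sign_flip_pvalue` is a hypothetical name.

```python
from itertools import product
from statistics import mean

def sign_flip_pvalue(effects):
    """Exact two-sided p-value over all 2^k treatment/control label switches."""
    observed = abs(mean(effects))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(effects)):
        # Switching the labels in a trial flips the sign of its effect.
        flipped = [s * e for s, e in zip(signs, effects)]
        if abs(mean(flipped)) >= observed:
            count += 1
        total += 1
    return count / total
```

With k trials the smallest attainable two-sided p-value is 2/2^k, so five trials that all favour treatment give 2/32 = 0.0625 at best.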

3.
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM), and the mixture model method (MMM), has been proposed to detect differential gene expression in replicated microarray experiments. They all depend on constructing a test statistic, for example a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, standard permutation scores may not estimate the null distribution of the test statistic well, possibly leading to overly conservative inferences. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.

4.
Lee OE, Braun TM. Biometrics 2012, 68(2): 486-493
Inference regarding the inclusion or exclusion of random effects in linear mixed models is challenging because the variance components are located on the boundary of their parameter space under the usual null hypothesis. As a result, the asymptotic null distribution of the Wald, score, and likelihood ratio tests will not have the typical χ² distribution. Although it has been proved that the correct asymptotic distribution is a mixture of χ² distributions, the appropriate mixture distribution is rather cumbersome and nonintuitive when the null and alternative hypotheses differ by more than one random effect. As alternatives, we present two permutation tests, one that is based on the best linear unbiased predictors and one that is based on the restricted likelihood ratio test statistic. Both methods involve weighted residuals, with the weights determined by the among- and within-subject variance components. The null permutation distributions of our statistics are computed by permuting the residuals both within and among subjects and are valid both asymptotically and in small samples. We examine the size and power of our tests via simulation under a variety of settings and apply our test to a published data set of chronic myelogenous leukemia patients.

5.
In many applications of generalized linear mixed models to multilevel data, it is of interest to test whether a random effects variance component is zero. It is well known that the usual asymptotic chi-square distribution of the likelihood ratio and score statistics under the null does not necessarily hold. In this note we propose a permutation test, based on randomly permuting the indices associated with a given level of the model, that has the correct Type I error rate under the null. Results from a simulation study suggest that it is more powerful than tests based on mixtures of chi-square distributions. The proposed test is illustrated using data on the familial aggregation of sleep disturbance.
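The idea of permuting the indices of one level can be sketched for the simplest two-level case. This is only an illustration under stated assumptions: the statistic here is the variance of the cluster means rather than a likelihood ratio or score statistic from a fitted generalized linear mixed model, and the names `between_cluster_variance` and `cluster_label_perm_test` are hypothetical.

```python
import random
from statistics import mean, pvariance

def between_cluster_variance(values, labels):
    """Variance of the cluster means; large when clusters differ."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    return pvariance([mean(vs) for vs in groups.values()])

def cluster_label_perm_test(values, labels, n_perm=2000, seed=0):
    """Permutation p-value for 'no cluster-level variance component'."""
    rng = random.Random(seed)
    observed = between_cluster_variance(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the cluster indices to the observations.
        rng.shuffle(shuffled)
        if between_cluster_variance(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

If the cluster-level variance is zero, the cluster indices carry no information about the responses, which is what makes shuffling them a plausible reference distribution in this simplified setting.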

6.

Background  

Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness-of-fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complicated ways. Accurate type I error control adjusted for multiple testing requires the joint null distribution of the test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation.

7.
Permutation tests, in which p-values are attached to a test statistic by randomly permuting the sample or gene labels, are among the most commonly used statistical tools in modern genomic research. Yet permutation p-values published in the genomic literature are often computed incorrectly, understated by about 1/m, where m is the number of permutations. The same is often true in the more general situation when Monte Carlo simulation is used to assign p-values. Although the p-value understatement is usually small in absolute terms, the implications can be serious in a multiple testing context. The understatement arises from the intuitive but mistaken idea of using permutation to estimate the tail probability of the test statistic. We argue instead that permutation should be viewed as generating an exact discrete null distribution. The relevant literature, some of which is likely to have been relatively inaccessible to the genomic community, is reviewed and summarized. A computation strategy is developed for exact p-values when permutations are randomly drawn. The strategy is valid for any number of permutations and samples. Some simple recommendations are made for the implementation of permutation tests in practice.
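The size of the understatement can be seen by comparing two estimators. The sketch assumes the commonly used correction (b + 1)/(m + 1), where b is the number of permuted statistics at least as extreme as the observed one; it does not reproduce the exact computation strategy developed in the paper, and both function names are hypothetical.

```python
def naive_pvalue(observed, permuted_stats):
    """Tail-probability estimate b/m; can be exactly zero."""
    b = sum(1 for s in permuted_stats if s >= observed)
    return b / len(permuted_stats)

def corrected_pvalue(observed, permuted_stats):
    """(b + 1)/(m + 1): counts the observed statistic itself."""
    b = sum(1 for s in permuted_stats if s >= observed)
    m = len(permuted_stats)
    return (b + 1) / (m + 1)
```

With four permuted statistics that all fall below the observed value, the naive estimate is 0 while the corrected one is 1/5, and a p-value of exactly zero is what causes trouble once multiple testing adjustments are applied.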

8.
MOTIVATION: The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with few replicates, so permutation-based methods are needed to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution, with one proportion of the statistics generated from the null distribution of no differential gene expression and the other proportion generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. Therefore, the construction of a proper null statistic to better approximate the null distribution of the F-statistic is of great importance to permutation-based multiple testing in microarray data analysis. RESULTS: In this paper, we extend the idea of constructing null statistics based on pairwise differences, which removes the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate unbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. We show that the proposed method has better control of the FDR and higher power than the standard permutation method to detect differentially expressed genes because of the better approximated tail probabilities.

9.
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.

10.
Count data are common endpoints in clinical trials, for example, magnetic resonance imaging lesion counts in multiple sclerosis. They often exhibit high levels of overdispersion, that is, variances are larger than the means. Inference is regularly based on negative binomial regression along with maximum‐likelihood estimators. Although this approach can account for heterogeneity, it postulates a common overdispersion parameter across groups. Such parametric assumptions are usually difficult to verify, especially in small trials. Therefore, novel procedures that are based on asymptotic results for newly developed rate and variance estimators are proposed in a general framework. Moreover, in case of small samples the procedures are carried out using permutation techniques. Here, the usual assumption of exchangeability under the null hypothesis is not met due to varying follow‐up times and unequal overdispersion parameters. This problem is solved by the use of studentized permutations leading to valid inference methods for situations with (i) varying follow‐up times, (ii) different overdispersion parameters, and (iii) small sample sizes.
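The general idea of a studentized permutation can be sketched on two plain samples. This is a stand-in rather than the proposed procedure: it uses a Welch-type statistic instead of the rate and variance estimators for count data with follow-up times, and the names `welch_statistic` and `studentized_perm_test` are hypothetical.

```python
import random
from statistics import mean, variance

def welch_statistic(x, y):
    """Mean difference divided by its unpooled standard error."""
    se2 = variance(x) / len(x) + variance(y) / len(y)
    return (mean(x) - mean(y)) / se2 ** 0.5

def studentized_perm_test(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value based on the studentized statistic.

    Assumes every permuted sample has nonzero variance.
    """
    rng = random.Random(seed)
    observed = abs(welch_statistic(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # The standard error is recomputed for every permuted split.
        if abs(welch_statistic(pooled[:n_x], pooled[n_x:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The difference from an unstudentized test is that each permuted split is rescaled by its own standard error, so the statistic stays comparable across permutations even when the groups differ in spread.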

11.
Estimating p-values in small microarray experiments   Cited by: 5 (self-citations: 0, citations by others: 5)
MOTIVATION: Microarray data typically have small numbers of observations per gene, which can result in low power for statistical tests. Test statistics that borrow information from data across all of the genes can improve power, but these statistics have non-standard distributions, and their significance must be assessed using permutation analysis. When sample sizes are small, the number of distinct permutations can be severely limited, and pooling the permutation-derived test statistics across all genes has been proposed. However, the null distribution of the test statistics under permutation is not the same for equally and differentially expressed genes. This can have a negative impact on both p-value estimation and the power of information borrowing statistics. RESULTS: We investigate permutation-based methods for estimating p-values. One of the methods, which uses pooling from a selected subset of the data, is shown to have the correct type I error rate and to provide accurate estimates of the false discovery rate (FDR). We provide guidelines for selecting an appropriate subset. We also demonstrate that information borrowing statistics have substantially increased power compared to the t-test in small experiments.

12.
Abstract: Commonly used permutation tail probability (PTP) and topology dependent permutation tail probability (T-PTP) tests incorporate an inappropriate treatment of designated outgroup taxa, and for that reason are biased either for (PTP) or for or against (T-PTP) rejection of the null hypothesis. A modified test is proposed, in which this source of bias is eliminated.

13.
The initial presentation of multifactor dimensionality reduction (MDR) featured cross-validation to mitigate over-fitting, computationally efficient searches of the epistatic model space, and variable construction with constructive induction to alleviate the curse of dimensionality. However, the method was unable to differentiate association signals arising from true interactions from those due to independent main effects at individual loci. This issue leads to problems in inference and interpretability for the results from MDR and its family-based complement, the MDR-pedigree disequilibrium test (PDT). A suggestion from previous work was to fit regression models post hoc to specifically evaluate the null hypothesis of no interaction for MDR or MDR-PDT models. We demonstrate with simulation that fitting a regression model on the same data as that analyzed by MDR or MDR-PDT is not a valid test of interaction. This is likely to be true for any other procedure that searches for models and then performs an uncorrected test for interaction. We also show with simulation that, when strong main effects are present and the null hypothesis of no interaction is true, MDR and MDR-PDT reject at far greater than the nominal rate. We also provide a valid regression-based permutation test procedure that specifically tests the null hypothesis of no interaction and does not reject the null when only main effects are present. The regression-based permutation test implemented here conducts a valid test of interaction after a search for multilocus models, and can be applied to any method that conducts a search to find a multilocus model representing an interaction.

14.
Summary. Genomic instability, such as copy‐number losses and gains, occurs in many genetic diseases. Recent technology developments enable researchers to measure copy numbers at tens of thousands of markers simultaneously. In this article, we propose a nonparametric approach for detecting the locations of copy‐number changes and provide a measure of significance for each change point. The proposed test is based on seeking scale‐based changes in the sequence of copy numbers, which is ordered by the marker locations along the chromosome. The method leads to a natural way to estimate the null distribution for the test of a change point and adjusted p‐values for the significance of a change point using a step‐down maxT permutation algorithm to control the family‐wise error rate. A simulation study investigates the finite sample performance of the proposed method and compares it with a more standard sequential testing method. The method is illustrated using two real data sets.
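A step-down maxT adjustment can be sketched once the observed and permuted statistics are available. The sketch assumes the standard Westfall-Young form of the algorithm with larger statistics being more extreme; how the change-point statistics themselves are computed is not shown, and `maxt_stepdown` is a hypothetical name.

```python
def maxt_stepdown(observed, perm_stats):
    """Step-down maxT adjusted p-values.

    observed: list of m statistics (larger = more extreme).
    perm_stats: list of B lists, each holding m permuted statistics.
    """
    m = len(observed)
    n_perm = len(perm_stats)
    # Hypotheses ordered from most to least significant.
    order = sorted(range(m), key=lambda j: observed[j], reverse=True)
    counts = [0] * m
    for row in perm_stats:
        running_max = float("-inf")
        # Successive maxima, built up from the least significant hypothesis.
        for rank in range(m - 1, -1, -1):
            j = order[rank]
            running_max = max(running_max, row[j])
            if running_max >= observed[j]:
                counts[rank] += 1
    adjusted = [c / n_perm for c in counts]
    # Enforce monotonicity along the ordering.
    for rank in range(1, m):
        adjusted[rank] = max(adjusted[rank], adjusted[rank - 1])
    result = [0.0] * m
    for rank, j in enumerate(order):
        result[j] = adjusted[rank]
    return result
```

Comparing each observed statistic with the maximum over the hypotheses not yet rejected is what lets the adjustment reflect the dependence among neighbouring markers while controlling the family-wise error rate.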

15.
Large exploratory studies, including candidate-gene-association testing, genomewide linkage-disequilibrium scans, and array-expression experiments, are becoming increasingly common. A serious problem for such studies is that statistical power is compromised by the need to control the false-positive rate for a large family of tests. Because multiple true associations are anticipated, methods have been proposed that combine evidence from the most significant tests, as a more powerful alternative to individually adjusted tests. The practical application of these methods is currently limited by a reliance on permutation testing to account for the correlated nature of single-nucleotide polymorphism (SNP)-association data. On a genomewide scale, this is both very time-consuming and impractical for repeated explorations with standard marker panels. Here, we alleviate these problems by fitting analytic distributions to the empirical distribution of combined evidence. We fit extreme-value distributions for fixed lengths of combined evidence and a beta distribution for the most significant length. An initial phase of permutation sampling is required to fit these distributions, but it can be completed more quickly than a simple permutation test and need be done only once for each panel of tests, after which the fitted parameters give a reusable calibration of the panel. Our approach is also a more efficient alternative to a standard permutation test. We demonstrate the accuracy of our approach and compare its efficiency with that of permutation tests on genomewide SNP data released by the International HapMap Consortium. The estimation of analytic distributions for combined evidence will allow these powerful methods to be applied more widely in large exploratory studies.

16.
Summary. Many assessment instruments used in the evaluation of toxicity, safety, pain, or disease progression consider multiple ordinal endpoints to fully capture the presence and severity of treatment effects. Contingency tables underlying these correlated responses are often sparse and imbalanced, rendering asymptotic results unreliable or model fitting prohibitively complex without overly simplistic assumptions on the marginal and joint distribution. Instead of a modeling approach, we look at stochastic order and marginal inhomogeneity as an expression or manifestation of a treatment effect under much weaker assumptions. Often, endpoints are grouped together into physiological domains or by the body function they describe. We derive tests based on these subgroups, which might supplement or replace the individual endpoint analysis because they are more powerful. The permutation or bootstrap distribution is used throughout to obtain global, subgroup, and individual significance levels as they naturally incorporate the correlation among endpoints. We provide a theorem that establishes a connection between marginal homogeneity and the stronger exchangeability assumption under the permutation approach. Multiplicity adjustments for the individual endpoints are obtained via stepdown procedures, while subgroup significance levels are adjusted via the full closed testing procedure. The proposed methodology is illustrated using a collection of 25 correlated ordinal endpoints, grouped into six domains, to evaluate toxicity of a chemical compound.

17.
Gilbert PB, Wu C, Jobes DV. Biometrics 2008, 64(1): 198-207
Summary. Consider a placebo-controlled preventive HIV vaccine efficacy trial. An HIV amino acid sequence is measured from each volunteer who acquires HIV, and these sequences are aligned together with the reference HIV sequence represented in the vaccine. We develop genome scanning methods to identify positions at which the amino acids in infected vaccine recipient sequences either (A) are more divergent from the reference amino acid than the amino acids in infected placebo recipient sequences or (B) have a different frequency distribution than the placebo sequences, irrespective of a reference amino acid. We consider t-test-type statistics for problem A and Euclidean, Mahalanobis, and Kullback–Leibler-type statistics for problem B. The test statistics incorporate weights to reflect biological information contained in different amino acid positions and mismatches. Position-specific p-values are obtained by approximating the null distribution of the statistics either by a permutation procedure or by a nonparametric estimation. A permutation method is used to estimate a cut-off p-value to control the per comparison error rate at a prespecified level. The methods are examined in simulations and are applied to two HIV examples. The methods for problem B address the general problem of comparing discrete frequency distributions between groups in a high-dimensional data setting.

18.
Gottschau A. Biometrics 1992, 48(3): 751-763
Time-homogeneous Markov chain models with state space {0, 1}^k are useful in the analysis of binary follow-up data on k individuals that interact. The number of parameters increases exponentially with k, so more restrictive models are imperative for statistical inference. The hypothesis that the matrix of transition probabilities is invariant under permutation of individuals is discussed. It is shown that if individuals are exchangeable, then the process counting the number of individuals occupying a given state is a Markov chain. This reduction of data is sufficient if either at most a single individual may change state between two consecutive time points or if a state is absorbing. Similar results are obtained for exchangeability within two subgroups. Inference in the multivariate process reduces to a univariate problem if individuals are independent given the group's previous response. It is shown how conditional independence could be tested assuming exchangeability. The different hypotheses are examined in an analysis of the occurrence of bacteria in milk samples of Danish dairy cattle.

19.
We develop a permutation test for assessing a difference in the areas under the curve (AUCs) in a paired setting where both modalities are given to each diseased and nondiseased subject. We propose that permutations be made between subjects specifically by shuffling the diseased/nondiseased labels of the subjects within each modality. As these permutations are made within modality, the permutation test is valid even if both modalities are measured on different scales. We show that our permutation test is a sign test for the symmetry of an underlying discrete distribution whose size remains valid under the assumption of equal AUCs. We demonstrate the operating characteristics of our test via simulation and show that our test is equal in power to a permutation test recently proposed by Bandos and others (2005).

20.
The etiology of chronic Inflammatory Bowel Diseases (IBD) remains unknown, with both genetic and environmental risk factors having been implicated. A recent collaborative study of IBD provides clinical data from families with three or more affected first-degree relatives. The scientific question is whether specific clinical characteristics aggregate among affected individuals within families. Gastroenterological researchers have examined the number of concordant familial pairs in familial aggregation studies, but methods and results have been discrepant. This article investigates concepts of concordance and gives a comprehensive statistical treatment for testing concordance of various clinical traits in familial studies. For dichotomous traits, the distribution of this statistic under the null hypothesis of no familial aggregation is obtained by three methods: asymptotic, probability generating function, and permutation. The permutation method is extended to analyze aggregation for non-dichotomous traits and co-aggregations between two traits. We apply the permutation method to analyze the aforementioned multiply-affected IBD family data. Evidence is found for familial clustering of various traits, some of which are not revealed in existing studies. Such analyses provide a basis for investigating the dependence of trait aggregation upon genetic or environmental risk factors.
