Similar literature
 20 similar articles found.
1.
MOTIVATION: Recently a class of nonparametric statistical methods, including the empirical Bayes (EB) method, the significance analysis of microarray (SAM) method and the mixture model method (MMM), have been proposed to detect differential gene expression for replicated microarray experiments conducted under two conditions. All the methods depend on constructing a test statistic Z and a so-called null statistic z. The null statistic z is used to provide a reference distribution for Z so that statistical inference can be carried out. A common way of constructing z is to apply Z to randomly permuted data. Here we point out that the distribution of z may not approximate the null distribution of Z well, leading to inference that may be too conservative. This observation may apply to other permutation-based nonparametric methods. We propose a new method of constructing a null statistic that aims to estimate the null distribution of a test statistic directly. RESULTS: Using simulated data and real data, we assess and compare the performance of the existing method and our new method when applied in EB, SAM and MMM. Some interesting findings on operating characteristics of EB, SAM and MMM are also reported. Finally, by combining the idea of SAM and MMM, we outline a simple nonparametric method based on the direct use of a test statistic and a null statistic.
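The "common way" of constructing z mentioned in this abstract, applying Z to randomly permuted data, can be sketched as follows. This is a minimal illustration of the standard permutation construction only, not the new null statistic the authors propose; the Welch-type form of Z, the function names, and the `n_perm` and `seed` parameters are all assumptions made for the example.

```python
import numpy as np


def t_like_statistic(x, y):
    """Welch-type two-sample statistic, one value per gene (rows are genes)."""
    mean_diff = x.mean(axis=1) - y.mean(axis=1)
    std_err = np.sqrt(
        x.var(axis=1, ddof=1) / x.shape[1] + y.var(axis=1, ddof=1) / y.shape[1]
    )
    return mean_diff / std_err


def permutation_null(x, y, n_perm=200, seed=0):
    """Null statistics z: the same statistic applied to column-permuted data.

    Returns an array of shape (n_perm, n_genes). Sample labels are shuffled
    jointly for all genes, so between-gene correlation is preserved.
    """
    rng = np.random.default_rng(seed)
    data = np.hstack([x, y])
    n_x = x.shape[1]
    null = np.empty((n_perm, data.shape[0]))
    for b in range(n_perm):
        cols = rng.permutation(data.shape[1])
        null[b] = t_like_statistic(data[:, cols[:n_x]], data[:, cols[n_x:]])
    return null
```

When some genes are truly differentially expressed, the permuted samples mix the two conditions, which inflates the within-group variance and is one plausible reason the distribution of z can differ from the null distribution of Z.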

2.
MOTIVATION: An important goal in analyzing microarray data is to determine which genes are differentially expressed across two kinds of tissue samples or samples obtained under two experimental conditions. Various parametric tests, such as the two-sample t-test, have been used, but their parametric assumptions may be too strong, and their large-sample justifications may not hold in practice. As alternatives, a class of three nonparametric statistical methods, including the empirical Bayes method of Efron et al. (2001), the significance analysis of microarray (SAM) method of Tusher et al. (2001) and the mixture model method (MMM) of Pan et al. (2001), have been proposed. All three methods depend on constructing a test statistic and a so-called null statistic such that the null statistic's distribution can be used to approximate the null distribution of the test statistic. However, relatively little effort has been directed toward assessing the performance or the underlying assumptions of the methods in constructing such test and null statistics. RESULTS: We point out a problem with a current method of constructing the test and null statistics, which may lead to largely inflated Type I errors (i.e. false positives). We also propose two modifications that overcome the problem. In the context of MMM, the improved performance of the modified methods is demonstrated using simulated data. In addition, our numerical results provide evidence to support the utility and effectiveness of MMM.

3.
A Nonparametric Approach for Mapping Quantitative Trait Loci (cited 23 times: 3 self-citations, 20 citations by others)
L. Kruglyak  E. S. Lander 《Genetics》1995,139(3):1421-1428
Genetic mapping of quantitative trait loci (QTLs) is typically performed by using a parametric approach, based on the assumption that the phenotype follows a normal distribution. Many traits of interest, however, are not normally distributed. In this paper, we present a nonparametric approach to QTL mapping applicable to any phenotypic distribution. The method is based on a statistic Z(w), which generalizes the nonparametric Wilcoxon rank-sum test to the situation of whole-genome search by interval mapping. We determine the appropriate significance level for the statistic Z(w) by showing that its asymptotic null distribution follows an Ornstein-Uhlenbeck process. These results provide a robust, distribution-free method for mapping QTLs.

4.
In this paper we find a class of umbrella alternatives for which the Mack-Wolfe (1981) peak-known test is optimal in the sense of Pitman efficiency. The asymptotic null distribution of the Chen-Wolfe (1989) statistic for the peak-unknown umbrella alternatives problem is obtained. Some percentiles of the asymptotic distribution, computed by simulation, are presented.

5.
Tamhane AC  Logan BR 《Biometrics》2002,58(3):650-656
Tang, Gnecco, and Geller (1989, Biometrika 76, 577-583) proposed an approximate likelihood ratio (ALR) test of the null hypothesis that a normal mean vector equals a null vector against the alternative that all of its components are nonnegative with at least one strictly positive. This test is useful for comparing a treatment group with a control group on multiple endpoints, and the data from the two groups are assumed to follow multivariate normal distributions with different mean vectors and a common covariance matrix (the homoscedastic case). Tang et al. derived the test statistic and its null distribution assuming a known covariance matrix. In practice, when the covariance matrix is estimated, the critical constants tabulated by Tang et al. result in a highly liberal test. To deal with this problem, we derive an accurate small-sample approximation to the null distribution of the ALR test statistic by using the moment matching method. The proposed approximation is then extended to the heteroscedastic case. The accuracy of both approximations is verified by simulations. A real data example is given to illustrate the use of the approximations.

6.
7.
J J Chen  R L Kodell 《Biometrics》1987,43(3):499-509
This paper proposes a method for analyzing tumor data from chronic studies when the experimental design includes combinations of two factors, for example, sex and dose. Both main effects and combined-effect (interaction) hypotheses are considered. A stratified log-rank statistic is presented for tests of no column or row (main) effects. The paper shows that when the numbers of animals in the cells are unequal and disproportional, the null distribution of the unstratified log-rank statistic does not have a chi-square distribution. Two simple models, additive and multiplicative, for representing the combined effect of row and column are considered under the proportional hazards model. A simple conservative statistic is proposed for testing the additivity of the row and column effects. A simulation experiment to examine the behavior of the null distribution of the combined-effect test statistic under the additive model and the power of the test against the multiplicative model is reported. The procedure is illustrated by analyzing mammary tumors induced by 7,12-dimethylbenz[a]anthracene (DMBA) in yellow and agouti F1 female mice from a laboratory experiment.

8.
The finite mixture model approach has attracted much attention in analyzing microarray data due to its robustness to the excessive variability that is common in microarray data. Pan (2003) proposed using the normal mixture model method (MMM) to estimate the distribution of a test statistic and its null distribution. However, because the test statistic is often of t-type, our studies find that the rejection region from MMM is often significantly larger than the correct rejection region, resulting in an inflated type I error. This motivates us to propose the t-mixture model (TMM) approach. In this paper, we demonstrate that TMM provides significantly more accurate control of the probability of making type I errors (hence of the familywise error rate) than MMM. Finally, TMM is applied to the well-known leukemia data of Golub et al. (1999). The results are compared with those obtained from MMM.

9.
OBJECTIVES: The association of a candidate gene with disease can be evaluated by a case-control study in which the genotype distribution is compared for diseased cases and unaffected controls. Usually, the data are analyzed with Armitage's test using the asymptotic null distribution of the test statistic. Since this test does not generally guarantee a type I error rate less than or equal to the significance level alpha, tests based on exact null distributions have been investigated. METHODS: An algorithm to generate the exact null distribution for both Armitage's test statistic and a recently proposed modification of the Baumgartner-Weiss-Schindler statistic is presented. I have compared the tests in a simulation study. RESULTS: The asymptotic Armitage test is slightly anticonservative whereas the exact tests control the type I error rate. The exact Armitage test is very conservative, but the exact test based on the modification of the Baumgartner-Weiss-Schindler statistic has a type I error rate close to alpha. The exact Armitage test is the least powerful test; the difference in power between the other two tests is often small and the comparison does not show a clear winner. CONCLUSION: Simulation results indicate that an exact test based on the modification of the Baumgartner-Weiss-Schindler statistic is preferable for the analysis of case-control studies of genetic markers.

10.
MOTIVATION: The parametric F-test has been widely used in the analysis of factorial microarray experiments to assess treatment effects. However, the normality assumption is often untenable for microarray experiments with few replicates. Therefore, permutation-based methods are called on to assess statistical significance. The distribution of the F-statistics across all the genes on the array can be regarded as a mixture distribution, with one proportion of the statistics generated from the null distribution of no differential gene expression and the remainder generated from the alternative distribution of differentially expressed genes. As a result, the permutation distribution of the F-statistics may not approximate the true null distribution of the F-statistics well. Therefore, constructing a proper null statistic that better approximates the null distribution of the F-statistic is of great importance to permutation-based multiple testing in microarray data analysis. RESULTS: In this paper, we extend the idea of constructing null statistics based on pairwise differences, which removes the treatment effects, from the two-sample comparison problem to multifactorial balanced or unbalanced microarray experiments. A null statistic based on a subpartition method is proposed and its distribution is employed to approximate the null distribution of the F-statistic. The proposed null statistic is able to accommodate unbalance in the design and is also corrected for the undue correlation between its numerator and denominator. In the simulation studies and real biological data analysis, the number of true positives and the false discovery rate (FDR) of the proposed null statistic are compared with those of the permuted version of the F-statistic. It is shown that our proposed method has better control of the FDR and higher power than the standard permutation method to detect differentially expressed genes because of the better approximated tail probabilities.

11.
In this paper the detection of rare-variant association with continuous phenotypes of interest is investigated via the likelihood-ratio based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution we resort to the bootstrap method because it is generically applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are carried out to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test we propose an effective mixture approximation for the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.

12.

Background  

Time-course microarray experiments are widely used to study the temporal profiles of gene expression. Storey et al. (2005) developed a method for analyzing time-course microarray studies that can be applied to discovering genes whose expression trajectories change over time within a single biological group, or those that follow different time trajectories among multiple groups. They estimated the expression trajectories of each gene using natural cubic splines under the null (no time-course) and alternative (time-course) hypotheses, and used a goodness of fit test statistic to quantify the discrepancy. The null distribution of the statistic was approximated through a bootstrap method. Gene expression levels in microarray data are often correlated in complicated ways. Accurate type I error control that adjusts for multiple testing requires the joint null distribution of test statistics for a large number of genes. For this purpose, permutation methods have been widely used because of their computational ease and intuitive interpretation.

13.
When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.

14.
Lui KJ  Kelly C 《Biometrics》2000,56(1):309-315
Lipsitz et al. (1998, Biometrics 54, 148-160) discussed testing the homogeneity of the risk difference for a series of 2 x 2 tables. They proposed and evaluated several weighted test statistics, including the commonly used weighted least squares test statistic. Here we suggest various important improvements on these test statistics. First, we propose using the one-sided analogues of the test procedures proposed by Lipsitz et al. because we should only reject the null hypothesis of homogeneity when the variation of the estimated risk differences between centers is large. Second, we generalize their study by redesigning the simulations to include the situations considered by Lipsitz et al. (1998) as special cases. Third, we consider a logarithmic transformation of the weighted least squares test statistic to improve the normal approximation of its sampling distribution. On the basis of Monte Carlo simulations, we note that, as long as the mean treatment group size per table is moderate or large (≥ 16), this simple test statistic, in conjunction with the commonly used adjustment procedure for sparse data, can be useful when the number of 2 x 2 tables is small or moderate (≤ 32). In these situations, in fact, we find that our proposed method generally outperforms all the statistics considered by Lipsitz et al. Finally, we include a general guideline about which test statistic should be used in a variety of situations.
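The weighted least squares statistic mentioned in this abstract can be sketched in its textbook form: each table's risk difference is weighted by the inverse of its estimated variance, and the weighted squared deviations from the pooled estimate are summed. This is an assumed, generic version for illustration; it does not reproduce the one-sided analogues, the logarithmic transformation, or the sparse-data adjustment studied by the authors, and the function name and input layout are hypothetical.

```python
def wls_homogeneity_statistic(tables):
    """Weighted least squares statistic for homogeneity of risk differences.

    `tables` is a list of (x1, n1, x0, n0) tuples: events and group size in
    the treatment group, then events and group size in the control group.
    Returns (statistic, degrees_of_freedom); under homogeneity the statistic
    is conventionally referred to a chi-square distribution with K - 1 df.
    """
    diffs, weights = [], []
    for x1, n1, x0, n0 in tables:
        p1, p0 = x1 / n1, x0 / n0
        variance = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0
        if variance <= 0:
            # Happens when both groups have 0% or 100% events; a sparse-data
            # adjustment would be needed and is deliberately not applied here.
            raise ValueError("zero estimated variance in a table")
        diffs.append(p1 - p0)
        weights.append(1.0 / variance)
    pooled = sum(w * d for w, d in zip(weights, diffs)) / sum(weights)
    statistic = sum(w * (d - pooled) ** 2 for w, d in zip(weights, diffs))
    return statistic, len(tables) - 1
```

Identical tables give a statistic of essentially zero, and the statistic grows as the per-table risk differences spread apart.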

15.
Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases or other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the issue of multiple comparisons that results from testing many SNPs, there is a lack of understanding of the distribution of an association test statistic when an entire genome is considered together. In other words, there are extensive efforts in hypothesis testing, and almost no attempt at estimating the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing; better assess the existing approaches to multiple comparisons; and evaluate the impact of linkage disequilibrium on GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on adaptive splines to estimate the empirical value in GWA studies and validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test, which not only can be used to test the null hypothesis of no association, but also provides important information about the impact of the density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations, and hence provides a timely solution to an important and difficult problem in GWA studies.

16.
Zhao H  Tsiatis AA 《Biometrics》2001,57(3):861-867
We present a method for comparing the survival functions of quality-adjusted lifetime from two treatments. This test statistic becomes the ordinary log-rank test when quality-adjusted lifetime is the same as the survival time. Simulation experiments are conducted to examine the behavior of our proposed test statistic under both null and alternative hypotheses. In addition, we apply our method to a breast cancer trial for comparing the distribution of quality-adjusted lifetime between two treatment regimes.

17.
S. Datta  M. Kiparsky  D. M. Rand    J. Arnold 《Genetics》1996,144(4):1985-1992
In this paper we use cytonuclear disequilibria to test the neutrality of mtDNA markers. The data considered here involve sample frequencies of cytonuclear genotypes subject to both statistical sampling variation and genetic sampling variation. First, we obtain the dynamics of the sample cytonuclear disequilibria assuming random drift alone as the source of genetic sampling variation. Next, we develop a test statistic using cytonuclear disequilibria via the theory of generalized least squares to test the random drift model. The null distribution of the test statistic is shown to be approximately chi-squared using an asymptotic argument as well as computer simulation. Power of the test statistic is investigated under an alternative model with drift and selection. The method is illustrated using data from cage experiments utilizing different cytonuclear genotypes of Drosophila melanogaster. A program for implementing the neutrality test is available upon request.

18.
The problem of testing the separability of a covariance matrix against an unstructured variance‐covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first‐order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ² distribution. The tests are implemented on a real dataset from medical studies.

19.
A class of nonparametric statistical methods, including a nonparametric empirical Bayes (EB) method, the Significance Analysis of Microarrays (SAM) and the mixture model method (MMM), have been proposed to detect differential gene expression for replicated microarray experiments. They all depend on constructing a test statistic, for example, a t-statistic, and then using permutation to draw inferences. However, due to special features of microarray data, using standard permutation scores may not estimate the null distribution of the test statistic well, leading to inferences that may be too conservative. We propose a new method of constructing weighted permutation scores to overcome the problem: posterior probabilities of having no differential expression from the EB method are used as weights for genes to better estimate the null distribution of the test statistic. We also propose a weighted method to estimate the false discovery rate (FDR) using the posterior probabilities. Using simulated data and real data for time-course microarray experiments, we show the improved performance of the proposed methods when implemented in MMM, EB and SAM.

20.
Noether (1987) proposed a method of sample size determination for the Wilcoxon-Mann-Whitney test. To obtain a sample size formula, he restricted himself to alternatives that differ only slightly from the null hypothesis, so that the unknown variance σ² of the Mann-Whitney statistic can be approximated by the known variance under the null hypothesis, which depends only on n. This fact is frequently forgotten in statistical practice. In this paper, we compare Noether's large-sample solution against an alternative approach based on upper bounds of σ², which is valid for any alternative. This comparison shows that Noether's approximation is sufficiently reliable with small and large deviations from the null hypothesis.
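Noether's formula is usually stated as N = (z_α + z_β)² / (12 c (1 − c) (p − 1/2)²), where N is the total sample size, c is the fraction of observations in the first group, and p = P(X < Y) under the alternative. The sketch below evaluates it for a one-sided test; the function name, argument names, and default values are assumptions made for illustration, and the upper-bound approach compared in the paper is not implemented.

```python
from math import ceil
from statistics import NormalDist


def noether_total_sample_size(p, alpha=0.05, power=0.90, c=0.5):
    """Total sample size N from Noether's approximation (one-sided test).

    p     : assumed P(X < Y) under the alternative, with p != 0.5
    alpha : one-sided significance level
    power : desired power, 1 - beta
    c     : fraction of the N observations allocated to the first group
    """
    if not 0.0 < p < 1.0 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha)
    z_beta = standard_normal.inv_cdf(power)
    n_total = (z_alpha + z_beta) ** 2 / (12 * c * (1 - c) * (p - 0.5) ** 2)
    return ceil(n_total)


print(noether_total_sample_size(0.7))  # 72 with the defaults above
```

The formula uses the null-hypothesis variance of the Mann-Whitney statistic in place of the unknown σ², which is exactly the approximation whose reliability the abstract examines.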
