Similar literature
Found 20 similar documents; search took 15 ms
1.
Testing for differentially expressed genes with microarray data (total citations: 1; self-citations: 1; citations by others: 0)
This paper compares the type I error and power of the one- and two-sample t-tests, and the one- and two-sample permutation tests for detecting differences in gene expression between two microarray samples with replicates using Monte Carlo simulations. When data are generated from a normal distribution, type I errors and powers of the one-sample parametric t-test and one-sample permutation test are very close, as are the two-sample t-test and two-sample permutation test, provided that the number of replicates is adequate. When data are generated from a t-distribution, the permutation tests outperform the corresponding parametric tests if the number of replicates is at least five. For data from a two-color dye swap experiment, the one-sample test appears to perform better than the two-sample test since expression measurements for control and treatment samples from the same spot are correlated. For data from independent samples, such as the one-channel array or two-channel array experiment using reference design, the two-sample t-tests appear more powerful than the one-sample t-tests.

2.
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify, we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.

3.
Nummi T, Pan J, Siren T, Liu K. Biometrics, 2011, 67(3): 871-875
Summary: In most research on smoothing splines the focus has been on estimation, while inference, especially hypothesis testing, has received less attention. By defining design matrices for fixed and random effects and the structure of the covariance matrices of random errors in an appropriate way, the cubic smoothing spline admits a mixed model formulation, which places this nonparametric smoother firmly in a parametric setting. Thus nonlinear curves can be included with random effects and random coefficients. The smoothing parameter is the ratio of the random‐coefficient and error variances, and tests for linear regression reduce to tests for zero random‐coefficient variances. We propose an exact F‐test for the situation and investigate its performance in a real pine stem data set and by simulation experiments. Under certain conditions the suggested methods can also be applied when the data are dependent.

4.
We present two tests for seasonal trend in monthly incidence data. The first approach uses a penalized likelihood to choose the number of harmonic terms to include in a parametric harmonic model (which includes time trends and autoregression as well as seasonal harmonic terms) and then tests for seasonality using a parametric bootstrap test. The second approach uses a semiparametric regression model to test for seasonal trend. In the semiparametric model, the seasonal pattern is modeled nonparametrically, parametric terms are included for autoregressive effects and a linear time trend, and a parametric bootstrap test is used to test for seasonality. For both procedures, a null distribution is generated under a null Poisson model with time trends and autoregression parameters. We apply the methods to skin melanoma incidence rates collected by the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, and perform simulation studies to evaluate the type I error rate and power for the two procedures. These simulations suggest that both procedures are alpha-level procedures. In addition, the harmonic model/bootstrap test had similar or larger power than the semiparametric model/bootstrap test for a wide range of alternatives, and the harmonic model/bootstrap test is much easier to implement. Thus, we recommend the harmonic model/bootstrap test for the analysis of seasonal incidence data.

5.
6.
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the fold change criteria of the Significance Analysis of Microarrays method are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to the fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rate control among the approaches are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offer higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.

7.
A method for fitting piecewise exponential regression models to censored survival data is described. Stratification is performed recursively, using a combination of statistical tests and residual analysis. The splitting criterion employed in cross-validation is the average squared error of the residuals. The bootstrap is employed to keep the probability of a type I error (the error of discovering two or more strata when there is only one) of the method close to a predetermined value. The proposed method can thus also serve as a formal goodness-of-fit test for the exponential regression model. Real and simulated data are used for illustration.

8.
Behavioural studies are commonly plagued with data that violate the assumptions of parametric statistics. Consequently, classic nonparametric methods (e.g. rank tests) and novel distribution-free methods (e.g. randomization tests) have been used to a great extent by behaviourists. However, the robustness of such methods in terms of statistical power and type I error has seldom been evaluated. This probably reflects the fact that empirical methods, such as Monte Carlo approaches, are required to assess these concerns. In this study we show that analytical methods cannot always be used to evaluate the robustness of statistical tests, but rather Monte Carlo approaches must be employed. We detail empirical protocols for estimating power and type I error rates for parametric, nonparametric and randomization methods, and demonstrate their application for an analysis of variance and a regression/correlation analysis design. Together, this study provides a framework from which behaviourists can compare the reliability of different methods for data analysis, serving as a basis for selecting the most appropriate statistical test given the characteristics of data at hand. Copyright 2001 The Association for the Study of Animal Behaviour.
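The kind of Monte Carlo protocol this abstract refers to can be sketched as follows. This is only an illustrative sketch, not the authors' protocol: the helper names `z_test_p_value` and `estimate_rejection_rate`, the choice of a known-variance z-test, and all sample sizes are assumptions made for the example. The idea is to simulate many data sets, apply the test to each, and report the fraction of rejections, which estimates the type I error rate when the null is true and the power when it is false.

```python
import random
from statistics import NormalDist


def z_test_p_value(x, y, sigma=1.0):
    """Two-sided two-sample z-test, assuming a known common sd `sigma` (illustrative)."""
    se = sigma * (1 / len(x) + 1 / len(y)) ** 0.5
    z = (sum(x) / len(x) - sum(y) / len(y)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def estimate_rejection_rate(test, sampler_x, sampler_y, n=10,
                            n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated data sets in which `test` rejects at level `alpha`."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Draw one simulated experiment with n observations per group.
        x = [sampler_x(rng) for _ in range(n)]
        y = [sampler_y(rng) for _ in range(n)]
        if test(x, y) < alpha:
            rejections += 1
    return rejections / n_sim


def standard_normal(rng):
    return rng.gauss(0.0, 1.0)


def shifted_normal(rng):
    return rng.gauss(1.0, 1.0)


# Null true (same distribution): the rate estimates the type I error.
type_i_error = estimate_rejection_rate(z_test_p_value, standard_normal, standard_normal)
# Null false (means differ by 1): the rate estimates the power.
power = estimate_rejection_rate(z_test_p_value, standard_normal, shifted_normal)
```

Swapping in a heavy-tailed or skewed sampler, or a different test function, gives a rough robustness check of the sort the abstract describes.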

9.
Human blood group A, B, H, Ii, Lea and Leb antigens and their determinants expressed on ovarian cyst glycoproteins have been studied for over five decades. However, little is known about sialyl Lex and sialyl Lea glycotopes, which play essential roles in normal immunity, inflammation, and cancer cell metastasis. Furthermore, Lex and Ley were classified as glycotopes of unknown genes. Identification of these Lewis epitopes was hampered by the lack of specific antibodies. In this study, the occurrence of sialyl Lex, sialyl Lea, Lex and Ley reactivities in cyst glycoproteins was characterized by enzyme-linked immunosorbent assays. The results indicated that most human ovarian cyst glycoproteins carried Lex (8/25) and/or Ley (17/25) glycotopes. The expression (epitopes) of the new genes described in previous reports are Lex and Ley glycotopes; the reactivities of sialyl Lex and sialyl Lea glycotopes in secreted cyst glycoproteins may be affected by the conditions of purification; the relationship between Ley and human blood group ABH was confirmed; recognition profiles of sialyl Lex, sialyl Lea, Lex and Ley present in the carbohydrate chains of water-soluble cyst glycoproteins were illustrated; possible attachments of glycotopes to the internal carbohydrate complex of cyst glycoproteins have been reconstructed; proposed biosynthetic pathways for the formation of sialyl Lea, sialyl Lex, Lex, Ley, ALey and BLey determinant structures on Type I and Type II core structures of human ovarian cyst glycoproteins are also included in this study.

10.
Riyan Cheng, Abraham A. Palmer. Genetics, 2013, 193(3): 1015-1018
We used simulations to evaluate methods for assessing statistical significance in association studies. When the statistical model appropriately accounted for relatedness among individuals, unrestricted permutation tests and a few other simulation-based methods effectively controlled type I error rates; otherwise, only gene dropping controlled type I error but at the expense of statistical power.

11.
Keith P. Lewis. Oikos, 2004, 104(2): 305-315
Ecologists rely heavily upon statistics to make inferences concerning ecological phenomena and to make management recommendations. It is therefore important to use statistical tests that are most appropriate for a given data-set. However, inappropriate statistical tests are often used in the analysis of studies with categorical data (i.e. count data or binary data). Since many types of statistical tests have been used in artificial nest studies, a review and comparison of these tests provides an opportunity to demonstrate the importance of choosing the most appropriate statistical approach for conceptual reasons as well as type I and type II errors.
Artificial nests have routinely been used to study the influences of habitat fragmentation and habitat edges on nest predation. I review the variety of statistical tests used to analyze artificial nest data within the framework of the generalized linear model and argue that logistic regression is the most appropriate and flexible statistical test for analyzing binary data-sets. Using artificial nest data from my own studies and an independent data set from the medical literature as examples, I tested equivalent data using a variety of statistical methods. I then compared the p-values and the statistical power of these tests. Results vary greatly among statistical methods. Methods inappropriate for analyzing binary data often fail to yield significant results even when differences between study groups appear large, while logistic regression finds these differences statistically significant. Statistical power is 2–3 times higher for logistic regression than for other tests. I recommend that logistic regression be used to analyze artificial nest data and other data-sets with binary data.

12.
13.
The permutation test is a popular technique for testing a hypothesis of no effect when the distribution of the test statistic is unknown. To test the equality of two means, a permutation test might use a test statistic which is the difference of the two sample means in the univariate case. In the multivariate case, it might use a test statistic which is the maximum of the univariate test statistics. A permutation test then estimates the null distribution of the test statistic by permuting the observations between the two samples. We will show that, for such tests, if the two distributions are not identical (as for example when they have unequal variances, correlations or skewness), then a permutation test for equality of means based on difference of sample means can have an inflated Type I error rate even when the means are equal. Our results illustrate that permutation testing should be confined to testing for non-identical distributions. CONTACT: calian@raunvis.hi.is.
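The univariate procedure described in this abstract, permuting observations between two samples and using the difference of sample means as the test statistic, can be sketched as below. This is a minimal Monte Carlo version written for illustration; the function name `permutation_test_mean_diff`, the number of permutations, and the add-one p-value correction are assumptions, not details taken from the paper.

```python
import random


def permutation_test_mean_diff(x, y, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation p-value for a difference in sample means."""
    rng = random.Random(seed)
    n_x = len(x)
    observed = abs(sum(x) / n_x - sum(y) / len(y))
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        # Reassign observations to the two samples at random.
        rng.shuffle(pooled)
        perm_x, perm_y = pooled[:n_x], pooled[n_x:]
        diff = abs(sum(perm_x) / len(perm_x) - sum(perm_y) / len(perm_y))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


p_separated = permutation_test_mean_diff([1, 2, 3, 4, 5], [101, 102, 103, 104, 105])
p_identical = permutation_test_mean_diff([1, 2, 3], [1, 2, 3])
```

Note that shuffling treats all observations as exchangeable, so the null hypothesis actually being tested is that the two distributions are identical. That is the reason, per the abstract, that unequal variances or skewness can inflate the type I error rate even when the means are equal.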

14.
The initial presentation of multifactor dimensionality reduction (MDR) featured cross-validation to mitigate over-fitting, computationally efficient searches of the epistatic model space, and variable construction with constructive induction to alleviate the curse of dimensionality. However, the method was unable to differentiate association signals arising from true interactions from those due to independent main effects at individual loci. This issue leads to problems in inference and interpretability for the results from MDR and its family-based complement, the MDR-pedigree disequilibrium test (PDT). A suggestion from previous work was to fit regression models post hoc to specifically evaluate the null hypothesis of no interaction for MDR or MDR-PDT models. We demonstrate with simulation that fitting a regression model on the same data as that analyzed by MDR or MDR-PDT is not a valid test of interaction. This is likely to be true for any other procedure that searches for models and then performs an uncorrected test for interaction. We also show with simulation that when strong main effects are present and the null hypothesis of no interaction is true, MDR and MDR-PDT reject at far greater than the nominal rate. We also provide a valid regression-based permutation test procedure that specifically tests the null hypothesis of no interaction, and does not reject the null when only main effects are present. The regression-based permutation test implemented here conducts a valid test of interaction after a search for multilocus models, and can be applied to any method that conducts a search to find a multilocus model representing an interaction.

15.
Multifactor dimensionality reduction (MDR) is a model-free approach that can identify gene x gene or gene x environment effects in a case-control study. Here we explore several modifications of the MDR method. We extended MDR to provide model selection without cross-validation, and use a chi-square statistic as an alternative to prediction error (PE). We also modified the permutation test to provide different levels of stringency. The extended MDR (EMDR) includes three permutation tests (fixed, non-fixed, and omnibus) to obtain p-values of multilocus models. The goal of this study was to compare the different approaches implemented in the EMDR method and evaluate the ability to identify genetic effects in the Genetic Analysis Workshop 14 simulated data. We used three replicates from the simulated family data, generating matched pairs from family triads. The results showed: 1) chi-square and PE statistics give nearly consistent results; 2) results of EMDR without cross-validation matched those of EMDR with 10-fold cross-validation; 3) the fixed permutation test reports false-positive results in data from loci unrelated to the disease, but the non-fixed and omnibus permutation tests perform well in preventing false positives, with the omnibus test being the most conservative. We conclude that the non-cross-validation test can provide accurate results with the advantage of high efficiency compared to 10-fold cross-validation, and the non-fixed permutation test provides a good compromise between power and false-positive rate.

16.
The mechanism of antimalarial action of [Au(CQ)(PPh3)]PF6 (1), which is active in vitro against CQ-resistant P. falciparum and in vivo against P. berghei, has been investigated in relation to hemozoin formation and DNA as possible important targets. Complex 1 interacts with heme and inhibits β-hematin formation both in aqueous medium and near water/n-octanol interfaces at pH ~ 5 to a greater extent than chloroquine diphosphate (CQDP) or other known metal-based antimalarial agents; the higher inhibition activity is probably related to the higher lipophilicity observed for 1 through partition coefficient measurements at low pH, with respect to CQDP. The interactions of complex 1 with DNA were explored using spectrophotometric and fluorimetric titrations, circular dichroism spectroscopy, viscosity and melting point studies, as well as electrophoresis and covalent binding assays. The experimental data indicate that complex 1 interacts with DNA predominantly by intercalation and electrostatic association of the CQ moiety, similarly to free CQDP, while no covalent metal-DNA binding seems to take place. The most likely antimalarial mechanism for complex 1 is thus heme aggregation inhibition; the high activities observed against resistant parasites are probably due to the structural modification of CQ introduced by the presence of the gold-triphenylphosphine fragment, together with the enhanced lipophilic character.

17.
Multilocus DNA fingerprinting is commonly used to assess genetic similarity within and between geographically disjunct populations. Typically, the proportion of DNA fingerprinting bands shared between two individuals (S_XY) is calculated for all possible pairwise comparisons and the resulting data analyzed parametrically to test differences in mean band-sharing among groups. The degree to which covariation among interdependent S_XY values (S_ab - S_bc) biases the analyses is often unknown. Here, we assess the extent of covariation in four DNA fingerprinting studies and evaluate the effectiveness of two corrective procedures, a permutation test and a subsampling routine using only independent pairwise comparisons drawn without replacement from the overall data. Covariation among interdependent S_XY values was significantly greater than zero in every data set examined, including those from a bee, a rodent, and two passerine birds. Permutation tests did not correct for interdependence and yielded significance values nearly identical to those derived from uncorrected parametric procedures. In contrast, the subsampling procedure yielded corrected estimates of the standard error that were two to four times larger than those derived parametrically. As a result, comparisons that were significant using parametric tests were either non-significant or only marginally significant with the subsampling routine. We conclude that interdependence among S_XY values poses a substantial obstacle to hypothesis testing that must be addressed in future studies.

18.
Studies of evolutionary correlations commonly use phylogenetic regression (i.e., independent contrasts and phylogenetic generalized least squares) to assess trait covariation in a phylogenetic context. However, while this approach is appropriate for evaluating trends in one or a few traits, it is incapable of assessing patterns in highly multivariate data, as the large number of variables relative to sample size prohibits parametric test statistics from being computed. This poses serious limitations for comparative biologists, who must either simplify how they quantify phenotypic traits, or alter the biological hypotheses they wish to examine. In this article, I propose a new statistical procedure for performing ANOVA and regression models in a phylogenetic context that can accommodate high‐dimensional datasets. The approach is derived from the statistical equivalency between parametric methods using covariance matrices and methods based on distance matrices. Using simulations under Brownian motion, I show that the method displays appropriate Type I error rates and statistical power, whereas standard parametric procedures have decreasing power as data dimensionality increases. As such, the new procedure provides a useful means of assessing trait covariation across a set of taxa related by a phylogeny, enabling macroevolutionary biologists to test hypotheses of adaptation, and phenotypic change in high‐dimensional datasets.

19.
The Cochran-Armitage test has commonly been used as a trend test for binomial proportions. The quasi-likelihood method provides a simple approach to model extra-binomial proportions. Two versions of the score and Wald tests using different parameterizations for the extra-binomial variance were investigated: one in terms of intercluster correlation, and another in terms of variance. Monte Carlo simulation was used to evaluate the performance of each version of the score test and the Wald test, and of the Cochran-Armitage test. The simulation shows that the Cochran-Armitage test has the proper size only for binomial sample data, and the test is no longer valid when applied to extra-binomial data. The Wald test is more likely to exceed the nominal level than the score test under either the intercluster correlation model or the variance model. Both score tests performed very well even with binomial data; the tests control the type I error and at the same time maintain the power to detect dose effects. Based on the design considered in this paper, the two score tests are comparable. The score test based on the intercluster correlation model seems to control the Type I error better but appears less powerful than that based on the variance model. An example from a developmental toxicity experiment is given.
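For orientation, the plain binomial Cochran-Armitage trend statistic mentioned in this abstract can be computed as in the sketch below. This is the textbook form only, under the assumption of independent binomial responses; it is not the quasi-likelihood score or Wald test the authors study, and the function name `cochran_armitage_trend` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cochran_armitage_trend(doses, totals, responders):
    """Return (z, two-sided p) for the binomial Cochran-Armitage trend test."""
    n_total = sum(totals)
    p_bar = sum(responders) / n_total  # pooled response proportion
    # Dose-weighted sum of observed minus expected responders in each group.
    t_stat = sum(d * (r - n * p_bar)
                 for d, n, r in zip(doses, totals, responders))
    sum_nd = sum(n * d for d, n in zip(doses, totals))
    sum_nd2 = sum(n * d * d for d, n in zip(doses, totals))
    variance = p_bar * (1 - p_bar) * (sum_nd2 - sum_nd ** 2 / n_total)
    if variance <= 0:
        raise ValueError("variance is zero: no variation in response or dose")
    z = t_stat / sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 10 subjects per dose group, responders rising with dose.
z_trend, p_trend = cochran_armitage_trend([0, 1, 2], [10, 10, 10], [1, 5, 9])
# Same design with a flat response: the statistic is zero.
z_flat, p_flat = cochran_armitage_trend([0, 1, 2], [10, 10, 10], [5, 5, 5])
```

The variance term assumes purely binomial variation, which is why, as the abstract reports, the test loses its nominal size when responses within a cluster are correlated.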

20.
Jonckheere's test is a frequently used nonparametric trend test for the evaluation of preclinical studies and clinical dose-finding trials. In this paper, a modification of Jonckheere's test is proposed. If the exact permutation distribution is used for inference, the modified test can fill out the level of the type I error in a much more complete way and is substantially more powerful than the common Jonckheere test. If the asymptotic normality is used for inference, the modified test is slightly more powerful. In addition, a maximum test is investigated which is more robust concerning an a priori unknown dose-response shape. The robustness is advantageous, especially in a closed testing procedure. The different tests are applied to two example data sets.
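As background for this abstract, the common (unmodified) Jonckheere statistic with its asymptotic normal approximation can be sketched as follows. The modified test and the maximum test proposed in the paper are not reproduced here; the function name `jonckheere_terpstra` and the example data are assumptions, and the variance formula used is the one valid when there are no ties.

```python
from math import sqrt
from statistics import NormalDist


def jonckheere_terpstra(groups):
    """Return (J, z, one-sided p) for an increasing trend across ordered groups.

    `groups` must be listed in increasing dose order. The variance formula
    assumes no tied observations.
    """
    j_stat = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            # Count pairs where the higher-dose observation is larger.
            for x in groups[i]:
                for y in groups[j]:
                    if x < y:
                        j_stat += 1.0
                    elif x == y:
                        j_stat += 0.5
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean_j = (n * n - sum(m * m for m in sizes)) / 4
    var_j = (n * n * (2 * n + 3)
             - sum(m * m * (2 * m + 3) for m in sizes)) / 72
    z = (j_stat - mean_j) / sqrt(var_j)
    p_value = 1 - NormalDist().cdf(z)
    return j_stat, z, p_value


# Hypothetical, perfectly ordered dose groups.
j_value, z_value, p_value = jonckheere_terpstra([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With small groups like these, the exact permutation distribution that the abstract discusses would normally be preferred over the normal approximation used in this sketch.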
