Similar Documents
20 similar documents found (search time: 62 ms)
1.
Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases and other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the issue of multiple comparisons that results from testing many SNPs, there is a lack of understanding of the distribution of an association test statistic when the entire genome is considered together. In other words, there have been extensive efforts in hypothesis testing, and almost no attempt at estimating the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing, better assess the existing approaches to multiple comparisons, and evaluate the impact of linkage disequilibrium on GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on an adaptive spline to estimate the empirical p-value in GWA studies and validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test, which not only can be used to test the null hypothesis of no association, but also provides important information about the impact of the density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations, and hence provides a timely solution to an important and difficult problem in GWA studies.
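As a minimal sketch of the general idea of a spline-based empirical null for genome-wide test statistics (an illustration only, not the authors' adaptive-spline procedure; the histogram resolution, spline degrees of freedom, and function names are assumptions), one can model histogram counts of the statistics with a Poisson GLM on a B-spline basis and read off tail probabilities from the smoothed density:

```python
import numpy as np
import statsmodels.api as sm
from patsy import dmatrix

def empirical_null_density(stats, bins=200, df=8):
    """Smooth density estimate of association statistics under H0 (Lindsey's method).

    stats : 1-D array of genome-wide test statistics obtained under the null
            (e.g. from simulated population data).
    Returns bin midpoints and the estimated null density at those midpoints.
    """
    counts, edges = np.histogram(stats, bins=bins)
    mids = 0.5 * (edges[:-1] + edges[1:])
    # Poisson regression of bin counts on a B-spline basis of the bin midpoints
    X = dmatrix(f"bs(x, df={df})", {"x": mids}, return_type="dataframe")
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    dens = np.asarray(fit.fittedvalues) / (counts.sum() * np.diff(edges))
    return mids, dens

def empirical_p(t_obs, mids, dens):
    """Right-tail empirical p-value of an observed statistic under the fitted null."""
    width = mids[1] - mids[0]          # uniform bins from np.histogram
    return float((dens[mids >= t_obs] * width).sum())
```

The sketch avoids permutations in the same spirit as the abstract: the null density is estimated once and then reused for every observed statistic.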

2.
ABSTRACT The controversy over the use of null hypothesis statistical testing (NHST) has persisted for decades, yet NHST remains the most widely used statistical approach in wildlife sciences and ecology. A disconnect exists between those opposing NHST and many wildlife scientists and ecologists who conduct and publish research. This disconnect causes confusion and frustration on the part of students. We, as students, offer our perspective on how this issue may be addressed. Our objective is to encourage academic institutions and advisors of undergraduate and graduate students to introduce students to various statistical approaches so we can make well-informed decisions on the appropriate use of statistical tools in wildlife and ecological research projects. We propose an academic course that introduces students to various statistical approaches (e.g., Bayesian, frequentist, Fisherian, information theory) to build a foundation for critical thinking in applying statistics. We encourage academic advisors to become familiar with the statistical approaches available to wildlife scientists and ecologists and thus decrease bias towards one approach. Null hypothesis statistical testing is likely to persist as the most common statistical analysis tool in wildlife science until academic institutions and student advisors change their approach and emphasize a wider range of statistical methods.  相似文献   

3.
The null-hypothesis dilemma in the experimental testing of ecological hypotheses (total citations: 1; self-citations: 1; citations by others: 0)
Li Ji. 《生态学杂志》 (Chinese Journal of Ecology), 2016, 27(6): 2031-2038
Experimentation is one of the principal ways of testing ecological hypotheses, but it is open to challenges that arise from the null hypothesis itself. Analyzing Platt's (1964) hypothetico-deductive model, Quinn and Dunham (1983) argued that ecology cannot have null hypotheses that are strictly testable by experiment. Fisher's falsificationism and the non-decisive nature of the Neyman-Pearson (N-P) framework mean that a statistical null hypothesis cannot be strictly verified; moreover, ecological processes, unlike those of classical physics, can involve a null hypothesis H0 (α=1, β=0) alongside a different alternative hypothesis H1′ (α′=1, β′=0), which makes strict experimental verification of ecological null hypotheses difficult as well. Lowering the P-value, choosing the null hypothesis carefully, and applying non-centralization and two-sided testing to non-null hypotheses can each alleviate this null-hypothesis dilemma. Statistical null hypothesis significance testing (NHST), however, should not be equated with a logical method for proving causal relationships in ecological hypotheses. Consequently, the results and conclusions of the large body of existing methodological studies and experimental tests of ecological hypotheses based on NHST are not absolutely reliable in a logical sense.

4.
Noether (1987) proposed a method of sample size determination for the Wilcoxon-Mann-Whitney test. To obtain a sample size formula, he restricted himself to alternatives that differ only slightly from the null hypothesis, so that the unknown variance σ² of the Mann-Whitney statistic can be approximated by the known variance under the null hypothesis, which depends only on n. This fact is frequently forgotten in statistical practice. In this paper, we compare Noether's large-sample solution against an alternative approach based on upper bounds of σ², which is valid for any alternative. This comparison shows that Noether's approximation is sufficiently reliable for both small and large deviations from the null hypothesis.
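For orientation, a commonly cited form of Noether's (1987) approximation for the total Wilcoxon-Mann-Whitney sample size, which uses the null-hypothesis variance as described above, is sketched here (formula form and parameter names are assumptions; p1 denotes P(X < Y) under the alternative and c the fraction of observations allocated to the first group):

```python
from scipy.stats import norm

def noether_wmw_n(p1, alpha=0.05, power=0.80, c=0.5, two_sided=True):
    """Approximate total sample size N for the Wilcoxon-Mann-Whitney test.

    p1 : P(X < Y) under the alternative (0.5 corresponds to the null hypothesis)
    c  : proportion of the total sample allocated to the first group
    """
    z_a = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    z_b = norm.ppf(power)
    return (z_a + z_b) ** 2 / (12 * c * (1 - c) * (p1 - 0.5) ** 2)

# e.g. noether_wmw_n(0.65) is roughly 116 observations in total
```

Because the denominator uses the variance under H0, the formula is expected to be most accurate for alternatives close to the null, which is exactly the restriction discussed in the abstract.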

5.
Epigenetic research leads to complex data structures. Since parametric model assumptions for the distribution of epigenetic data are hard to verify we introduce in the present work a nonparametric statistical framework for two-group comparisons. Furthermore, epigenetic analyses are often performed at various genetic loci simultaneously. Hence, in order to be able to draw valid conclusions for specific loci, an appropriate multiple testing correction is necessary. Finally, with technologies available for the simultaneous assessment of many interrelated biological parameters (such as gene arrays), statistical approaches also need to deal with a possibly unknown dependency structure in the data. Our statistical approach to the nonparametric comparison of two samples with independent multivariate observables is based on recently developed multivariate multiple permutation tests. We adapt their theory in order to cope with families of hypotheses regarding relative effects. Our results indicate that the multivariate multiple permutation test keeps the pre-assigned type I error level for the global null hypothesis. In combination with the closure principle, the family-wise error rate for the simultaneous test of the corresponding locus/parameter-specific null hypotheses can be controlled. In applications we demonstrate that group differences in epigenetic data can be detected reliably with our methodology.  相似文献   

6.
Background: Statistical validation of predicted complexes is a fundamental issue in proteomics and bioinformatics. The goal is to measure the statistical significance of each predicted complex in terms of p-values. Surprisingly, this issue has not received much attention in the literature. To our knowledge, only a few research efforts have been made in this direction. Methods: In this article, we propose a novel method for calculating the p-value of a predicted complex. The null hypothesis is that there is no difference between the number of edges in the target protein complex and that expected under the random null model. In addition, we assume that a true protein complex must be a connected subgraph. Based on this null hypothesis, we present an algorithm to compute the p-value of a given predicted complex. Results: We test our method on five benchmark data sets to evaluate its effectiveness. Conclusions: The experimental results show that our method is superior to state-of-the-art algorithms in assessing the statistical significance of candidate protein complexes.
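A hedged Monte-Carlo analogue of this idea (not the authors' algorithm, whose null model and p-value computation differ): compare the induced edge count of a candidate complex against randomly drawn node sets of the same size from the same protein-protein interaction network. Function and variable names are illustrative.

```python
import random
import networkx as nx

def complex_edge_pvalue(G, complex_nodes, n_draws=10_000, seed=0):
    """Empirical p-value for the edge count of a candidate complex.

    Null reference: induced edge counts of random node sets of the same size
    drawn from G (a simple stand-in for the random null model in the abstract).
    The abstract additionally requires a true complex to be connected, which
    can be checked up front with nx.is_connected(G.subgraph(complex_nodes)).
    """
    rng = random.Random(seed)
    nodes = list(G.nodes())
    k = len(complex_nodes)
    observed = G.subgraph(complex_nodes).number_of_edges()
    hits = 0
    for _ in range(n_draws):
        sample = rng.sample(nodes, k)
        if G.subgraph(sample).number_of_edges() >= observed:
            hits += 1
    # add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_draws + 1)
```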

7.
When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.  相似文献   

8.
Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.  相似文献   
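A minimal sketch of target-decoy FDR estimation in this spirit (not the article's exact procedure): at each score threshold the FDR is estimated from the decoy/target ratio, optionally scaled by an estimate pi0 of the fraction of incorrect target identifications, and converted to q-values. The parameter names are placeholders.

```python
import numpy as np

def target_decoy_qvalues(scores, is_decoy, pi0=1.0):
    """q-values for peptide-spectrum matches from a target-decoy search.

    scores   : array of PSM scores (higher = better)
    is_decoy : boolean array, True for matches to the decoy database
    pi0      : estimated fraction of incorrect target PSMs; 1.0 means no
               correction, while estimating it (as in the abstract) can
               increase statistical power.
    """
    scores = np.asarray(scores, float)
    is_decoy = np.asarray(is_decoy, bool)
    order = np.argsort(-scores)                      # best score first
    decoys = np.cumsum(is_decoy[order])
    targets = np.cumsum(~is_decoy[order])
    fdr = pi0 * decoys / np.maximum(targets, 1)
    qvals = np.minimum.accumulate(fdr[::-1])[::-1]   # enforce monotonicity
    out = np.empty_like(qvals)
    out[order] = qvals
    return out
```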

9.
Analysing social networks is challenging. Key features of relational data require the use of non-standard statistical methods such as developing system-specific null, or reference, models that randomize one or more components of the observed data. Here we review a variety of randomization procedures that generate reference models for social network analysis. Reference models provide an expectation for hypothesis testing when analysing network data. We outline the key stages in producing an effective reference model and detail four approaches for generating reference distributions: permutation, resampling, sampling from a distribution, and generative models. We highlight when each type of approach would be appropriate and note potential pitfalls for researchers to avoid. Throughout, we illustrate our points with examples from a simulated social system. Our aim is to provide social network researchers with a deeper understanding of analytical approaches to enhance their confidence when tailoring reference models to specific research questions.  相似文献   
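As an illustration of the "permutation" family of reference models discussed here (an assumed, generic sketch, not any specific procedure from the review), node attribute labels can be permuted while the network structure is held fixed, giving a reference distribution for a node-attribute statistic:

```python
import numpy as np
import networkx as nx

def node_permutation_test(G, attr, metric, n_perm=5000, seed=0):
    """Node-label permutation reference model.

    G      : networkx graph with a node attribute `attr` (e.g. sex or group)
    metric : function (G, {node: label}) -> float, the statistic of interest
    Returns the observed statistic and an empirical two-sided p-value
    (the statistic is assumed to be centred near zero under the null).
    """
    rng = np.random.default_rng(seed)
    nodes = list(G.nodes())
    labels = np.array([G.nodes[n][attr] for n in nodes])
    observed = metric(G, dict(zip(nodes, labels)))
    null = np.empty(n_perm)
    for b in range(n_perm):
        shuffled = rng.permutation(labels)               # break the node-attribute link,
        null[b] = metric(G, dict(zip(nodes, shuffled)))  # keep the network structure intact
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Example statistic: difference in mean degree between two attribute groups.
def degree_gap(G, lab):
    deg = dict(G.degree())
    a = [deg[n] for n in G if lab[n] == "A"]
    b = [deg[n] for n in G if lab[n] == "B"]
    return float(np.mean(a) - np.mean(b))
```

Other reference-model families mentioned in the abstract (resampling, sampling from a distribution, generative models) would replace the permutation step with a different randomization of the observed data.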

10.
When testing for genetic differentiation the joint null hypothesis that there is no allele frequency difference at any locus is of interest. Common approaches to test this hypothesis are based on the summation of χ2 statistics over loci and on the Bonferroni correction, respectively. Here, we also consider the Simes adjustment and a recently proposed truncated product method (TPM) to combine P‐values. The summation and the TPM (using a relatively large truncation point) are powerful when there are differences in many or all loci. The Simes adjustment, however, is powerful when there are differences regarding one or a few loci only. As a compromise between the different approaches we introduce a combination between the Simes adjustment and the TPM, i.e. the joint null hypothesis is rejected if at least one of the two methods, Simes and TPM, is significant at the α/2‐level. Simulation results indicate that this combination is a robust procedure with high power over the different types of alternatives.  相似文献   
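A sketch of the combined decision rule described above, under assumed simplifications: the Simes global p-value is computed from the ordered per-locus p-values, the truncated product method (TPM) p-value is calibrated here by Monte Carlo under independence (the paper's setting may use a different calibration), and the joint null is rejected if either test is significant at α/2.

```python
import numpy as np

def simes_p(pvals):
    """Simes global p-value: min_i m * p_(i) / i."""
    p = np.sort(np.asarray(pvals, float))
    m = len(p)
    return float(np.min(m * p / np.arange(1, m + 1)))

def tpm_p(pvals, tau=0.05, n_sim=50_000, seed=0):
    """Truncated product method: W = product of p-values <= tau, calibrated by
    Monte Carlo assuming independent uniform p-values (suitable for a modest
    number of loci)."""
    p = np.asarray(pvals, float)
    m = len(p)
    w_obs = np.prod(p[p <= tau]) if np.any(p <= tau) else 1.0
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=(n_sim, m))
    w_null = np.where(u <= tau, u, 1.0).prod(axis=1)
    return float((np.sum(w_null <= w_obs) + 1) / (n_sim + 1))

def combined_test(pvals, alpha=0.05, tau=0.05):
    """Reject the joint null of no differentiation at any locus if either
    the Simes or the TPM p-value falls below alpha / 2."""
    return simes_p(pvals) < alpha / 2 or tpm_p(pvals, tau) < alpha / 2
```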

11.
There is increasing evidence that pleiotropy, the association of multiple traits with the same genetic variants/loci, is a very common phenomenon. Cross-phenotype association tests are often used to jointly analyze multiple traits from a genome-wide association study (GWAS). The underlying methods, however, are often designed to test the global null hypothesis that there is no association of a genetic variant with any of the traits, the rejection of which does not implicate pleiotropy. In this article, we propose a new statistical approach, PLACO, for specifically detecting pleiotropic loci between two traits by considering an underlying composite null hypothesis that a variant is associated with none or only one of the traits. We propose testing the null hypothesis based on the product of the Z-statistics of the genetic variants across two studies and derive a null distribution of the test statistic in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. We borrow approaches from the statistical literature on mediation analysis that allow asymptotic approximation of the null distribution avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate that the proposed method can maintain type I error and can achieve major power gain over alternative simpler methods that are typically used for testing pleiotropy. PLACO allows correlation in summary statistics between studies that may arise due to sharing of controls between disease traits. Application of PLACO to publicly available summary data from two large case-control GWAS of Type 2 Diabetes and of Prostate Cancer implicated a number of novel shared genetic regions: 3q23 (ZBTB38), 6q25.3 (RGS17), 9p22.1 (HAUS6), 9p13.3 (UBAP2), 11p11.2 (RAPSN), 14q12 (AKAP6), 15q15 (KNL1) and 18q23 (ZNF236).  相似文献   
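Purely to illustrate the product-of-Z test statistic (this is not PLACO's mixture-based asymptotic null, which is the paper's actual contribution), the sketch below calibrates T = Z1·Z2 by Monte Carlo under the most restrictive intersection null in which both Z-statistics are independent standard normal:

```python
import numpy as np

def product_z_pvalue(z1, z2, n_sim=1_000_000, seed=0):
    """Two-sided Monte-Carlo p-value for T = z1 * z2 under the intersection
    null (both traits null, independent studies). PLACO replaces this with an
    asymptotic mixture null that also covers the 'associated with only one
    trait' case and allows correlated summary statistics; this sketch only
    shows the form of the test statistic."""
    t_obs = abs(z1 * z2)
    rng = np.random.default_rng(seed)
    t_null = np.abs(rng.standard_normal(n_sim) * rng.standard_normal(n_sim))
    return float((np.sum(t_null >= t_obs) + 1) / (n_sim + 1))

# e.g. product_z_pvalue(3.2, 2.9) quantifies evidence that a variant is
# associated with *both* traits rather than with at most one.
```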

12.
We evaluate a common reasoning strategy used in community ecology and comparative psychology for selecting between competing hypotheses. This strategy labels one hypothesis as a “null” on the grounds of its simplicity and epistemically privileges it as accepted until rejected. We argue that this strategy is unjustified. The asymmetrical treatment of statistical null hypotheses is justified through the experimental and mathematical contexts in which they are used, but these contexts are missing in the case of the “pseudo-null hypotheses” found in our case studies. Moreover, statistical nulls are often not epistemically privileged in practice over their alternatives because failing to reject the null is usually a negative result about the alternative, experimental hypothesis. Scientists should eschew the appeal to pseudo-nulls. It is a rhetorical strategy that glosses over a commitment to valuing simplicity over other epistemic virtues in the name of good scientific and statistical methodology.  相似文献   

13.
Smooth tests for the zero-inflated Poisson distribution (total citations: 1; self-citations: 0; citations by others: 1)
Thas O, Rayner JC. Biometrics, 2005, 61(3): 808-815
In this article we construct three smooth goodness-of-fit tests for testing the zero-inflated Poisson (ZIP) distribution against general smooth alternatives in the sense of Neyman. We apply our tests to a data set previously claimed to be ZIP distributed, and show that the ZIP is not a good model to describe the data. Upon rejection of the ZIP null hypothesis, the individual components of the test statistic, which are directly related to interpretable parameters in a smooth model, may be used to gain insight into an alternative distribution.

14.
Although a large body of work investigating tests of correlated evolution of two continuous characters exists, hypotheses such as character displacement are really tests of whether substantial evolutionary change has occurred on a particular branch or branches of the phylogenetic tree. In this study, we present a methodology for testing such a hypothesis using ancestral character state reconstruction and simulation. Furthermore, we suggest how to investigate the robustness of the hypothesis test by varying the reconstruction methods or simulation parameters. As a case study, we tested a hypothesis of character displacement in body size of Caribbean Anolis lizards. We compared squared-change, weighted squared-change, and linear parsimony reconstruction methods, gradual Brownian motion and speciational models of evolution, and several resolution methods for linear parsimony. We used ancestor reconstruction methods to infer the amount of body size evolution, and tested whether evolutionary change in body size was greater on branches of the phylogenetic tree in which a transition from occupying a single-species island to a two-species island occurred. Simulations were used to generate null distributions of reconstructed body size change. The hypothesis of character displacement was tested using Wilcoxon Rank-Sums. When tested against simulated null distributions, all of the reconstruction methods resulted in more significant P-values than when standard statistical tables were used. These results confirm that P-values for tests using ancestor reconstruction methods should be assessed via simulation rather than from standard statistical tables. Linear parsimony can produce an infinite number of most parsimonious reconstructions in continuous characters. We present an example of assessing the robustness of our statistical test by exploring the sample space of possible resolutions. We compare ACCTRAN and DELTRAN resolutions of ambiguous character reconstructions in linear parsimony to the most and least conservative resolutions for our particular hypothesis.  相似文献   

15.
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at the critical effect size. This technique also has the flexibility to incorporate prior probabilities of null and alternate hypotheses and/or relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, the use of α = 0.05 results in arbitrary decisions about what effect sizes will likely be considered significant, if real, and results in arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α.  相似文献   
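A small sketch of the "optimal α" idea under assumed simplifications (a two-sided two-sample z-test with known variance and a grid search; function and parameter names are illustrative): for a critical effect size d and per-group sample size n, choose the α that minimizes a weighted combination of α and β, where the weights can encode prior probabilities or relative error costs.

```python
import numpy as np
from scipy.stats import norm

def optimal_alpha(d, n, w1=0.5, w2=0.5, grid=np.linspace(1e-6, 0.5, 100_000)):
    """Alpha minimizing w1*alpha + w2*beta for a two-sided two-sample z-test.

    d      : critical standardized effect size deemed biologically meaningful
    n      : sample size per group (equal, known variances assumed)
    w1, w2 : weights reflecting prior probabilities and/or relative error costs
    Returns the optimal alpha and the Type II error rate beta at that alpha.
    """
    lam = d * np.sqrt(n / 2.0)                 # noncentrality under the alternative
    z_crit = norm.ppf(1 - grid / 2)            # two-sided critical values
    beta = norm.cdf(z_crit - lam) - norm.cdf(-z_crit - lam)
    loss = w1 * grid + w2 * beta
    i = int(np.argmin(loss))
    return grid[i], beta[i]

# optimal_alpha(d=0.5, n=30) returns an alpha noticeably larger than 0.05,
# because beta at alpha = 0.05 is substantial for this effect size.
```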

16.
Testing macro-evolutionary models using incomplete molecular phylogenies.   总被引:12,自引:0,他引:12  
Phylogenies reconstructed from gene sequences can be used to investigate the tempo and mode of species diversification. Here we develop and use new statistical methods to infer past patterns of speciation and extinction from molecular phylogenies. Specifically, we test the null hypothesis that per-lineage speciation and extinction rates have remained constant through time. Rejection of this hypothesis may provide evidence for evolutionary events such as adaptive radiations or key adaptations. In contrast to previous approaches, our methods are robust to incomplete taxon sampling and are conservative with respect to extinction. Using simulation we investigate, first, the adverse effects of failing to take incomplete sampling into account and, second, the power and reliability of our tests. When applied to published phylogenies our tests suggest that, in some cases, speciation rates have decreased through time.  相似文献   

17.
OBJECTIVE: To present an alternative linkage test to the transmission/disequilibrium test (TDT) that is conservative under the null hypothesis and generally more powerful under alternatives. METHODS: The exact distribution of the TDT is examined under both the null hypothesis and relevant alternatives. The TDT is rewritten in an alternate form based on the contributions from each of the three relevant parental mating types. This makes it possible to show that a particular term in the estimate is an exact tie, and thus to rewrite the estimate without this term and to replace the multinomial 'variance estimate' of Spielman et al. [Am J Hum Genet 1993;52:506-516] by the binomial variance. RESULTS: The resulting test is shown to be a stratified McNemar test (SMN). The significance level attained by the SMN is shown to be conservative when compared to the asymptotic χ² distribution, while the TDT often exceeds the nominal level α. Under alternatives, the proposed test is shown to be typically more powerful than the TDT. CONCLUSION: The properties of the TDT as a statistical test have never been fully investigated. The proposed test replaces the heuristically motivated TDT by a formally derived test, which is also computationally simple.
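For orientation, the classical TDT statistic and its exact binomial (McNemar-type) counterpart are sketched below; the paper's stratified McNemar test additionally stratifies by parental mating type, which this sketch does not reproduce.

```python
from scipy.stats import chi2, binomtest

def tdt(b, c):
    """Classical transmission/disequilibrium test for a biallelic marker.

    b : number of heterozygous parents transmitting the candidate allele
    c : number of heterozygous parents transmitting the other allele
    Returns the chi-square statistic with its asymptotic p-value (1 df),
    plus an exact binomial (McNemar-type) p-value for comparison.
    """
    stat = (b - c) ** 2 / (b + c)
    p_asymptotic = chi2.sf(stat, df=1)
    p_exact = binomtest(b, n=b + c, p=0.5).pvalue
    return stat, p_asymptotic, p_exact
```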

18.
The directed transfer function (DTF) has been proposed as a measure of information flow between the components of multivariate time series. In this paper, we discuss the interpretation of the DTF and compare it with other measures for directed relationships. In particular, we show that the DTF does not indicate multivariate or bivariate Granger causality, but that it is closely related to the concept of impulse response function and can be viewed as a spectral measure for the total causal influence from one component to another. Furthermore, we investigate the statistical properties of the DTF and establish a simple significance level for testing for the null hypothesis of no information flow.  相似文献   
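A compact sketch of how the DTF is typically computed from a fitted vector-autoregressive (VAR) model, using the standard Kaminski-Blinowska normalization; the VAR fit via statsmodels and the function signature are illustrative assumptions, not the paper's code.

```python
import numpy as np
from statsmodels.tsa.api import VAR

def directed_transfer_function(data, lags, freqs, fs=1.0):
    """DTF from a VAR(p) model fitted to multichannel data.

    data  : (T, k) array of k simultaneously recorded time series
    freqs : frequencies (same units as fs) at which to evaluate the DTF
    Returns an array of shape (len(freqs), k, k); entry [f, i, j] is the
    normalized DTF from channel j to channel i at frequency freqs[f].
    """
    res = VAR(data).fit(lags)
    A = res.coefs                        # shape (lags, k, k); A[l] multiplies y_{t-l-1}
    k = A.shape[1]
    dtf = np.empty((len(freqs), k, k))
    for m, f in enumerate(freqs):
        Af = np.eye(k, dtype=complex)
        for l in range(lags):
            Af -= A[l] * np.exp(-2j * np.pi * f * (l + 1) / fs)
        H = np.linalg.inv(Af)            # transfer function of the fitted model
        num = np.abs(H) ** 2
        dtf[m] = num / num.sum(axis=1, keepdims=True)   # normalize over source channels
    return dtf
```

This makes the abstract's point concrete: the DTF is built from the model's transfer function H(f), not from a Granger-causality test.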

19.

Background

The role of migratory birds and of poultry trade in the dispersal of highly pathogenic H5N1 is still the topic of intense and controversial debate. In a recent contribution to this journal, Flint argues that the strict application of the scientific method can help to resolve this issue.

Discussion

We argue that Flint's identification of the scientific method with null hypothesis testing is misleading and counterproductive. There is far more to science than the testing of hypotheses; not only the justification, but also the discovery of hypotheses belongs to science. We also show why null hypothesis testing is weak and that Bayesian methods are a preferable approach to statistical inference. Furthermore, we criticize the analogy put forward by Flint between involuntary transport of poultry and long-distance migration.

Summary

To expect ultimate answers and unequivocal policy guidance from null hypothesis testing puts unrealistic expectations on a flawed approach to statistical inference and on science in general.  相似文献   

20.
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the family-wise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error and power precisely.  相似文献   
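A brief sketch of a resampling single-step (max-T) adjustment in the spirit described here, with a note on the step-down refinement; the group labels, test statistic, and number of permutations are placeholders, and the paper's procedures may differ in detail.

```python
import numpy as np
from scipy.stats import ttest_ind

def maxT_adjusted_pvalues(X, y, B=1000, seed=0):
    """Single-step max-T permutation adjustment (Westfall-Young style).

    X : (n_samples, n_genes) expression matrix
    y : binary group labels of length n_samples
    Adjusted p-value of gene j = fraction of permutations whose maximal
    |t|-statistic over all genes exceeds the observed |t_j|.  Because the
    permutation preserves the dependence structure among genes, the
    family-wise error rate is controlled without Bonferroni's conservatism.
    A step-down variant takes the max over successively smaller subsets of
    genes, which can only make the adjusted p-values smaller.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    t_obs = np.abs(ttest_ind(X[y == 1], X[y == 0], axis=0).statistic)
    max_null = np.empty(B)
    for b in range(B):
        yp = rng.permutation(y)
        t_perm = np.abs(ttest_ind(X[yp == 1], X[yp == 0], axis=0).statistic)
        max_null[b] = t_perm.max()
    return (max_null[None, :] >= t_obs[:, None]).mean(axis=1)
```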

