Similar articles
20 similar articles found.
1.
Statistically nonsignificant (p > .05) results from a null hypothesis significance test (NHST) are often mistakenly interpreted as evidence that the null hypothesis is true, that is, that there is “no effect” or “no difference.” However, many of these results occur because the study had low statistical power to detect an effect. Power below 50% is common, in which case a result of no statistical significance is more likely to be incorrect than correct. The inference of “no effect” is not valid even if power is high. NHST assumes that the null hypothesis is true; p is the probability of the data under the assumption that there is no effect. A statistical test cannot confirm what it assumes. These incorrect statistical inferences could be eliminated if decisions based on p values were replaced by a biological evaluation of effect sizes and their confidence intervals. For a single study, the observed effect size is the best estimate of the population effect size, regardless of the p value. Unlike p values, confidence intervals provide information about the precision of the observed effect. In the biomedical and pharmacology literature, methods have been developed to evaluate whether effects are “equivalent,” rather than zero, as tested with NHST. These methods could be used by biological anthropologists to evaluate the presence or absence of meaningful biological effects. Most of what appears to be known about no difference or no effect between sexes, between populations, between treatments, and other circumstances in the biological anthropology literature is based on invalid statistical inference.
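
The estimation approach described in entry 1 can be illustrated with a minimal sketch: report the observed difference with a confidence interval, then ask whether the whole interval lies inside a margin judged biologically negligible. The function names, the normal approximation (crude for samples this small), the data, and the margin of 1.0 are all assumptions made for illustration; none of them come from the abstract.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def mean_difference_ci(a, b, level=0.95):
    """Difference in means (a - b) with a normal-approximation CI."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se

def within_margin(lower, upper, margin):
    """True when the whole interval lies inside (-margin, +margin)."""
    return -margin < lower and upper < margin

# Hypothetical measurements for two groups (illustrative only).
group_a = [10.1, 9.8, 10.4, 10.0, 9.9, 10.3]
group_b = [10.0, 10.2, 9.7, 10.1, 9.9, 10.0]
diff, lo, hi = mean_difference_ci(group_a, group_b)
equivalent = within_margin(lo, hi, margin=1.0)
```

With these made-up numbers the interval spans zero, so a significance test would not reject the null hypothesis, yet the interval also lies well inside the assumed margin. The interval therefore says more than "nonsignificant": it bounds how large the effect could plausibly be.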

2.
A key hypothesis in population ecology is that synchronous and intermittent seed production, known as mast seeding, is driven by the alternating allocation of carbohydrates and mineral nutrients between growth and reproduction in different years, i.e. ‘resource switching’. Such behaviour may ultimately generate bimodal distributions of long‐term flower and seed production, and evidence of these patterns has been taken to support the resource switching hypothesis. Here, we show how a widely used statistical test of bimodality applied by many studies in different ecological contexts may fail to reject the null hypothesis that focal probability distributions are unimodal. Using data from five tussock grass species in South Island, New Zealand, we find clear evidence of bimodality only when flowering patterns are analyzed with probabilistic mixture models. Mixture models provide a theory‐oriented framework for testing hypotheses of mast seeding patterns, enabling the different responses underlying medium‐ and high‐ versus non‐ and low‐flowering years to be modelled more realistically by associating these with distinct probability distributions. Coupling theoretical expectations with more rigorous statistical approaches will empower ecologists to reject null hypotheses more often.

3.
Widely used in testing statistical hypotheses, the Bonferroni multiple test has rather low power, which entails a high risk of falsely accepting the overall null hypothesis and therefore of failing to detect effects that really exist. We suggest that when the partial test statistics are statistically independent, this risk can be reduced by using binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of n partial null hypotheses is rejected at a very stringent significance level (say, 0.005 in the case of n = 10), as prescribed by the Bonferroni test, the binomial tests recommend rejecting the null hypothesis when at least k partial null hypotheses (say, k = [n/2]) are rejected at a much less stringent level (up to 30-50%). We show that the power of such binomial tests is substantially higher than that of the original Bonferroni test and some modified Bonferroni tests. In addition, such an approach allows us to combine tests for which the results are known only for a fixed significance level. The paper contains tables and a computer program for determining (by table lookup or by computation) the necessary binomial test parameters, i.e. either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).
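
The relationship between k and the partial significance level in entry 3 can be sketched numerically. Under independence and with every partial null hypothesis true, the number of partial rejections is binomial, so the overall level is a binomial tail probability. The code below is an illustrative reconstruction under that assumption, not the program mentioned in the abstract; the function names and the bisection search are hypothetical choices.

```python
from math import comb

def overall_level(n, k, alpha_partial):
    """P(at least k of n independent partial tests reject) when all nulls hold."""
    return sum(
        comb(n, j) * alpha_partial**j * (1 - alpha_partial) ** (n - j)
        for j in range(k, n + 1)
    )

def partial_level(n, k, alpha_overall, tol=1e-12):
    """Largest partial level that keeps the overall level at alpha_overall."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # bisection: overall_level increases with alpha_partial
        mid = (lo + hi) / 2
        if overall_level(n, k, mid) > alpha_overall:
            hi = mid
        else:
            lo = mid
    return lo

a1 = partial_level(10, 1, 0.05)  # reject if any single partial test rejects
a5 = partial_level(10, 5, 0.05)  # binomial variant with k = n // 2
```

For k = 1 the search returns 1 - 0.95^(1/10), about 0.005, in line with the Bonferroni figure quoted for n = 10. Requiring more partial rejections (larger k) permits a considerably larger partial level at the same overall level.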

4.
Significance analysis of groups of genes in expression profiling studies
MOTIVATION: Gene class testing (GCT) is a statistical approach to determine whether some functionally predefined classes of genes express differently under two experimental conditions. GCT computes the P-value of each gene class based on the null distribution and the gene classes are ranked for importance in accordance with their P-values. Currently, two null hypotheses have been considered: the Q1 hypothesis tests the relative strength of association with the phenotypes among the gene classes, and the Q2 hypothesis assesses the statistical significance. These two hypotheses are related but not equivalent. METHOD: We investigate three one-sided and two two-sided test statistics under Q1 and Q2. The null distributions of gene classes under Q1 are generated by permuting gene labels and the null distributions under Q2 are generated by permuting samples. RESULTS: We applied the five statistics to a diabetes dataset with 143 gene classes and to a breast cancer dataset with 508 GO (Gene Ontology) terms. For each statistic, the null distributions of the gene classes under Q1 are different from those under Q2 in both datasets, and their rankings can be different too. We clarify the one-sided and two-sided hypotheses, and discuss some issues regarding the Q1 and Q2 hypotheses for gene class ranking in the GCT. Because Q1 does not deal with correlations among genes, we prefer tests based on Q2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
Most ecologists and evolutionary biologists continue to rely heavily on null hypothesis significance testing, rather than on recently advocated alternatives, for inference. Here, we briefly review null hypothesis significance testing and its major alternatives. We identify major objectives of statistical analysis and suggest which analytical approaches are appropriate for each. Any well-designed study can improve our understanding of biological systems, regardless of the inferential approach used. Nevertheless, an awareness of available techniques and their pitfalls could guide better approaches to data collection and broaden the range of questions that can be addressed. Although we should reduce our reliance on significance testing, it retains an important role in statistical education and is likely to remain fundamental to the falsification of scientific hypotheses.

6.
7.

Background

The role of migratory birds and of poultry trade in the dispersal of highly pathogenic H5N1 is still the topic of intense and controversial debate. In a recent contribution to this journal, Flint argues that the strict application of the scientific method can help to resolve this issue.

Discussion

We argue that Flint's identification of the scientific method with null hypothesis testing is misleading and counterproductive. There is far more to science than the testing of hypotheses; not only the justification, but also the discovery of hypotheses belongs to science. We also show why null hypothesis testing is weak and that Bayesian methods are a preferable approach to statistical inference. Furthermore, we criticize the analogy put forward by Flint between involuntary transport of poultry and long-distance migration.

Summary

To expect ultimate answers and unequivocal policy guidance from null hypothesis testing puts unrealistic expectations on a flawed approach to statistical inference and on science in general.

8.
Ryman N, Jorde PE. Molecular Ecology 2001, 10(10): 2361-2373
A variety of statistical procedures are commonly employed when testing for genetic differentiation. In a typical situation two or more samples of individuals have been genotyped at several gene loci by molecular or biochemical means, and in a first step a statistical test for allele frequency homogeneity is performed at each locus separately, using, e.g. the contingency chi-square test, Fisher's exact test, or some modification thereof. In a second step the results from the separate tests are combined for evaluation of the joint null hypothesis that there is no allele frequency difference at any locus, corresponding to the important case where the samples would be regarded as drawn from the same statistical and, hence, biological population. Presently, there are two conceptually different strategies in use for testing the joint null hypothesis of no difference at any locus. One approach is based on the summation of chi-square statistics over loci. Another method is employed by investigators applying the Bonferroni technique (adjusting the P-value required for rejection to account for the elevated alpha errors when performing multiple tests simultaneously) to test if the heterogeneity observed at any particular locus can be regarded significant when considered separately. Under this approach the joint null hypothesis is rejected if one or more of the component single locus tests is considered significant under the Bonferroni criterion. We used computer simulations to evaluate the statistical power and realized alpha errors of these strategies when evaluating the joint hypothesis after scoring multiple loci. We find that the 'extended' Bonferroni approach generally is associated with low statistical power and should not be applied in the current setting. Further, and contrary to what might be expected, we find that 'exact' tests typically behave poorly when combined in existing procedures for joint hypothesis testing. 
Thus, while exact tests are generally to be preferred over approximate ones when testing each particular locus, approximate tests such as the traditional chi-square seem preferable when addressing the joint hypothesis.

9.
Although a large body of work investigating tests of correlated evolution of two continuous characters exists, hypotheses such as character displacement are really tests of whether substantial evolutionary change has occurred on a particular branch or branches of the phylogenetic tree. In this study, we present a methodology for testing such a hypothesis using ancestral character state reconstruction and simulation. Furthermore, we suggest how to investigate the robustness of the hypothesis test by varying the reconstruction methods or simulation parameters. As a case study, we tested a hypothesis of character displacement in body size of Caribbean Anolis lizards. We compared squared-change, weighted squared-change, and linear parsimony reconstruction methods, gradual Brownian motion and speciational models of evolution, and several resolution methods for linear parsimony. We used ancestor reconstruction methods to infer the amount of body size evolution, and tested whether evolutionary change in body size was greater on branches of the phylogenetic tree in which a transition from occupying a single-species island to a two-species island occurred. Simulations were used to generate null distributions of reconstructed body size change. The hypothesis of character displacement was tested using Wilcoxon Rank-Sums. When tested against simulated null distributions, all of the reconstruction methods resulted in more significant P-values than when standard statistical tables were used. These results confirm that P-values for tests using ancestor reconstruction methods should be assessed via simulation rather than from standard statistical tables. Linear parsimony can produce an infinite number of most parsimonious reconstructions in continuous characters. We present an example of assessing the robustness of our statistical test by exploring the sample space of possible resolutions. 
We compare ACCTRAN and DELTRAN resolutions of ambiguous character reconstructions in linear parsimony to the most and least conservative resolutions for our particular hypothesis.

10.
There are a number of nonparametric procedures known for testing goodness-of-fit in the univariate case. Similar procedures can be derived for testing goodness-of-fit in the multivariate case through an application of the theory of statistically equivalent blocks (SEB). The SEB transforms the data into coverages which are distributed as spacings from a uniform distribution on [0,1], under the null hypothesis. In this paper, we present a multivariate nonparametric test of goodness-of-fit based on the SEB when the multivariate distributions under the null hypothesis and the alternative hypothesis are “weakly” ordered. Empirical results are given on the performance of the proposed test in an application to the problem of assessing the reliability of a p-component system.

11.
The study of male genital diversity has long overshadowed evolutionary inquiry of female genitalia, despite its nontrivial diversity. Here, we identify four nonmutually exclusive mechanisms that could lead to genital divergence in females, and potentially generate patterns of correlated male–female genital evolution: (1) ecological variation alters the context of sexual selection (“ecology hypothesis”), (2) sexually antagonistic selection (“sexual‐conflict hypothesis”), (3) female preferences for male genitalia mediated by female genital traits (“female‐choice hypothesis”), and (4) selection against inter‐population mating (“lock‐and‐key hypothesis”). We performed an empirical investigation of all four hypotheses using the model system of Bahamas mosquitofish inhabiting blue holes that vary in predation risk. We found unequivocal support for the ecology hypothesis, with females exhibiting a smaller genital opening in blue holes containing piscivorous fish. This is consistent with stronger postmating female choice/conflict when predators are present, but greater premating female choice in their absence. Our results additionally supported the lock‐and‐key hypothesis, uncovering a pattern of reproductive character displacement for genital shape. We found no support for the sexual conflict or female choice hypotheses. Our results demonstrate a strong role for ecology in generating female genital diversity, and suggest that lock‐and‐key may provide a viable cause of female genital diversification.

12.
Summary: We consider a problem of testing mixture proportions using two‐sample data, one from group one and the other from a mixture of groups one and two with unknown proportion, λ, for being in group two. Various statistical applications, including microarray study, infectious epidemiological studies, case–control studies with contaminated controls, clinical trials allowing “nonresponders,” genetic studies for gene mutation, and fishery applications can be formulated in this setup. Under the assumption that the log ratio of probability (density) functions from the two groups is linear in the observations, we propose a generalized score test statistic to test the mixture proportion. Under some regularity conditions, it is shown that this statistic converges to a weighted chi‐squared random variable under the null hypothesis of λ = 0, where the weight depends only on the sampling fraction of both groups. The permutation method is used to provide more reliable finite sample approximation. Simulation results and two real data applications are presented.

13.
Restoration practitioners adopt a multiplicity of approaches that range from basic trial and error, and site‐specific efforts, to complex experimental designs that test cutting edge theoretical hypotheses. We classify these different strategies to understand how restoration is planned and executed, and to contribute to the discussion on certification and evaluation. We use Aldo Leopold's notion of “intelligent tinkering” as a basis for a typology of four different approaches to restoration based on four parameters: motivation, general strategy, method of inquiry, and temporal and spatial scales of the expected outcomes. We argue that efforts to restore a damaged ecosystem in a skilled and experimental manner should be called “professional intelligent tinkering” versus “amateur intelligent tinkering,” and “careless tinkering.” We compare these three types of tinkering, and a more formal “scientific approach.” In professional intelligent tinkering, interventions and adjustments are done in a logical and careful manner, and with a methodical, experimental mindset. In contrast to the scientific approach, intelligent tinkering does not necessarily follow a formal experimental procedure, with replications and controls that allow extrapolation, nor is it driven by the motivation to publish in peer‐reviewed journals. Rather, it is primarily driven by a desire to solve site‐specific problems even in the absence of sufficient ecological knowledge to apply previously tested knowledge and techniques. We illustrate three approaches with three on‐going restoration projects in southeastern Brazil, two of which are small scale, and one of which is very large scale.

14.
The comparison of parasite numbers or intensities between different samples of hosts is a common and important question in most parasitological studies. The main question is whether the values in one sample tend to be higher (or lower) than the values of the other sample. We argue that it is more appropriate to test a null hypothesis about the probability that an individual host from one sample has a higher value than individual hosts from a second sample rather than testing hypotheses about means or medians. We present a recently proposed statistical test especially designed to test hypotheses about that probability. This novel test is more appropriate than other statistical tests, such as Student's t-test, the Mann-Whitney U-test, or a bootstrap test based on Welch's t-statistic, regularly used by parasitologists.
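
The quantity at the centre of entry 14, the probability that a host from one sample has a higher value than a host from the other, has a simple point estimate: the fraction of all cross-sample pairs in which the first value is larger, with ties counted as one half. The sketch below computes only this estimate and is not the test the abstract refers to; the function name and the parasite counts are hypothetical.

```python
def prob_superiority(x, y):
    """Estimate P(X > Y) + 0.5 * P(X = Y) from all pairs of observations."""
    wins = sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )
    return wins / (len(x) * len(y))

# Hypothetical parasite counts per host in two samples.
sample_a = [0, 2, 5, 9, 14]
sample_b = [0, 1, 1, 3, 4]
p_hat = prob_superiority(sample_a, sample_b)
```

For these made-up counts the estimate is 0.74; a value of 0.5 would indicate that neither sample tends to have higher values than the other.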

15.
One of the underlying assumptions of both theoretical and empirical community ecology is that the processes determining community composition and abundance of species are interactions specific to particular pairs of species. However, we argue that, in sessile plants at least, competitive interactions are not usually species-specific and that there exists a large degree of equivalence of the effect of species of similar growth form on the ability of any particular species to establish within a community. This null hypothesis of equivalence of competitive effects is based on three characteristics of plants: homogeneity of resource requirements among autotrophs; low encounter probabilities between individuals of any particular species pair; and the predominance of size asymmetries between competing individuals (e.g., seedling-adult interactions). We present an experimental design to quantify competitive interactions among plant species under field conditions and therefore enable statistical comparisons of competitive abilities among species. The competitive effect of one “neighbor” species on one “target” species is measured as the slope of a regression of performance of target individuals on biomass (or other measure of amount) of its immediate neighbors. Use of the design to test for equivalence of competitive effects and other advantages are described.

16.
Experimental tests of clearly articulated hypotheses are an increasingly widespread feature of modern marine ecology. Increased use of experiments has not, however, been accompanied by increased understanding of the logical structure of falsificationist tests. Most observations can be explained by several different models or theories. To distinguish among these requires demonstration of the falsity of the consequences or predictions of incorrect models. This is best achieved by deriving from each model one or more hypotheses (predictions) about the type, form or nature of observations that should occur in some not-yet-examined set of circumstances. Because of logical constraints on the possibility of proving the correctness of such hypotheses, they must be inverted to form logical null hypotheses which comprise all alternative possibilities to those predicted in the hypotheses. Correctness or not of null hypotheses can then be ascertained by an appropriately designed experiment (or test), leading to unambiguous rejection or retention of the null hypotheses. The former corroborates the hypotheses and provides support for the correctness of the explanatory model for the original observations. In contrast, retention of a null hypothesis identifies an incorrect model. The growth of knowledge is thus the elimination of false models, theories and explanations. Ecological experiments usually require statistical procedures for determining whether or not null hypotheses should be retained. Construction of statistical null hypotheses (i.e. definitions of parameters of frequency distributions of test statistics) sometimes requires that these be identical to logical hypotheses (and not to the logical nulls). This leads to irrational acceptance of hypotheses and the models or theories from which they were derived. It also poses immense problems for determinations of statistical power of experiments. 
Ecological experiments are analysed to reveal the nature of, and linkages between, their components in relation to falsificationism, statistical procedures and the logical properties and interpretations of ecological theories.

17.
In clinical trials for the comparison of two treatments it seems reasonable to stop the study if either one treatment has proved markedly superior in the main effect, or one has proved severely inferior with respect to an adverse side effect. Two-stage sampling plans are considered for simultaneously testing a main and a side effect, assumed to follow a bivariate normal distribution with known variances but unknown correlation. The test procedure keeps the global significance level under the null hypothesis of no differences in main and side effects. The critical values are chosen under the side condition that the probability of ending at the first or second stage with a rejection of the elementary null hypothesis for the main effect is controlled when a particular constellation of differences in means holds; analogously, the probability of ending with a rejection of the null hypothesis for the side effect, given certain treatment differences, is controlled too. Plans “optimal” with respect to sample size are given.

18.
Benjamini Y, Heller R. Biometrics 2008, 64(4): 1215-1222
SUMMARY: We consider the problem of testing for a partial conjunction of hypotheses, which asserts that at least u out of n tested hypotheses are false. It offers an in-between approach to the testing of the conjunction of null hypotheses against the alternative that at least one is not, and the testing of the disjunction of null hypotheses against the alternative that all hypotheses are not null. We suggest powerful test statistics for testing such a partial conjunction hypothesis that are valid under dependence between the test statistics as well as under independence. We then address the problem of testing many partial conjunction hypotheses simultaneously using the false discovery rate (FDR) approach. We prove that if the FDR controlling procedure in Benjamini and Hochberg (1995, Journal of the Royal Statistical Society, Series B 57, 289-300) is used for this purpose the FDR is controlled under various dependency structures. Moreover, we can screen at all levels simultaneously in order to display the findings on a superimposed map and still control an appropriate FDR measure. We apply the method to examples from microarray analysis and functional magnetic resonance imaging (fMRI), two application areas where the need for partial conjunction analysis has been identified.
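
To make the idea of "at least u out of n" in entry 18 concrete, one simple way to pool p-values is a Bonferroni-style rule: discard the u - 1 smallest p-values and apply a Bonferroni correction to the smallest of the remaining n - u + 1. Whether this matches any of the statistics proposed in the paper is not stated in the abstract, so the sketch below should be read as an illustrative assumption; the function name and the p-values are hypothetical.

```python
def partial_conjunction_p(p_values, u):
    """Bonferroni-style p-value for 'at least u of the n nulls are false'."""
    n = len(p_values)
    if not 1 <= u <= n:
        raise ValueError("u must be between 1 and n")
    ordered = sorted(p_values)
    # The u-th smallest p-value, corrected for the n - u + 1 remaining tests.
    return min(1.0, (n - u + 1) * ordered[u - 1])

p = [0.001, 0.004, 0.03, 0.2]  # hypothetical p-values
p_any = partial_conjunction_p(p, 1)  # at least one effect: ordinary Bonferroni
p_two = partial_conjunction_p(p, 2)  # at least two effects
p_all = partial_conjunction_p(p, 4)  # every null false: the largest p-value
```

The two extremes recover the familiar cases named in the abstract: u = 1 gives the usual Bonferroni test of the conjunction of nulls, and u = n reduces to the maximum p-value.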

19.
Much forensic inference based upon DNA evidence is made assuming that the Hardy-Weinberg equilibrium (HWE) is valid for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, each having advantages and limitations. The limitations become more obvious when testing for deviation within multiallelic DNA loci is attempted. Here we present an exact test for HWE in the biallelic case, based on the ratio of weighted likelihoods under the null and alternative hypotheses, the Bayes factor. This test does not depend on asymptotic results and minimizes a linear combination of type I and type II errors. By ordering the sample space using the Bayes factor, we also define a significance (evidence) index, P value, using the weighted likelihood under the null hypothesis. We compare it to the conditional exact test for the case of sample size n = 10. Using the idea underlying the method of chi-square partition, the test is used sequentially to test equilibrium in the multiple allele case and then applied to two short tandem repeat loci, using a real Caucasian data bank, showing its usefulness.

20.
Signal detection in functional magnetic resonance imaging (fMRI) inherently involves the problem of testing a large number of hypotheses. A popular strategy to address this multiplicity is the control of the false discovery rate (FDR). In this work we consider the case where prior knowledge is available to partition the set of all hypotheses into disjoint subsets or families, e.g., by a priori knowledge on the functionality of certain regions of interest. If the proportion of true null hypotheses differs between families, this structural information can be used to increase statistical power. We propose a two-stage multiple test procedure which first excludes those families from the analysis for which there is no strong evidence for containing true alternatives. We show control of the family-wise error rate at this first stage of testing. Then, at the second stage, we proceed to test the hypotheses within each non-excluded family and obtain asymptotic control of the FDR within each family at this second stage. Our main mathematical result is that this two-stage strategy implies asymptotic control of the FDR with respect to all hypotheses. In simulations we demonstrate the increased power of this new procedure in comparison with established procedures in situations with highly unbalanced families. Finally, we apply the proposed method to simulated and to real fMRI data.
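
FDR control, mentioned in entries 18 and 20, is most commonly carried out with the Benjamini-Hochberg step-up procedure. The sketch below shows that generic procedure only; it does not implement the two-stage method of entry 20, whose details are not given in the abstract. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the BH step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p-value is under its threshold.
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(p, q=0.05)
```

With these made-up values only the first two hypotheses are rejected at q = 0.05, because the third smallest p-value (0.039) exceeds its threshold of 3 × 0.05 / 8 and no larger p-value falls under its own threshold.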
