Similar Documents
20 similar documents found
1.
Statistically nonsignificant (p > .05) results from a null hypothesis significance test (NHST) are often mistakenly interpreted as evidence that the null hypothesis is true—that there is “no effect” or “no difference.” However, many of these results occur because the study had low statistical power to detect an effect. Power below 50% is common, in which case a nonsignificant result is more likely to be incorrect than correct. The inference of “no effect” is not valid even if power is high. NHST assumes that the null hypothesis is true; p is the probability of data at least as extreme as those observed, computed under the assumption that there is no effect. A statistical test cannot confirm what it assumes. These incorrect statistical inferences could be eliminated if decisions based on p values were replaced by a biological evaluation of effect sizes and their confidence intervals. For a single study, the observed effect size is the best estimate of the population effect size, regardless of the p value. Unlike p values, confidence intervals provide information about the precision of the observed effect. In the biomedical and pharmacology literature, methods have been developed to evaluate whether effects are “equivalent,” rather than zero, as tested with NHST. These methods could be used by biological anthropologists to evaluate the presence or absence of meaningful biological effects. Most of what appears to be known about no difference or no effect between sexes, between populations, between treatments, and other circumstances in the biological anthropology literature is based on invalid statistical inference.
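As a concrete illustration of the equivalence-testing alternative this abstract recommends, here is a minimal Python sketch of the two one-sided tests (TOST) procedure; the data and the equivalence margin delta are hypothetical, and in practice the margin must come from biological judgment about what counts as a negligible effect.

```python
# Minimal sketch of equivalence testing via two one-sided tests (TOST).
# The data and the margin `delta` are hypothetical.
import numpy as np
from scipy import stats

def tost_two_sample(x, y, delta):
    """Two one-sided t-tests for equivalence of two means within +/- delta."""
    nx, ny = len(x), len(y)
    diff = np.mean(x) - np.mean(y)
    # pooled standard error (equal-variance t-test)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    t_lower = (diff + delta) / se   # tests H0: diff <= -delta
    t_upper = (diff - delta) / se   # tests H0: diff >= +delta
    p_lower = stats.t.sf(t_lower, df)
    p_upper = stats.t.cdf(t_upper, df)
    # equivalence is declared only if BOTH one-sided tests reject
    return max(p_lower, p_upper)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 40)
y = rng.normal(0.1, 1.0, 40)
print("TOST p-value:", tost_two_sample(x, y, delta=0.5))
```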

2.

Testing differences between a treatment and control group is common practice in biomedical research like randomized controlled trials (RCT). The standard two-sample t test relies on null hypothesis significance testing (NHST) via p values, which has several drawbacks. Bayesian alternatives were recently introduced using the Bayes factor, which has its own limitations. This paper introduces an alternative to current Bayesian two-sample t tests by interpreting the underlying model as a two-component Gaussian mixture in which the effect size, the quantity most relevant in clinical research, is the parameter of interest. Unlike p values or the Bayes factor, the proposed method focuses on estimation under uncertainty instead of explicit hypothesis testing. A Gibbs sampler produces the posterior of the effect size, which is then used either for estimation under uncertainty or for explicit hypothesis testing based on the region of practical equivalence (ROPE). An illustrative example, theoretical results, and a simulation study show the usefulness of the proposed method, and the test is made available in the R package bayest. In sum, the new Bayesian two-sample t test provides a solution to the Behrens–Fisher problem based on Gaussian mixture modelling.
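The ROPE decision rule mentioned here can be sketched independently of the Gibbs sampler. In the sketch below the posterior draws are faked as normal variates standing in for the sampler's output, and the ROPE limits of ±0.1 are an illustrative convention, not a recommendation from the paper.

```python
# Sketch of a ROPE decision given posterior draws of the effect size.
# The draws stand in for Gibbs-sampler output; the ROPE limits are an
# assumption that must be justified per application.
import numpy as np

rng = np.random.default_rng(42)
posterior_delta = rng.normal(0.25, 0.08, 10_000)  # stand-in posterior draws

rope = (-0.1, 0.1)
# equal-tailed 95% credible interval, used as a simple stand-in for the HDI
lo, hi = np.quantile(posterior_delta, [0.025, 0.975])

if hi < rope[0] or lo > rope[1]:
    decision = "reject the null region: effect is practically non-zero"
elif rope[0] <= lo and hi <= rope[1]:
    decision = "accept the null region: effect is practically equivalent to zero"
else:
    decision = "withhold judgement: credible interval overlaps the ROPE"
print(f"95% CI = ({lo:.3f}, {hi:.3f}); {decision}")
```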


3.
When testing for genetic differentiation the joint null hypothesis that there is no allele frequency difference at any locus is of interest. Common approaches to test this hypothesis are based on the summation of χ2 statistics over loci and on the Bonferroni correction, respectively. Here, we also consider the Simes adjustment and a recently proposed truncated product method (TPM) to combine P‐values. The summation and the TPM (using a relatively large truncation point) are powerful when there are differences in many or all loci. The Simes adjustment, however, is powerful when there are differences regarding one or a few loci only. As a compromise between the different approaches we introduce a combination of the Simes adjustment and the TPM; that is, the joint null hypothesis is rejected if at least one of the two methods, Simes and TPM, is significant at the α/2 level. Simulation results indicate that this combination is a robust procedure with high power over the different types of alternatives.
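A minimal sketch of the combined procedure described above, assuming independent per-locus p-values for the Monte Carlo null of the TPM; the truncation point τ = 0.05 and the example p-values are illustrative.

```python
# Sketch of the combined test for the joint null of no allele-frequency
# difference at any locus: reject if either the Simes adjustment or the
# truncated product method (TPM) is significant at alpha/2.
import numpy as np

def simes_p(pvals):
    """Simes combined p-value for the joint null."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    return float(np.min(m * p / np.arange(1, m + 1)))

def tpm_p(pvals, tau=0.05, n_sim=20_000, seed=0):
    """TPM: product of p-values <= tau; Monte Carlo p-value under independence."""
    rng = np.random.default_rng(seed)
    p = np.asarray(pvals)
    w_obs = np.where(p <= tau, np.log(p), 0.0).sum()   # log of truncated product
    null = rng.uniform(size=(n_sim, len(p)))
    w_null = np.where(null <= tau, np.log(null), 0.0).sum(axis=1)
    return (1 + np.sum(w_null <= w_obs)) / (n_sim + 1)  # smaller product = stronger evidence

pvals = [0.001, 0.20, 0.35, 0.50, 0.72, 0.90]
alpha = 0.05
reject = simes_p(pvals) <= alpha / 2 or tpm_p(pvals) <= alpha / 2
print("joint null rejected:", reject)
```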

4.
The null hypothesis dilemma in experimental tests of ecological hypotheses
Li Ji, Chinese Journal of Ecology, 2016, 27(6): 2031–2038
Experimentation is one of the principal ways of testing ecological hypotheses, but it faces objections rooted in the null hypothesis itself. Analyzing Platt's (1964) hypothetico-deductive model, Quinn and Dunham (1983) argued that ecology cannot possess null hypotheses that are strictly testable by experiment. Fisher's falsificationism and the non-decisive character of Neyman–Pearson (N-P) testing mean that a statistical null hypothesis cannot be strictly verified; moreover, ecological processes, unlike those of classical physics, admit a null hypothesis H0 (α = 1, β = 0) alongside distinct alternative hypotheses H1′ (α′ = 1, β′ = 0), so ecological null hypotheses are also difficult to verify strictly by experiment. The dilemma can be eased by lowering the P value, choosing the null hypothesis with care, and applying non-centralization and two-sided testing to non-null hypotheses. Statistical null hypothesis significance testing (NHST), however, should not be equated with a logical proof of causal relationships in ecological hypotheses. Consequently, the methods and experimental conclusions of the many existing ecological studies that rest on NHST are not absolutely reliable in the logical sense.

5.
Multiple endpoints are tested to assess an overall treatment effect and also to identify which endpoints or subsets of endpoints contributed to treatment differences. The conventional p‐value adjustment methods, such as single‐step, step‐up, or step‐down procedures, sequentially identify each significant individual endpoint. Closed test procedures can also detect individual endpoints that have effects via a step‐by‐step closed strategy. This paper proposes a global‐based statistic for testing an a priori number, say r, of the k endpoints, as opposed to the conventional approach of testing one (r = 1) endpoint. The proposed test statistic is an extension of the single‐step p‐value‐based statistic based on the distribution of the smallest p‐value. The test maintains strong control of the familywise error (FWE) rate under the null hypothesis of no difference in any (sub)set of r endpoints among all possible combinations of the k endpoints. After rejecting the null hypothesis, the individual endpoints in the sets that are rejected can be tested further, using a univariate test statistic in a second step, if desired. However, the second step test only weakly controls the FWE rate. The proposed method is illustrated by application to a psychosis data set.
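Under the simplifying assumption of independent endpoints (the paper works with the actual distribution of the smallest p-value), the critical constant for rejecting when at least r of the k p-values are small reduces to a binomial calculation, sketched below; the helper names are ours.

```python
# Sketch: critical constant c for a global test that rejects when at least
# r of the k endpoint p-values fall at or below c.  Under the global null
# with independent uniform p-values, P(p_(r) <= c) = P(Binomial(k, c) >= r),
# so c solves that equation at level alpha.  Independence is a simplifying
# assumption; the paper handles the general joint distribution.
from scipy.optimize import brentq
from scipy.stats import binom

def critical_constant(k, r, alpha=0.05):
    size = lambda c: binom.sf(r - 1, k, c) - alpha  # P(>= r successes) - alpha
    return brentq(size, 1e-12, 1 - 1e-12)

for r in (1, 2, 3):
    c = critical_constant(k=5, r=r)
    print(f"k=5, r={r}: reject if the {r}-th smallest p-value <= {c:.4f}")
```

For r = 1 this recovers the familiar Šidák-type threshold c = 1 − (1 − α)^(1/k).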

6.
We are concerned with calculating the sample size required for estimating the mean of the continuous distribution in the context of a two component nonstandard mixture distribution (i.e., a mixture of an identifiable point degenerate function F at a constant with probability P and a continuous distribution G with probability 1 – P). A common ad hoc procedure of escalating the naïve sample size n (calculated under the assumption of no point degenerate function F) by a factor of 1/(1 – P) has only about a 0.5 probability of achieving the pre‐specified statistical power. Such an ad hoc approach may seriously underestimate the necessary sample size and jeopardize inferences in scientific investigations. We argue that sample size calculations in this context should guarantee, with a probability set by the researcher at a level greater than 0.5, that a power of at least 1 – β is achieved. To that end, we propose an exact method and an approximate method for calculating the sample size so that the probability of achieving the desired statistical power is determined by the researcher.
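The abstract's core point can be reproduced in a few lines, assuming that what matters is how many of the N subjects fall in the continuous component, M ~ Binomial(N, 1 − P). This is a sketch of the idea rather than the paper's exact method, and n, P, and γ below are illustrative.

```python
# With point-mass probability P, the number of informative observations
# among N is M ~ Binomial(N, 1 - P).  The ad hoc escalation N = n/(1-P)
# puts M >= n with probability near 0.5; instead pick the smallest N with
# P(M >= n) >= gamma.
import math
from scipy.stats import binom

def required_n(n, P, gamma=0.9):
    N = math.ceil(n / (1 - P))            # start at the ad hoc value
    while binom.sf(n - 1, N, 1 - P) < gamma:
        N += 1
    return N

n, P = 100, 0.3
naive = math.ceil(n / (1 - P))
print("naive N:", naive, "-> P(M >= n) =", round(binom.sf(n - 1, naive, 1 - P), 3))
print("N for 90% assurance:", required_n(n, P, gamma=0.9))
```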

7.
We consider the problem of testing a statistical hypothesis where the scientifically meaningful test statistic is a function of latent variables. In particular, we consider detection of genetic linkage, where the latent variables are patterns of inheritance at specific genome locations. Introduced by Geyer & Meeden (2005), fuzzy p-values are random variables, described by their probability distributions, that are interpreted as p-values. For latent variable problems, we introduce the notion of a fuzzy p-value as having the conditional distribution of the latent p-value given the observed data, where the latent p-value is the random variable that would be the p-value if the latent variables were observed. The fuzzy p-value provides an exact test using two sets of simulations of the latent variables under the null hypothesis, one unconditional and the other conditional on the observed data. It provides not only an expression of the strength of the evidence against the null hypothesis but also an expression of the uncertainty in that expression owing to lack of knowledge of the latent variables. We illustrate these features with an example of simulated data mimicking a real example of the detection of genetic linkage.
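A toy instantiation (our own, chosen for tractability; the paper's application is genetic linkage): latent Z ~ N(μ, 1) with H0: μ = 0 rejected for large Z, but only X = Z + noise is observed, so the latent p-value 1 − Φ(Z) has a conditional distribution given the data.

```python
# Toy fuzzy p-value: if Z were observed, the p-value would be 1 - Phi(Z).
# Observing only X = Z + e with e ~ N(0, 1), standard normal theory gives
# Z | X = x  ~  N(x/2, 1/2) under H0, so the fuzzy p-value is the induced
# distribution of 1 - Phi(Z) under that conditional law.
import numpy as np
from scipy.stats import norm

x = 2.8                                               # observed value
rng = np.random.default_rng(9)
z_given_x = rng.normal(x / 2, np.sqrt(0.5), 10_000)   # conditional draws under H0
fuzzy_p = norm.sf(z_given_x)                          # latent p-values

q = np.quantile(fuzzy_p, [0.05, 0.5, 0.95])
print("fuzzy p-value 5%/50%/95% quantiles:", np.round(q, 4))
```

The spread of the quantiles is exactly the "uncertainty in the expression of evidence" the abstract describes.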

8.
For clinical trials with interim analyses, conditional rejection probabilities play an important role when stochastic curtailment or design adaptations are performed. The conditional rejection probability is the conditional probability of finally rejecting the null hypothesis given the interim data; it is computed either under the null or the alternative hypothesis. We investigate the properties of the conditional rejection probability for the one‐sided, one‐sample t‐test and show that it can be non‐monotone in the interim mean of the data and non‐monotone in the non‐centrality parameter of the alternative. We give several proposals for how to implement design adaptations (based on the conditional rejection probability) for the t‐test and give a numerical example. Additionally, the conditional rejection probability given the interim t‐statistic is investigated. It does not depend on the unknown σ and can be used in stochastic curtailment procedures.
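For the z-test analogue with known variance (the paper treats the t-test, where the unknown σ is the crux), the conditional rejection probability has a closed form, sketched below; the information fraction and interim values are illustrative.

```python
# Conditional rejection probability (CRP) for a one-sided z-test:
# with information fraction t and interim z-statistic z1, the final
# statistic is sqrt(t)*z1 + sqrt(1-t)*Z with independent Z, plus a drift
# theta*(1-t) under the alternative (theta = 0 under the null).
from scipy.stats import norm

def crp(z1, t, alpha=0.025, theta=0.0):
    z_crit = norm.ppf(1 - alpha)
    num = z_crit - z1 * t**0.5 - theta * (1 - t)
    return norm.sf(num / (1 - t) ** 0.5)

for z1 in (0.5, 1.5, 2.5):
    print(f"interim z = {z1}: CRP under H0 = {crp(z1, t=0.5):.3f}")
```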

9.
Although a large body of work investigating tests of correlated evolution of two continuous characters exists, hypotheses such as character displacement are really tests of whether substantial evolutionary change has occurred on a particular branch or branches of the phylogenetic tree. In this study, we present a methodology for testing such a hypothesis using ancestral character state reconstruction and simulation. Furthermore, we suggest how to investigate the robustness of the hypothesis test by varying the reconstruction methods or simulation parameters. As a case study, we tested a hypothesis of character displacement in body size of Caribbean Anolis lizards. We compared squared-change, weighted squared-change, and linear parsimony reconstruction methods, gradual Brownian motion and speciational models of evolution, and several resolution methods for linear parsimony. We used ancestor reconstruction methods to infer the amount of body size evolution, and tested whether evolutionary change in body size was greater on branches of the phylogenetic tree in which a transition from occupying a single-species island to a two-species island occurred. Simulations were used to generate null distributions of reconstructed body size change. The hypothesis of character displacement was tested using the Wilcoxon rank-sum test. When tested against simulated null distributions, all of the reconstruction methods resulted in more significant P-values than when standard statistical tables were used. These results confirm that P-values for tests using ancestor reconstruction methods should be assessed via simulation rather than from standard statistical tables. Linear parsimony can produce an infinite number of most parsimonious reconstructions in continuous characters. We present an example of assessing the robustness of our statistical test by exploring the sample space of possible resolutions. We compare ACCTRAN and DELTRAN resolutions of ambiguous character reconstructions in linear parsimony to the most and least conservative resolutions for our particular hypothesis.

10.
The bootstrap method has become a widely used tool applied in diverse areas where results based on asymptotic theory are scarce. It can be applied, for example, for assessing the variance of a statistic, a quantile of interest or for significance testing by resampling from the null hypothesis. Recently, some approaches have been proposed in the biometrical field where hypothesis testing or model selection is performed on a bootstrap sample as if it were the original sample. P‐values computed from bootstrap samples have been used, for example, in the statistics and bioinformatics literature for ranking genes with respect to their differential expression, for estimating the variability of p‐values and for model stability investigations. Procedures which make use of bootstrapped information criteria are often applied in model stability investigations and model averaging approaches as well as when estimating the error of model selection procedures which involve tuning parameters. From the literature, however, there is evidence that p‐values and model selection criteria evaluated on bootstrap data sets do not represent what would be obtained on the original data or new data drawn from the overall population. We explain the reasons for this and, through the use of a real data set and simulations, we assess the practical impact on procedures relevant to biometrical applications in cases where it has not yet been studied. Moreover, we investigate the behavior of subsampling (i.e., drawing from a data set without replacement) as a potential alternative solution to the bootstrap for these procedures.
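A small simulation, with arbitrary settings, illustrating the warning: p-values recomputed on bootstrap resamples of a single null data set are not distributed like p-values on fresh null samples.

```python
# One-sample t-test p-values on bootstrap resamples of one null data set,
# compared with p-values on fresh draws from the null population.  Fresh-
# sample p-values are uniform (~5% fall below 0.05); bootstrap p-values
# typically are not, because resamples are centred at the sample mean and
# contain duplicated observations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 50)          # one data set drawn under H0: mu = 0

boot_p = np.array([
    stats.ttest_1samp(rng.choice(x, size=len(x), replace=True), 0.0).pvalue
    for _ in range(5_000)
])
fresh_p = np.array([
    stats.ttest_1samp(rng.normal(0.0, 1.0, 50), 0.0).pvalue
    for _ in range(5_000)
])

print("P(p <= 0.05) on bootstrap resamples:", np.mean(boot_p <= 0.05))
print("P(p <= 0.05) on fresh null samples:", np.mean(fresh_p <= 0.05))
```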

11.
Analysis (using three analytical approaches) of eight microsatellite markers from six locations in three geographic regions of the Great Barrier Reef (GBR), including populations that differed in demographic characteristics, showed no evidence of genetic stock structure in the red throat emperor Lethrinus miniatus. Measures of inter‐population differentiation were non‐significant (P ≥ 0.67). Using a Bayesian clustering approach, ‘admixture’ was detected (mean alpha values >1), with allele frequencies for each of the locations sampled being correlated equally with allele frequencies from all locations sampled. The number of populations (K) identified was one, based on the estimates of the probability of the data at various K values (K = 1, 2, 3, … 6). Additionally, alpha values did not stabilize to relatively constant values in any of the Bayesian analyses performed, indicating that there was no real genetic structure between locations. Analysis of genetic variation as detected by analysis of molecular variance (AMOVA) indicated that almost all of the variance in the data (99.74%, P ≤ 0.023) was within populations, rather than among populations (0.15%, P ≤ 0.176) or among regions sampled (0.10%, P ≤ 0.247) on the GBR. Fst statistics identified four individual loci having statistically significant differentiation among populations, but these were only related to one out of 12 pair‐wise comparisons where populations differed demographically. Given these results (albeit using neutral markers), together with the capacity of adults and larvae to be mobile between reefs on the inter‐connected GBR, it is considered unlikely that L. miniatus populations exist as distinct genetic stocks in the GBR. It is therefore not possible, using neutral markers, to reject the null hypothesis that the fishery be managed as a single panmictic stock.

12.
Adaptive designs were originally developed for independent and uniformly distributed p‐values. There are trial settings where independence is not satisfied or where it may not be possible to check whether it is satisfied. In these cases, the test statistics and p‐values of each stage may be dependent. Since the probability of a type I error for a fixed adaptive design depends on the true dependence structure between the p‐values of the stages, control of the type I error rate might be endangered if the dependence structure is not taken into account adequately. In this paper, we address the problem of controlling the type I error rate in two‐stage adaptive designs if any dependence structure between the test statistics of the stages is admitted (worst case scenario). For this purpose, we pursue a copula approach to adaptive designs. For two‐stage adaptive designs without futility stop, we derive the probability of a type I error in the worst case, that is for the most adverse dependence structure between the p‐values of the stages. Explicit analytical considerations are performed for the class of inverse normal designs. A comparison with the significance level for independent and uniformly distributed p‐values is performed. For inverse normal designs without futility stop and equally weighted stages, it turns out that correcting for the worst case is too conservative as compared to a simple Bonferroni design.
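The inverse normal combination underlying the designs discussed can be stated in a few lines; equal weights and α = 0.025 are illustrative, and the nominal level presumes independent uniform stage-wise p-values, exactly the assumption the paper relaxes.

```python
# Two-stage inverse normal combination test: stage-wise p-values are
# combined via weighted normal quantiles (w1^2 + w2^2 = 1) and the null
# is rejected when the combined z exceeds the critical value.
import numpy as np
from scipy.stats import norm

def inverse_normal_reject(p1, p2, alpha=0.025, w1=None):
    w1 = np.sqrt(0.5) if w1 is None else w1
    w2 = np.sqrt(1 - w1**2)
    z = w1 * norm.ppf(1 - p1) + w2 * norm.ppf(1 - p2)
    return z >= norm.ppf(1 - alpha)

print(inverse_normal_reject(0.10, 0.02))
# The Bonferroni benchmark in the abstract (reject if min(p1, p2) <= alpha/2)
# stays valid under ANY dependence structure between the stages.
```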

13.
Maps of plant individuals in (x, y) coordinates (i.e. point patterns) are currently analysed through statistical methods assuming a homogeneous distribution of points, and thus a constant density within the study area. Such an assumption is seldom met at the scale of a field plot whilst delineating less heterogeneous subplots is not always easy or pertinent. In this paper we advocate local tests carried out in quadrats partitioning the plot and having a size objectively determined via a trade‐off between squared bias and variance. In each quadrat, the observed pattern of points is tested against complete spatial randomness (CSR) through a classical Monte‐Carlo approach and one of the usual statistics. Local tests yield maps of p‐values that are amenable to diversified subsequent analyses, such as computation of a variogram or comparison with co‐variates. Another possibility uses the frequency distribution of p‐values to test the whole point pattern against the null hypothesis of an inhomogeneous Poisson process. The method was demonstrated by considering computer‐generated inhomogeneous point patterns as well as maps of woody individuals in banded vegetation (tiger bush) in semi‐arid West Africa. Local tests proved able to properly depict spatial relationships between neighbours in spite of heterogeneity/clustering at larger scales. The method is also relevant to investigate interaction between density and spatial pattern in the presence of resource gradients.
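One possible form of such a local test, sketched with an illustrative statistic (mean nearest-neighbour distance) rather than the statistics used in the paper:

```python
# Local test of one quadrat against complete spatial randomness (CSR):
# compare the mean nearest-neighbour distance of the observed points with
# its Monte Carlo distribution under uniform placement in the quadrat.
import numpy as np

def mean_nn_distance(pts):
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distances
    return d.min(axis=1).mean()

def csr_pvalue(pts, side=1.0, n_sim=999, seed=0):
    rng = np.random.default_rng(seed)
    obs = mean_nn_distance(pts)
    sims = np.array([mean_nn_distance(rng.uniform(0, side, pts.shape))
                     for _ in range(n_sim)])
    lower = (1 + np.sum(sims <= obs)) / (n_sim + 1)   # evidence of clustering
    upper = (1 + np.sum(sims >= obs)) / (n_sim + 1)   # evidence of regularity
    return min(1.0, 2 * min(lower, upper))            # two-sided Monte Carlo p

rng = np.random.default_rng(3)
clustered = rng.normal(0.5, 0.05, size=(30, 2)).clip(0, 1)  # one tight cluster
print("p-value against CSR:", csr_pvalue(clustered))
```

Run over all quadrats of a plot, such p-values form exactly the kind of map the abstract describes.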

14.
The controversy over the use of null hypothesis statistical testing (NHST) has persisted for decades, yet NHST remains the most widely used statistical approach in wildlife sciences and ecology. A disconnect exists between those opposing NHST and many wildlife scientists and ecologists who conduct and publish research. This disconnect causes confusion and frustration on the part of students. We, as students, offer our perspective on how this issue may be addressed. Our objective is to encourage academic institutions and advisors of undergraduate and graduate students to introduce students to various statistical approaches so we can make well-informed decisions on the appropriate use of statistical tools in wildlife and ecological research projects. We propose an academic course that introduces students to various statistical approaches (e.g., Bayesian, frequentist, Fisherian, information theory) to build a foundation for critical thinking in applying statistics. We encourage academic advisors to become familiar with the statistical approaches available to wildlife scientists and ecologists and thus decrease bias towards one approach. Null hypothesis statistical testing is likely to persist as the most common statistical analysis tool in wildlife science until academic institutions and student advisors change their approach and emphasize a wider range of statistical methods.

15.
Monte‐Carlo simulation methods are commonly used for assessing the performance of statistical tests under finite sample scenarios. They help us ascertain the nominal level for tests with approximate level, e.g. asymptotic tests. Additionally, a simulation can assess the quality of a test under the alternative. The latter can be used to compare new tests with established tests under certain assumptions in order to determine a preferable test given characteristics of the data. The key problem for such investigations is the choice of a goodness criterion. We extend the expected p‐value, as considered by Sackrowitz and Samuel‐Cahn (1999), to the context of univariate equivalence tests. This presents an effective tool for evaluating newly proposed equivalence tests because it does not depend on the distribution of the test statistic under the null hypothesis. It helps to avoid the often tedious search for the null distribution of test statistics that have no considerable advantage over already available methods. To demonstrate the usefulness in biometry, a comparison of established equivalence tests with a nonparametric approach is conducted in a simulation study for three distributional assumptions.
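The expected p-value criterion is straightforward to estimate by simulation, as in the sketch below; the normal shift alternative and per-group n = 20 are arbitrary choices, not those of the paper.

```python
# Estimating the expected p-value (EPV) of a test under a fixed alternative
# by simulation.  A smaller EPV indicates a better power profile, and the
# criterion needs no knowledge of the statistic's null distribution.
import numpy as np
from scipy import stats

def expected_pvalue(test, n_sim=10_000, seed=11):
    rng = np.random.default_rng(seed)
    ps = []
    for _ in range(n_sim):
        x = rng.normal(0.0, 1.0, 20)
        y = rng.normal(0.5, 1.0, 20)   # shift alternative
        ps.append(test(x, y))
    return np.mean(ps)

t_test = lambda x, y: stats.ttest_ind(x, y).pvalue
wilcoxon = lambda x, y: stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
print("EPV, t-test:      ", expected_pvalue(t_test))
print("EPV, Mann-Whitney:", expected_pvalue(wilcoxon))
```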

16.
Hardy-Weinberg equilibrium diagnostics
We propose two diagnostics for the statistical assessment of Hardy-Weinberg equilibrium. One diagnostic is the posterior probability of the complement of the smallest highest posterior density credible region that includes points in the parameter space consistent with the hypothesis of equilibrium. The null hypothesis of equilibrium is to be rejected if this probability is less than a pre-selected critical level. The second diagnostic is the proportion of the parameter space occupied by the highest posterior density credible region associated with the critical level. These Bayesian diagnostics can be interpreted as analogues of the classical types I and II error probabilities. They are broadly applicable: they can be computed for any hypothesis test, using samples of any size generated according to any distribution.

17.
Many criticisms have been levelled at null hypothesis significance testing (NHST). It is argued here that although there is reason to doubt that data subjected only to NHST have been subjected to sufficient analysis, the search for clear answers to well-formulated questions derived from substantive hypotheses is well served by NHST. To reliably draw inferences from data, however, NHST may need to be complemented by additional methods of analysis, such as the use of confidence intervals and of estimates of the degree of association between independent and dependent variables. It is argued that these should be seen as complements of, rather than as substitutes for, NHST since they do not directly test the strength of evidence against a null hypothesis.

18.
Recently there has been a growing concern that many published research findings do not hold up in attempts to replicate them. We argue that this problem may originate from a culture of ‘you can publish if you found a significant effect’. This culture creates a systematic bias against the null hypothesis which renders meta‐analyses questionable and may even lead to a situation where hypotheses become difficult to falsify. In order to pinpoint the sources of error and possible solutions, we review current scientific practices with regard to their effect on the probability of drawing a false‐positive conclusion. We explain why the proportion of published false‐positive findings is expected to increase with (i) decreasing sample size, (ii) increasing pursuit of novelty, (iii) various forms of multiple testing and researcher flexibility, and (iv) incorrect P‐values, especially due to unaccounted pseudoreplication, i.e. the non‐independence of data points (clustered data). We provide examples showing how statistical pitfalls and psychological traps lead to conclusions that are biased and unreliable, and we show how these mistakes can be avoided. Ultimately, we hope to contribute to a culture of ‘you can publish if your study is rigorous’. To this end, we highlight promising strategies towards making science more objective. Specifically, we enthusiastically encourage scientists to preregister their studies (including a priori hypotheses and complete analysis plans), to blind observers to treatment groups during data collection and analysis, and unconditionally to report all results. Also, we advocate reallocating some efforts away from seeking novelty and discovery and towards replicating important research findings of one's own and of others for the benefit of the scientific community as a whole. We believe these efforts will be aided by a shift in evaluation criteria away from the current system which values metrics of ‘impact’ almost exclusively and towards a system which explicitly values indices of scientific rigour.
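A simulation, with arbitrary settings, of point (iv) above: treating clustered (pseudoreplicated) observations as independent inflates the type I error rate of a naive two-sample t-test far beyond the nominal 5%.

```python
# Pseudoreplication demo: observations share cluster-level random effects,
# but the t-test treats all of them as independent data points.
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
n_clusters, per_cluster = 5, 10

def clustered_group():
    # shared cluster effects make observations within a cluster dependent
    cluster_means = rng.normal(0.0, 1.0, n_clusters)
    noise = rng.normal(0.0, 1.0, n_clusters * per_cluster)
    return np.repeat(cluster_means, per_cluster) + noise

n_sim, rejections = 2_000, 0
for _ in range(n_sim):
    # both groups come from the SAME population, so every rejection is a false positive
    p = stats.ttest_ind(clustered_group(), clustered_group()).pvalue
    rejections += p <= 0.05
print("empirical type I error:", rejections / n_sim)   # far above the nominal 0.05
```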

19.
Noether (1987) proposed a method of sample size determination for the Wilcoxon-Mann-Whitney test. To obtain a sample size formula, he restricted himself to alternatives that differ only slightly from the null hypothesis, so that the unknown variance σ² of the Mann-Whitney statistic can be approximated by the known variance under the null hypothesis, which depends only on n. This fact is frequently forgotten in statistical practice. In this paper, we compare Noether's large sample solution against an alternative approach based on upper bounds of σ² which is valid for any alternatives. This comparison shows that Noether's approximation is sufficiently reliable with small and large deviations from the null hypothesis.
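Noether's formula, as commonly stated, is short enough to sketch; the α, power, and p values below are illustrative.

```python
# Noether's (1987) sample-size formula for the Wilcoxon-Mann-Whitney test,
# as commonly stated: total N = (z_{1-a} + z_{1-b})^2 / (12 c (1-c) (p - 1/2)^2),
# where p = P(X < Y) and c is the fraction allocated to the first group.
# It rests on approximating the variance of the WMW statistic by its null
# value, which is precisely the point the abstract scrutinizes.
import math
from scipy.stats import norm

def noether_n(p, alpha=0.05, power=0.8, c=0.5, two_sided=False):
    a = alpha / 2 if two_sided else alpha
    z = norm.ppf(1 - a) + norm.ppf(power)
    return math.ceil(z**2 / (12 * c * (1 - c) * (p - 0.5) ** 2))

print("total N for p = 0.6:", noether_n(0.6))   # small shift, large N
print("total N for p = 0.7:", noether_n(0.7))
```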

20.
A new methodology is proposed for estimating the proportion of true null hypotheses in a large collection of tests. Each test concerns a single parameter δ whose value is specified by the null hypothesis. We combine a parametric model for the conditional cumulative distribution function (CDF) of the p‐value given δ with a nonparametric spline model for the density g(δ) of δ under the alternative hypothesis. The proportion of true null hypotheses and the coefficients in the spline model are estimated by penalized least squares subject to constraints that guarantee that the spline is a density. The estimator is computed efficiently using quadratic programming. Our methodology produces an estimate of the density of δ when the null is false and can address such questions as “when the null is false, is the parameter usually close to the null or far away?” This leads us to define a falsely interesting discovery rate (FIDR), a generalization of the false discovery rate. We contrast the FIDR approach to Efron's (2004, Journal of the American Statistical Association 99, 96–104) empirical null hypothesis technique. We discuss the use of our estimators in sample size calculations based on the expected discovery rate (EDR). Our recommended estimator of the proportion of true nulls has less bias compared to estimators based upon the marginal density of the p‐values at 1. In a simulation study, we compare our estimators to the convex, decreasing estimator of Langaas, Lindqvist, and Ferkingstad (2005, Journal of the Royal Statistical Society, Series B 67, 555–572). The most biased of our estimators is very similar in performance to the convex, decreasing estimator. As an illustration, we analyze differences in gene expression between resistant and susceptible strains of barley.
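A sketch of the simple benchmark this abstract compares against: a Storey-type estimate of the proportion of true nulls from p-values near 1; λ = 0.5 and the simulated p-value mixture are illustrative.

```python
# Storey-type estimate of pi0, the proportion of true nulls: null p-values
# are uniform on (0, 1) while alternatives pile up near 0, so the density
# of p-values near 1 identifies pi0.  lambda_ = 0.5 is a conventional choice.
import numpy as np

def storey_pi0(pvals, lambda_=0.5):
    p = np.asarray(pvals)
    return np.mean(p > lambda_) / (1 - lambda_)

rng = np.random.default_rng(5)
m, pi0_true = 10_000, 0.8
null_p = rng.uniform(size=int(m * pi0_true))
alt_p = rng.beta(0.2, 5.0, size=int(m * (1 - pi0_true)))  # skewed toward 0
print("estimated pi0:", storey_pi0(np.concatenate([null_p, alt_p])))
```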
