Similar Articles
 20 similar articles found (search time: 15 ms)
1.
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, the cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at that effect size. This technique can also incorporate prior probabilities of the null and alternative hypotheses and/or the relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, using α = 0.05 leads to arbitrary decisions about which effect sizes will likely be considered significant, if real, and to arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to use an arbitrary α = 0.05 for null hypothesis significance tests in any field when it is possible to determine an optimal α.
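The calculation described is straightforward for a standard test. Below is a minimal sketch for a one-sided two-sample z-test with unit variances; the grid search, prior weight, and cost ratio are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm

def optimal_alpha(d, n, prior_h1=0.5, cost_ratio=1.0):
    """Grid-search the alpha minimizing the weighted combination of
    Type I (alpha) and Type II (beta) error for a one-sided two-sample
    z-test at critical effect size d with n observations per group."""
    alphas = np.linspace(1e-4, 0.5, 2000)
    # beta = P(fail to reject | true effect = d), normal approximation
    betas = norm.cdf(norm.ppf(1 - alphas) - d * np.sqrt(n / 2))
    loss = (1 - prior_h1) * alphas + prior_h1 * cost_ratio * betas
    i = np.argmin(loss)
    return alphas[i], betas[i]

a, b = optimal_alpha(d=0.5, n=30)
print(f"optimal alpha = {a:.3f}, beta at that alpha = {b:.3f}")
```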

2.
Managers and policy makers depend on empirical research to guide and support biosecurity measures that mitigate the impacts of introduced species. Research contributing to this knowledge base generally uses null hypothesis significance testing to determine the significance of data patterns. However, reliance on traditional significance testing, combined with the small effect sizes, small sample sizes, and large variability inherent in many impact studies, may obscure effects on native species, communities or ecosystems. This can produce false certainty of no impact. We investigated potential Type II error rates and effect sizes for 31 non-significant empirical evaluations of impact for introduced algal and crustacean species. We found that low power consistently led to acceptance of Type II errors at rates 5.6–19 times greater than Type I errors (despite moderate to large effect sizes). Our results suggest that introduced species for which impact studies have statistically non-significant outcomes (often interpreted as "no impact") may in fact have large impacts that are missed owing to small sample or effect sizes and/or high variation. This alarming willingness to "miss" impacts has severe implications for conservation efforts, including under-managing species' impacts and discounting the costs of Type II errors.
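To see how easily β can dwarf α in a small, noisy impact study, a power calculation of the kind underlying this analysis can be run with statsmodels; the effect and sample sizes below are illustrative, not the study's data.

```python
from statsmodels.stats.power import TTestIndPower

# Moderate effect (Cohen's d = 0.6), small per-group samples typical of
# field impact studies, conventional two-sided alpha = 0.05.
power = TTestIndPower().power(effect_size=0.6, nobs1=10, alpha=0.05)
beta = 1 - power
print(f"power = {power:.2f}, beta = {beta:.2f}, beta/alpha = {beta / 0.05:.1f}")
```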

3.
This paper examines the underused procedure of testing for Type II errors when "negative" results are encountered in research. It recommends setting a statistical alternative hypothesis based on anthropologically derived information and then calculating the probability of committing a Type II error. In this manner the process parallels that used for testing Type I errors, which is clarified by examples from the literature. It is hoped that researchers will use the information presented here as a means of attaching levels of probability to the acceptance of null hypotheses.
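The recommended exercise, fixing a specific alternative and computing the probability of a Type II error, has a closed form for a one-sample z-test; a sketch with invented numbers:

```python
from math import sqrt
from scipy.stats import norm

# Null mean, a substantively chosen alternative, known SD, sample size.
mu0, mu1, sigma, n, alpha = 100.0, 104.0, 10.0, 25, 0.05

se = sigma / sqrt(n)
crit = mu0 + norm.ppf(1 - alpha) * se     # one-sided rejection boundary
beta = norm.cdf((crit - mu1) / se)        # P(fail to reject | mu = mu1)
print(f"Type II error probability beta = {beta:.3f}")
```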

4.
The permutation test is a popular technique for testing a hypothesis of no effect when the distribution of the test statistic is unknown. To test the equality of two means, a permutation test might use the difference of the two sample means as its test statistic in the univariate case; in the multivariate case, it might use the maximum of the univariate test statistics. A permutation test then estimates the null distribution of the test statistic by permuting the observations between the two samples. We show that, for such tests, if the two distributions are not identical (for example, when they have unequal variances, correlations or skewness), then a permutation test for equality of means based on the difference of sample means can have an inflated Type I error rate even when the means are equal. Our results suggest that such permutation testing should be confined to testing the hypothesis that two distributions are identical.
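A small simulation in the spirit of this result: with equal means but unequal variances and unequal sample sizes, a difference-of-means permutation test rejects well above its nominal level. All parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def perm_pvalue(x, y, n_perm=999):
    """Two-sided permutation p-value for the difference of sample means."""
    obs = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        count += abs(pooled[:len(x)].mean() - pooled[len(x):].mean()) >= obs
    return (count + 1) / (n_perm + 1)

rejections, n_sims = 0, 500
for _ in range(n_sims):
    x = rng.normal(0, 3, size=10)   # small, high-variance sample
    y = rng.normal(0, 1, size=40)   # large, low-variance sample
    rejections += perm_pvalue(x, y) < 0.05
print(f"empirical Type I error rate: {rejections / n_sims:.3f}")  # > 0.05
```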

5.
Strug LJ, Hodge SE. Human Heredity. 2006;61(4):200-209.
The 'multiple testing problem' currently bedevils the field of genetic epidemiology. Briefly stated, this problem arises with the performance of more than one statistical test and results in an increased probability of committing at least one Type I error. The conventional way of dealing with this problem, based on the classical Neyman-Pearson statistical paradigm, involves adjusting one's error probabilities. This adjustment is problematic, however, because in the process one is also adjusting one's measure of evidence. Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test. In a companion paper in this issue (Strug & Hodge I), we presented an alternative statistical paradigm, the 'evidential paradigm', to be used when planning and evaluating linkage studies. The evidential paradigm uses the lod score as the measure of evidence (as opposed to a p value) and provides new, alternatively defined error probabilities (alternative to Type I and Type II error rates). We showed how this paradigm separates, or decouples, the two concepts of error probabilities and strength of evidence. In the current paper we apply the evidential paradigm to the multiple testing problem, specifically multiple testing in the context of linkage analysis. We advocate using the lod score as the sole measure of the strength of evidence, and we derive the corresponding probabilities of being misled by the data under different multiple testing scenarios. We distinguish two situations: performing multiple tests of a single hypothesis versus performing a single test of multiple hypotheses. For the first situation, we show that the probability of being misled remains small regardless of the number of times one tests the single hypothesis. For the second, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) constitute appropriate adjustments for conducting multiple hypothesis tests on a data set.
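The robustness claimed for the first situation rests on a standard universal bound on misleading evidence (a consequence of Markov's inequality), sketched here rather than reproduced from the paper:

```latex
% Under H0, E[L1/L0] <= 1, so by Markov's inequality, for any k > 1:
\[
  P_{H_0}\!\left(\frac{L_1}{L_0} \ge k\right) \le \frac{1}{k},
  \qquad\text{equivalently}\qquad
  P_{H_0}\bigl(\mathrm{lod} \ge c\bigr) \le 10^{-c}.
\]
% So lod >= 3 is misleading with probability at most 0.001 for a
% fixed-sample likelihood; sequential looks need a martingale argument.
```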

6.
Post DM. Oecologia. 2007;153(4):973-984.
Understanding and explaining the causes of variation in food-chain length is a fundamental challenge for community ecology. The productive-space hypothesis, which suggests food-chain length is determined by the combination of local resource availability and ecosystem size, is central to this challenge. Two different approaches currently exist for testing the productive-space hypothesis: (1) the dual gradient approach, which tests for significant relationships between food-chain length and separate gradients of ecosystem size (e.g., lake volume) and per-unit-size resource availability (e.g., g C m−2 year−1), and (2) the single gradient approach, which tests for a significant relationship between food-chain length and the productive space (the product of ecosystem size and per-unit-size resource availability). Here I evaluate the efficacy of the two approaches for testing the productive-space hypothesis. Using simulated data sets, I estimate the Type 1 and Type 2 error rates for single and dual gradient models in recovering a known relationship between food-chain length and ecosystem size, resource availability, or the combination of the two, as specified by the productive-space hypothesis. The single gradient model provided high power (low Type 2 error rates) but had a very high Type 1 error rate, often erroneously supporting the productive-space hypothesis. The dual gradient model had a very low Type 1 error rate but suffered from low power to detect an effect of per-unit-size resource availability because the range of variation in resource availability is limited. Finally, I performed a retrospective power analysis for the Post et al. (Nature 405:1047–1049, 2000) data set, which tested and rejected the productive-space hypothesis using the dual gradient approach. I found that Post et al. (2000) had sufficient power to reject the productive-space hypothesis in north temperate lakes; however, the productive-space hypothesis must be tested in other ecosystems before its generality can be fully addressed.
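A compact illustration of the single-gradient pitfall: when food-chain length depends on ecosystem size alone and productivity varies over a narrow range, a regression on the productive space (the product) is almost always significant anyway. All parameters here are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_lakes, n_sims, alpha = 25, 1000, 0.05

false_support = 0
for _ in range(n_sims):
    size = rng.lognormal(mean=0, sigma=1.5, size=n_lakes)  # ecosystem size
    prod = rng.lognormal(mean=0, sigma=0.3, size=n_lakes)  # narrow range
    fcl = 3 + 0.5 * np.log10(size) + rng.normal(0, 0.3, n_lakes)  # size only
    # single-gradient test: FCL vs log10(productive space = size * prod)
    slope, icept, r, p, se = stats.linregress(np.log10(size * prod), fcl)
    false_support += p < alpha
print(f"single-gradient 'support' rate: {false_support / n_sims:.2f}")
```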

7.
The central challenge posed by the Precautionary Principle to statistical methodology is to help delineate (preferably quantitatively) the possibility that some exposure is hazardous, even in cases where this is not established beyond reasonable doubt. The classical approach to hypothesis testing is unhelpful here, because lack of significance can be due either to uninformative data or to a genuine lack of effect (the Type II error problem). Its inversion, bioequivalence testing, might sometimes be a model for the Precautionary Principle in its ability to 'prove the null hypothesis.' Current procedures for setting safe exposure levels are essentially derived from these classical statistical ideas, and we outline how uncertainties in the exposure and response measurements affect the No Observed Adverse Effect Level (NOAEL), the Benchmark approach and the "Hockey Stick" model. A particular problem concerns model uncertainty: these procedures usually assume that the class of models describing dose/response is known with certainty. This assumption is often violated, perhaps particularly when epidemiological data form the source of the risk assessment, and regulatory authorities have occasionally resorted to an average based on competing models. The recent methodology of Bayesian model averaging might be a systematic version of this; but is this an arena for the Precautionary Principle to come into play?
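Bioequivalence-style 'proving the null' is commonly operationalized as two one-sided tests (TOST) against an equivalence margin; a minimal sketch under illustrative assumptions, not the authors' procedure:

```python
import numpy as np
from scipy import stats

def tost_one_sample(x, low, high):
    """Two one-sided tests: is the mean inside the margin (low, high)?"""
    n = len(x)
    se = x.std(ddof=1) / np.sqrt(n)
    t_low = (x.mean() - low) / se          # tests H0: mean <= low
    t_high = (x.mean() - high) / se        # tests H0: mean >= high
    p_low = stats.t.sf(t_low, df=n - 1)
    p_high = stats.t.cdf(t_high, df=n - 1)
    return max(p_low, p_high)              # reject both to claim equivalence

x = np.random.default_rng(7).normal(0.1, 1.0, size=60)   # small true effect
print(f"TOST p-value for margin (-0.5, 0.5): {tost_one_sample(x, -0.5, 0.5):.3f}")
```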

8.
EST clustering error evaluation and correction
MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene fail to form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error rate in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error and provide more accurate estimates of the true gene cluster profile.
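Given true gene labels, both error types can be counted directly from a clustering; the sketch below uses simplified per-gene and per-cluster definitions that may differ from the paper's exact formulas.

```python
from collections import defaultdict

# est_gene[est_id] = true gene; est_cluster[est_id] = assigned cluster
est_gene = {"e1": "g1", "e2": "g1", "e3": "g1", "e4": "g2", "e5": "g2"}
est_cluster = {"e1": "c1", "e2": "c1", "e3": "c2", "e4": "c2", "e5": "c3"}

genes, clusters = defaultdict(set), defaultdict(set)
for est, g in est_gene.items():
    genes[g].add(est_cluster[est])          # clusters each gene falls into
for est, c in est_cluster.items():
    clusters[c].add(est_gene[est])          # genes found in each cluster

# Type I: ESTs from one gene are split over several clusters.
type1 = sum(len(cs) > 1 for cs in genes.values()) / len(genes)
# Type II: a cluster mixes ESTs from distinct genes.
type2 = sum(len(gs) > 1 for gs in clusters.values()) / len(clusters)
print(f"Type I rate: {type1:.2f}, Type II rate: {type2:.2f}")
```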

9.
In natural populations of Drosophila melanogaster, sigma virus is usually present in a minority of individuals. The virus is transmitted transovarially but is not contagious from fly to fly. Two viral types (I and II) are found in populations. One of them (Type II) is better adapted to an allele for resistance to the virus that is present as a polymorphism in fly populations. Previous observations led to the hypothesis that a viral Type II originating in central France might be invading populations. The study of Languedoc populations was undertaken to examine this hypothesis. Two striking phenomena were observed. A strong increase in the frequency of Type II clones (from 0.53 to 0.91) confirmed that invasion was occurring in this region. The frequency of infected flies also increased dramatically, to levels never before observed elsewhere, which indicates that Languedoc has some unusual characteristics. The epidemiological consequences of such a burst would have to be taken into consideration were the virus pathogenic. Significant changes in other viral characteristics in Languedoc populations from 1983 to 1987 are also documented.

10.
In any formal statistical test of the null hypothesis (the statement that a population parameter is equal to a specific value), there are two possible types of error. A Type 1, or alpha, error has occurred if the investigator rejects the null hypothesis when it is true; for example, an experimental treatment is declared an advance over standard treatment when it is not. A Type 2, or beta, error has occurred if the null hypothesis is not rejected when it is false; in this case, the investigator concludes that the experimental treatment is no different from the standard when it actually is. The two types of error can be conceptualized, respectively, as the consumer's risk and the producer's risk. In many reports of clinical trial methodology, it is the producer's risk that is emphasized, and it is understandable why producer's risk would concern the authors of clinical studies. There are, however, numerous potential sources of consumer's risk, and it is this latter type of risk that is the primary subject of this report.

11.
Binomial tests are commonly used in sensory difference and preference testing under the assumptions that choices are independent and choice probabilities do not vary from trial to trial. This paper addresses violations of the latter assumption (often referred to as overdispersion) and accounts for variation in inter-trial choice probabilities using the Beta distribution. Such variation could arise from differences in test substrate from trial to trial, differences in sensory acuity among subjects, or the existence of latent preference segments; in fact, overdispersion likely occurs ubiquitously in product testing. Using the Binomial model for data in which there is inter-trial variation may lead to seriously misleading conclusions from a sensory difference or preference test. A simulation study in this paper, based on product testing experience, showed that when a Binomial model is used for overdispersed Binomial data, the true Type I error may be 0.44 for a test specified at the 0.05 level. Underestimation of Type I error under the Binomial model may seriously undermine legal claims of product superiority in situations where overdispersion occurs. The Beta-Binomial (BB) model, an extension of the Binomial distribution, was developed to fit overdispersed Binomial data. Procedures for estimating and testing the parameters as well as testing for goodness of fit are discussed, and procedures for determining sample size and for calculating estimate precision and test power under the BB model are given, along with numerical examples and simulation results. The BB model should improve the validity of sensory difference and preference testing.
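The reported inflation is easy to reproduce: draw each panelist's choice probability from a Beta distribution with mean 0.5 instead of fixing it, then apply an ordinary binomial test to the pooled counts. Parameters below are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_panelists, n_trials, n_sims = 50, 6, 2000

# Beta(2, 2) has mean 0.5 (no overall preference) but inter-trial variation.
rejections = 0
for _ in range(n_sims):
    p_i = rng.beta(2, 2, size=n_panelists)          # per-panelist probability
    successes = rng.binomial(n_trials, p_i).sum()   # pooled preference count
    total = n_panelists * n_trials
    rejections += stats.binomtest(successes, total, 0.5).pvalue < 0.05
print(f"Type I error of the Binomial model on overdispersed data: "
      f"{rejections / n_sims:.2f}")
```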

12.

Introduction

Statistical interactions are a common component of data analysis across a broad range of scientific disciplines. However, the statistical power to detect interactions is often undesirably low. One solution is to elevate the Type 1 error rate so that important interactions are not missed in a low power situation. To date, no study has quantified the effects of this practice on power in a linear regression model.

Methods

A Monte Carlo simulation study was performed. A continuous dependent variable was specified, along with three types of interactions: continuous variable by continuous variable; continuous by dichotomous; and dichotomous by dichotomous. For each of the three scenarios, the interaction effect sizes, sample sizes, and Type 1 error rate were varied, resulting in a total of 240 unique simulations.

Results

In general, power to detect the interaction effect was either so low or so high at α = 0.05 that raising the Type 1 error rate only served to increase the probability of including a spurious interaction in the model. A small number of scenarios were identified in which an elevated Type 1 error rate may be justified.

Conclusions

Routinely elevating the Type 1 error rate when testing interaction effects is not an advisable practice. Researchers are best served by positing interaction effects a priori and accounting for them when conducting sample size calculations.
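A minimal version of the kind of simulation described, for the continuous-by-continuous case; effect size, sample size, and α levels are illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

def interaction_power(n, b3, alpha, n_sims=1000):
    """Estimate power to detect a continuous-by-continuous interaction."""
    hits = 0
    for _ in range(n_sims):
        x1, x2 = rng.normal(size=n), rng.normal(size=n)
        y = 0.3 * x1 + 0.3 * x2 + b3 * x1 * x2 + rng.normal(size=n)
        X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
        p = sm.OLS(y, X).fit().pvalues[3]   # p-value of the interaction term
        hits += p < alpha
    return hits / n_sims

for alpha in (0.05, 0.10, 0.20):
    print(f"alpha = {alpha:.2f}: power = {interaction_power(100, 0.2, alpha):.2f}")
```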

13.
The power of sensory discrimination methods
Difference testing methods are extensively used in a variety of applications, from small sensory evaluation tests to large-scale consumer tests. A central issue in the use of these tests is their statistical power: the probability that, if a specified difference exists, it will be demonstrated as significant in a difference test. A general equation for the power of any discrimination method is given, along with a general equation for the sample size required to meet Type I and Type II error specifications. Sample size tables for the 2-alternative forced choice (2-AFC), 3-AFC, duo-trio and triangular methods are given, as are tables of the psychometric functions for these four methods.
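The sample-size equation for a forced-choice method has a standard normal-approximation form, with the guessing probability p0 fixed by the method (1/2 for 2-AFC and duo-trio, 1/3 for 3-AFC and the triangle); a sketch, not the paper's exact tables:

```python
from math import sqrt, ceil
from scipy.stats import norm

def discrimination_n(p0, p1, alpha=0.05, beta=0.10):
    """Sample size so that a one-sided test of p = p0 against p = p1
    meets the stated Type I (alpha) and Type II (beta) specifications."""
    za, zb = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    n = ((za * sqrt(p0 * (1 - p0)) + zb * sqrt(p1 * (1 - p1)))
         / (p1 - p0)) ** 2
    return ceil(n)

print("2-AFC:", discrimination_n(p0=1/2, p1=0.65))
print("3-AFC:", discrimination_n(p0=1/3, p1=0.50))
```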

14.
Bivariate line-fitting methods for allometry
Fitting a line to a bivariate dataset can be a deceptively complex problem, and there has been much debate on this issue in the literature. In this review, we describe for the practitioner the essential features of line-fitting methods for estimating the relationship between two variables: what methods are commonly used, which method should be used when, and how to make inferences from these lines to answer common research questions. A particularly important point for line-fitting in allometry is that usually two sources of error are present (which we call measurement and equation error), and these have quite different implications for the choice of line-fitting method. As a consequence, the approach in this review and the methods presented have subtle but important differences from previous reviews in the biology literature. Linear regression, major axis and standardised major axis are alternative methods that can be appropriate when there is no measurement error. When there is measurement error, it often needs to be estimated and used to adjust the variance terms in the formulae for line-fitting. We also review line-fitting methods for phylogenetic analyses. Methods of inference are described for the line-fitting techniques discussed in this paper. The types of inference considered here are testing whether the slope or elevation equals a given value, constructing confidence intervals for the slope or elevation, comparing several slopes or elevations, and testing for shift along the axis amongst several groups. In some cases several methods have been proposed in the literature; these are discussed and compared. In other cases there is little or no previous guidance available. Simulations were conducted to check whether the proposed methods of inference have the intended coverage probability or Type I error. We identify the methods of inference that perform well and recommend the techniques that should be adopted in future work.
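The three standard estimators differ only in how they apportion scatter between x and y. A small numpy sketch of the OLS, standardised major axis (SMA), and major axis (MA) slopes from sample moments, without any measurement-error adjustment:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 200)
y = 0.75 * x + rng.normal(0, 0.5, 200)     # allometric-style scatter

sx2, sy2 = np.var(x, ddof=1), np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
r = sxy / np.sqrt(sx2 * sy2)

b_ols = sxy / sx2                          # linear regression of y on x
b_sma = np.sign(r) * np.sqrt(sy2 / sx2)    # standardised major axis
b_ma = (sy2 - sx2 + np.sqrt((sy2 - sx2) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
print(f"OLS {b_ols:.3f}  SMA {b_sma:.3f}  MA {b_ma:.3f}")
```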

15.
The Journal of Cell Biology. 1993;120(6):1439-1448.
I have produced a new monoclonal antibody, YF-169, against a membrane ruffle-specific 55-kD protein. YF-169 stained the membrane ruffles of chick embryo fibroblasts so distinctly that it enabled clear and reliable analyses of membrane ruffles. Fibroblasts organized two distinct types of membrane ruffles. One type formed transiently in serum-starved cells (Type I) when stimulated by serum or platelet-derived growth factor. After spontaneous degradation of Type I ruffles, the other type, containing many microspikes, was gradually organized at leading edges (Type II). The formation of Type I ruffles was not affected by either nocodazole, a microtubule-disrupting drug, or taxol, a microtubule-stabilizing reagent. However, Type II ruffles were entirely destroyed not only by nocodazole but also by taxol, suggesting that regulated organization of the microtubule network is important for maintaining Type II ruffles. H8, a protein kinase inhibitor, prevented the spontaneous degradation of Type I ruffles and also reduced the destructive effect of nocodazole on Type II ruffles without affecting its microtubule-disrupting activity; protein kinases may therefore be involved in the degradation of both types of ruffles. W7, a calmodulin antagonist, strongly inhibited Type I ruffle formation and completely destroyed Type II ruffles. W7 was also found to induce a remarkable change in the localization of the 55-kD protein: after degradation of Type II ruffles, most of the protein was incorporated into newly formed, unusually thick fibers. These results suggest that regulated organization of the microtubule network is not necessary to form Type I ruffles but is important to maintain Type II ruffles, while calmodulin function is essential for both types of membrane ruffles.

16.
Preference testing is commonly used in consumer sensory evaluation. Traditionally, it is done without replication, effectively leading to a single 0/1 (binary) measurement on each panelist. To understand the nature of a preference, however, replicated preference tests are a better approach, resulting in binomial counts of preferences for each panelist. Variability among panelists then leads to overdispersion of the counts when the binomial model is used, and to an inflated Type I error rate for statistical tests of preference. Overdispersion can be adjusted for by Pearson correction or by other models such as the correlated binomial or beta-binomial. Several methods are suggested or reviewed in this study for analyzing replicated preference tests, and their Type I error rates and power are compared. Simulation studies show that all of the methods have reasonable Type I error rates and similar power. Among them, the binomial model with Pearson adjustment is probably the safest way to analyze replicated preference tests, while a normal model that does not assume a binomial distribution is the easiest.
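One common form of the Pearson adjustment rescales the binomial variance by an estimated dispersion factor before testing; a hedged sketch on invented replicated-preference counts (the paper's exact procedure may differ):

```python
import numpy as np
from scipy.stats import norm

# Replicated preference counts: x[i] successes out of m trials per panelist.
x = np.array([6, 1, 5, 0, 6, 2, 5, 1, 6, 0, 4, 2])   # illustrative data
m, k = 6, len(x)
p_hat = x.sum() / (k * m)

# Pearson dispersion estimate: chi-square over its degrees of freedom.
chi2 = np.sum((x - m * p_hat) ** 2 / (m * p_hat * (1 - p_hat)))
c_hat = chi2 / (k - 1)

# Adjusted z-test of p = 0.5: inflate the binomial variance by c_hat.
z = (p_hat - 0.5) / np.sqrt(c_hat * 0.5 * 0.5 / (k * m))
p_value = 2 * norm.sf(abs(z))
print(f"dispersion = {c_hat:.2f}, adjusted p = {p_value:.3f}")
```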

17.
We examined Type I error rates of Felsenstein's (1985; Am. Nat. 125:1-15) comparative method of phylogenetically independent contrasts when branch lengths are in error and the model of evolution is not Brownian motion. We used seven evolutionary models, six of which depart strongly from Brownian motion, to simulate the evolution of two continuously valued characters along two different phylogenies (15 and 49 species). First, we examined the performance of independent contrasts when branch lengths are distorted systematically, for example, by taking the square root of each branch segment. These distortions often caused inflated Type I error rates, but performance was almost always restored when branch length transformations were used. Next, we investigated effects of random errors in branch lengths. After the data were simulated, we added errors to the branch lengths and then used the altered phylogenies to estimate character correlations. Errors in the branches could be of two types: fixed, where branch lengths are either shortened or lengthened by a fixed fraction; or variable, where the error is a normal variate with mean zero and the variance is scaled to the length of the branch (so that expected error relative to branch length is constant for the whole tree). Thus, the error added is unrelated to the microevolutionary model. Without branch length checks and transformations, independent contrasts tended to yield extremely inflated and highly variable Type I error rates. Type I error rates were reduced, however, when branch lengths were checked and transformed as proposed by Garland et al. (1992; Syst. Biol. 41:18-32), and almost never exceeded twice the nominal P-value at alpha = 0.05. Our results also indicate that, if branch length transformations are applied, then the appropriate degrees of freedom for testing the significance of a correlation coefficient should, in general, be reduced to account for estimation of the best branch length transformation. These results extend those reported in Díaz-Uriarte and Garland (1996; Syst. Biol. 45:27-47), and show that, even with errors in branch lengths and evolutionary models different from Brownian motion, independent contrasts are a robust method for testing hypotheses of correlated evolution.
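Felsenstein's pruning recursion for standardized contrasts is short; a sketch for a fixed four-taxon tree, omitting the branch-length checks and transformations the study shows are important:

```python
import math

# Tree as nested tuples: tip = (trait, branch_length);
# internal node = (left_subtree, right_subtree, branch_length).
tree = (((1.2, 1.0), (1.9, 1.0), 0.5),
        ((3.1, 2.0), (2.4, 1.0), 1.0), 0.0)

def prune(node):
    """Return (trait estimate, effective branch length, contrasts so far)."""
    if len(node) == 2:                       # tip
        return node[0], node[1], []
    (x1, v1, c1), (x2, v2, c2) = prune(node[0]), prune(node[1])
    contrast = (x1 - x2) / math.sqrt(v1 + v2)        # standardized contrast
    x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)      # weighted ancestor value
    v = node[2] + v1 * v2 / (v1 + v2)                # branch-length extension
    return x, v, c1 + c2 + [contrast]

_, _, contrasts = prune(tree)
print("standardized contrasts:", [round(c, 3) for c in contrasts])
```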

18.
As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed model approach that includes random effects for pair and species. These random effects introduce a “two-layer” compound symmetry variance structure that captures both the correlations between observations on related species within a pair as well as the correlations between the repeated measurements within species. We conducted a simulation study to assess the effect of model misspecification on Type I and II error rates. We also provide an illustrative example with data containing taxonomically similar species and several outcome variables of interest. We found that a mixed model with species and pair as random effects performed better in these phylogenetically explicit simulations than two commonly used reference models (no or single random effect) by optimizing Type I error rates and power. The proposed mixed model produces acceptable Type I and II error rates despite the absence of a phylogenetic tree. This design can be generalized to a variety of datasets to analyze repeated measurements in clusters of related subjects/species.
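A sketch of the proposed structure: random intercepts for pair and for species within pair, simulated and then fit with statsmodels' MixedLM. The variance-component syntax below is an assumed implementation, not the authors' code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for pair in range(20):                       # phylogenetically matched pairs
    u_pair = rng.normal(0, 1.0)              # pair random effect
    for status, shift in (("native", 0.0), ("invasive", 0.5)):
        u_sp = rng.normal(0, 0.7)            # species-within-pair effect
        for rep in range(5):                 # repeated measurements
            rows.append(dict(pair=pair, species=f"{pair}-{status}",
                             status=status,
                             y=shift + u_pair + u_sp + rng.normal(0, 1.0)))
df = pd.DataFrame(rows)

# Random intercept for pair (groups) plus a species variance component,
# giving the "two-layer" compound symmetry structure described above.
fit = smf.mixedlm("y ~ status", df, groups="pair",
                  vc_formula={"species": "0 + C(species)"}).fit()
print(fit.summary())
```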

19.
Recordings of naturally occurring electromyographic (EMG) signals are variable. One of the first formal and successful attempts to quantify variation in EMG signals was Shaffer and Lauder's (1985) study examining several levels of variation, but not within muscle. The goal of the current study was to quantify the variation that exists at different levels, using more detailed measures of EMG activity than did Shaffer and Lauder (1985). The importance of accounting for different levels of variation in an EMG study is both biological and statistical. Signal variation within the same muscle for a stereotyped action suggests that each recording represents a sample drawn from a pool of a large number of motor units that, while biologically functioning in an integrated fashion, showed statistical variation. Different levels of variation for different muscles could be related to different functions or different tasks of those muscles. The statistical impact of unaccounted or inappropriately analyzed variation can lead to false rejection (Type I error) or false acceptance (Type II error) of the null hypothesis. Type II errors occur because such variation will accrue to the error term, reducing power and producing an artificially low F-value. Type I errors are associated with pseudoreplication, in which the replicated units are not truly independent, thereby leading to inflated degrees of freedom and an underestimate of the error mean square. To address these problems, we used a repeated-measures, nested multifactor model to measure the relative contribution of different hierarchical levels of variation to the total variation in EMG signals during swallowing. We found that variation at all levels (among electrodes in the same muscle, among sequences of the same animal, among individuals, and between differently named muscles) was significant. These findings suggest that a single intramuscular electrode, recording from a limited sample of the motor units, cannot be relied upon to characterize the activity of an entire muscle. Furthermore, the use of both a repeated-measures model, to avoid pseudoreplication, and a nested model, to account for variation, is critical for correct testing of biological hypotheses about differences in EMG signals.

20.
Javidpour P, Korman TP, Shakya G, Tsai SC. Biochemistry. 2011;50(21):4638-4649.
Type II polyketides include antibiotics such as tetracycline and chemotherapeutics such as daunorubicin. They are biosynthesized by the type II polyketide synthase (PKS), which consists of 5-10 stand-alone domains. In many type II PKSs, the type II ketoreductase (KR) specifically reduces the C9-carbonyl group. How the type II KR achieves such high regiospecificity, and the nature of its stereospecificity, are not well understood. Sequence alignment of KRs led to the hypothesis that a well-conserved 94-XGG-96 motif may be involved in controlling stereochemistry. The stereospecificity of single-, double-, and triple-mutant combinations of P94L, G95D, and G96D was analyzed in vitro and in vivo for the actinorhodin KR (actKR). The P94L mutation alone is sufficient to change the stereospecificity of actKR. Binary and ternary crystal structures of both wild-type and P94L actKR were determined. Together with the assay results, docking simulations, and cocrystal structures, a model for stereochemical control is presented that elucidates how type II polyketides are positioned in the substrate pocket such that the C9-carbonyl can be reduced with high regio- and stereospecificity. The molecular features of actKR important for regio- and stereospecificity can potentially be applied in biosynthesizing new polyketides via protein engineering that rationally controls polyketide keto reduction.
