Similar Literature
20 similar articles found.
1.
Currently, among multiple comparison procedures for dependent groups, a bootstrap‐t with a 20% trimmed mean performs relatively well in terms of both Type I error probabilities and power. However, trimmed means suffer from two general concerns described in the paper. Robust M‐estimators address these concerns, but so far no method has been found that gives good control over the probability of a Type I error when sample sizes are small. The paper suggests using instead a modified one‐step M‐estimator that retains the advantages of both trimmed means and robust M‐estimators. Yet another concern is that the more successful methods for trimmed means can be too conservative in terms of Type I errors. Two methods for performing all pairwise multiple comparisons are considered. In simulations, both methods avoid a familywise error (FWE) rate larger than the nominal level. The method based on comparing measures of location associated with the marginal distributions can have an actual FWE that is well below the nominal level when variables are highly correlated. However, the method based on difference scores performs reasonably well with very small sample sizes, and it generally performs better than any of the methods studied in Wilcox (1997b).
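As a concrete illustration of the core building block, the sketch below (Python, assuming NumPy and SciPy are available) runs a bootstrap-t test that the 20% trimmed mean of paired difference scores is zero. It is a minimal single-comparison sketch under those assumptions, not the full familywise procedure studied in the paper, and the function names and data are illustrative.

import numpy as np
from scipy import stats

def winsorized_sd(d, gamma):
    # Winsorized standard deviation: replace the lowest and highest
    # floor(gamma * n) observations by the nearest retained values.
    d = np.sort(np.asarray(d, dtype=float))
    g = int(np.floor(gamma * len(d)))
    w = d.copy()
    w[:g] = d[g]
    w[len(d) - g:] = d[len(d) - g - 1]
    return np.std(w, ddof=1)

def trimmed_t(d, mu0=0.0, gamma=0.2):
    # Tukey-McLaughlin t statistic for a gamma-trimmed mean.
    n = len(d)
    tm = stats.trim_mean(d, gamma)
    se = winsorized_sd(d, gamma) / ((1.0 - 2.0 * gamma) * np.sqrt(n))
    return (tm - mu0) / se

def bootstrap_t_trimmed(d, n_boot=2000, gamma=0.2, seed=0):
    # Two-sided bootstrap-t p-value for H0: the trimmed mean of d is 0.
    rng = np.random.default_rng(seed)
    t_obs = trimmed_t(d, 0.0, gamma)
    tm = stats.trim_mean(d, gamma)
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        db = rng.choice(d, size=len(d), replace=True)
        t_star[b] = trimmed_t(db, tm, gamma)   # centre at the sample trimmed mean
    return np.mean(np.abs(t_star) >= abs(t_obs))

# Paired data: test the 20% trimmed mean of the difference scores.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 25)
y = x + rng.normal(0.3, 1.0, 25)
print(bootstrap_t_trimmed(y - x))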

2.
G O Zerbe, J R Murphy. Biometrics, 1986, 42(4): 795-804
Two multiple-comparisons procedures are suggested for supplementing randomization analysis of growth and response curves. One controls the experimentwise Type I error rate for all possible contrast curves via an extension of the Scheffé method. The other controls a family of Type I error rates via a stepwise testing procedure. Both can be approximated by standard F tests without costly recomputation of all of the test statistics for a large number of permutations.

3.
Computer simulation techniques were used to investigate the Type I and Type II error rates of one parametric (Dunnett) and two nonparametric multiple comparison procedures for comparing treatments with a control under nonnormality and variance homogeneity. It was found that Dunnett's procedure is quite robust with respect to violations of the normality assumption. Power comparisons show that for small sample sizes Dunnett's procedure is superior to the nonparametric procedures even in non-normal cases, but for larger sample sizes the multiple-comparison analogues of the Wilcoxon and Kruskal-Wallis rank statistics are superior to Dunnett's procedure in all nonnormal cases considered. Further investigations under nonnormality and variance heterogeneity show robustness with respect to the Type I error rate, and power comparisons yield results similar to those in the equal-variance case.
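A small Monte Carlo sketch of this kind of robustness check is shown below (Python; it assumes SciPy ≥ 1.11, which provides scipy.stats.dunnett). The distributions, sample sizes, and simulation settings are illustrative choices, not those of the study.

import numpy as np
from scipy import stats

def fwe_dunnett(n=10, k=3, n_sim=2000, alpha=0.05, dist="lognormal", seed=0):
    # Monte Carlo estimate of the familywise Type I error rate of Dunnett's
    # many-to-one procedure when the control and all k treatment groups share
    # the same distribution (i.e., every null hypothesis is true).
    rng = np.random.default_rng(seed)
    if dist == "lognormal":
        draw = lambda: rng.lognormal(0.0, 1.0, n)    # skewed, non-normal case
    else:
        draw = lambda: rng.normal(0.0, 1.0, n)       # normal reference case
    hits = 0
    for _ in range(n_sim):
        control = draw()
        treatments = [draw() for _ in range(k)]
        res = stats.dunnett(*treatments, control=control)
        hits += np.any(res.pvalue < alpha)
    return hits / n_sim

# Both estimates should stay near the nominal 0.05 if the procedure is robust.
print(fwe_dunnett(dist="normal"), fwe_dunnett(dist="lognormal"))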

4.
A well-known result is that skewness can cause problems when testing hypotheses about measures of location, particularly when a one-sided test is of interest. Wilcox (1994) reports both theoretical and simulation results showing that when testing hypotheses about trimmed means, control over Type I error probabilities can be substantially better than with methods for means. However, at least in some situations, control over the probability of a Type I error might still be judged inadequate. One way of addressing this concern is to combine trimmed means with the bootstrap method advocated by Westfall and Young (1993). This note reports simulation results indicating that there are situations where substantial improvements in Type I error probabilities are indeed obtained.

5.
This article proposes resampling-based empirical Bayes multiple testing procedures for controlling a broad class of Type I error rates, defined as generalized tail probability (gTP) error rates, gTP (q,g) = Pr(g (V(n),S(n)) > q), and generalized expected value (gEV) error rates, gEV (g) = E [g (V(n),S(n))], for arbitrary functions g (V(n),S(n)) of the numbers of false positives V(n) and true positives S(n). Of particular interest are error rates based on the proportion g (V(n),S(n)) = V(n) /(V(n) + S(n)) of Type I errors among the rejected hypotheses, such as the false discovery rate (FDR), FDR = E [V(n) /(V(n) + S(n))]. The proposed procedures offer several advantages over existing methods. They provide Type I error control for general data generating distributions, with arbitrary dependence structures among variables. Gains in power are achieved by deriving rejection regions based on guessed sets of true null hypotheses and null test statistics randomly sampled from joint distributions that account for the dependence structure of the data. The Type I error and power properties of an FDR-controlling version of the resampling-based empirical Bayes approach are investigated and compared to those of widely-used FDR-controlling linear step-up procedures in a simulation study. The Type I error and power trade-off achieved by the empirical Bayes procedures under a variety of testing scenarios allows this approach to be competitive with or outperform the Storey and Tibshirani (2003) linear step-up procedure, as an alternative to the classical Benjamini and Hochberg (1995) procedure.
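For reference, the sketch below (Python with NumPy/SciPy) implements the classical Benjamini-Hochberg linear step-up procedure that the proposed empirical Bayes approach is compared against. It is a sketch of the comparator only, not of the resampling-based procedure itself, and the toy data are invented for illustration.

import numpy as np
from scipy import stats

def benjamini_hochberg(pvals, q=0.05):
    # Linear step-up procedure: reject the hypotheses with the k smallest
    # p-values, where k is the largest i such that p_(i) <= i*q/m.
    # Controls FDR = E[V/(V+S)] at level q under independence.
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest index meeting the bound
        reject[order[:k + 1]] = True
    return reject

# Toy example: 90 true nulls and 10 shifted alternatives.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0, 1, 90), rng.normal(3, 1, 10)])
pvals = 2 * stats.norm.sf(np.abs(z))
print(benjamini_hochberg(pvals, q=0.05).sum(), "rejections")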

6.
Strug LJ, Hodge SE. Human Heredity, 2006, 61(4): 200-209
The 'multiple testing problem' currently bedevils the field of genetic epidemiology. Briefly stated, this problem arises with the performance of more than one statistical test and results in an increased probability of committing at least one Type I error. The accepted/conventional way of dealing with this problem is based on the classical Neyman-Pearson statistical paradigm and involves adjusting one's error probabilities. This adjustment is, however, problematic because in the process of doing that, one is also adjusting one's measure of evidence. Investigators have actually become wary of looking at their data, for fear of having to adjust the strength of the evidence they observed at a given locus on the genome every time they conduct an additional test. In a companion paper in this issue (Strug & Hodge I), we presented an alternative statistical paradigm, the 'evidential paradigm', to be used when planning and evaluating linkage studies. The evidential paradigm uses the lod score as the measure of evidence (as opposed to a p value), and provides new, alternatively defined error probabilities (alternative to Type I and Type II error rates). We showed how this paradigm separates or decouples the two concepts of error probabilities and strength of the evidence. In the current paper we apply the evidential paradigm to the multiple testing problem - specifically, multiple testing in the context of linkage analysis. We advocate using the lod score as the sole measure of the strength of evidence; we then derive the corresponding probabilities of being misled by the data under different multiple testing scenarios. We distinguish two situations: performing multiple tests of a single hypothesis, vs. performing a single test of multiple hypotheses. For the first situation the probability of being misled remains small regardless of the number of times one tests the single hypothesis, as we show. For the second situation, we provide a rigorous argument outlining how replication samples themselves (analyzed in conjunction with the original sample) constitute appropriate adjustments for conducting multiple hypothesis tests on a data set.

7.
This report explores how heterogeneity of variances affects randomization tests used to evaluate differences in the asymptotic population growth rate, λ. The probability of Type I error was calculated in four scenarios for populations with identical λ but different variance of λ: (1) Populations have different projection matrices: the same λ may be obtained from different sets of vital rates, which leaves room for different variances of λ. (2) Populations have identical projection matrices, but reproductive schemes differ and fecundity in one of the populations has a larger associated variance. The two other scenarios evaluate a sampling artifact as the source of the variance heterogeneity. The same population is sampled twice, (3) with the same sampling design, or (4) with different sampling effort for different stages. Randomization tests were done with increasing differences in sample size between the two populations, which implies additional differences in the variance of λ. The probability of Type I error stays at the nominal significance level (α = .05) in Scenario 3 and, with identical sample sizes, in the other scenarios. Tests were too liberal, or too conservative, under a combination of variance heterogeneity and unequal sample sizes. Increasing the difference in sample size exacerbated the gap between the observed Type I error and the nominal significance level. Type I error increases or decreases depending on whether the population with the larger sample size is the one with the smaller or the larger variance. However, sample size on its own is not responsible for changes in Type I error.
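The sketch below (Python/NumPy) shows the basic machinery such a test rests on: λ computed as the dominant eigenvalue of a projection matrix built from individual records, and a permutation of individuals between the two populations to obtain the null distribution of the difference in λ. The record format (stage, fate, offspring), the matrix construction, and the toy data are illustrative assumptions, not the paper's exact design.

import numpy as np

def lambda_from_records(records, n_stages):
    # Dominant eigenvalue of a projection matrix built from individual
    # records of the form (stage, fate, offspring); fate=None means death.
    A = np.zeros((n_stages, n_stages))
    counts = np.zeros(n_stages)
    for stage, fate, offspring in records:
        counts[stage] += 1
        if fate is not None:
            A[fate, stage] += 1.0          # survival/growth transition
        A[0, stage] += offspring           # offspring enter the first stage
    A /= np.maximum(counts, 1.0)           # column-wise per-capita rates
    return np.max(np.real(np.linalg.eigvals(A)))

def randomization_test_lambda(rec_a, rec_b, n_stages, n_perm=2000, seed=0):
    # Two-sided permutation p-value for H0: lambda_A == lambda_B, obtained
    # by reallocating individual records between the two populations.
    rng = np.random.default_rng(seed)
    pooled = list(rec_a) + list(rec_b)
    n_a = len(rec_a)
    obs = abs(lambda_from_records(rec_a, n_stages)
              - lambda_from_records(rec_b, n_stages))
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_a = [pooled[i] for i in idx[:n_a]]
        perm_b = [pooled[i] for i in idx[n_a:]]
        diff = abs(lambda_from_records(perm_a, n_stages)
                   - lambda_from_records(perm_b, n_stages))
        exceed += diff >= obs
    return (exceed + 1) / (n_perm + 1)

# Tiny illustration with two stages (juvenile = 0, adult = 1).
pop_a = [(0, 1, 0.0)] * 20 + [(1, 1, 1.2)] * 20 + [(1, None, 0.0)] * 10
pop_b = [(0, 1, 0.0)] * 15 + [(1, 1, 1.0)] * 20 + [(1, None, 0.0)] * 15
print(randomization_test_lambda(pop_a, pop_b, n_stages=2))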

8.
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at the critical effect size. This technique also has the flexibility to incorporate prior probabilities of null and alternate hypotheses and/or relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, the use of α = 0.05 results in arbitrary decisions about what effect sizes will likely be considered significant, if real, and results in arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α.
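A minimal sketch of that calculation is given below (Python with SciPy), using a two-sided two-sample z-test as the "traditional test" and a normal approximation to the power function. The cost weights for unequal error costs are an illustrative extension, and none of the numbers come from the article.

import numpy as np
from scipy import stats, optimize

def beta_two_sample(alpha, d, n):
    # Type II error of a two-sided two-sample z-test (normal approximation)
    # at true standardized effect size d, with n observations per group.
    z_crit = stats.norm.ppf(1.0 - alpha / 2.0)
    delta = d * np.sqrt(n / 2.0)                 # noncentrality parameter
    power = stats.norm.sf(z_crit - delta) + stats.norm.cdf(-z_crit - delta)
    return 1.0 - power

def optimal_alpha(d, n, cost_type1=1.0, cost_type2=1.0):
    # Alpha that minimizes the (cost-weighted) average of alpha and beta
    # at the critical effect size d; equal costs give the simple average.
    total = cost_type1 + cost_type2
    objective = lambda a: (cost_type1 * a + cost_type2 * beta_two_sample(a, d, n)) / total
    res = optimize.minimize_scalar(objective, bounds=(1e-6, 0.5), method="bounded")
    return res.x

# For a medium critical effect size and modest n, the optimal alpha is
# typically well above the conventional 0.05.
print(round(optimal_alpha(d=0.5, n=30), 3))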

9.
R L Kodell, J J Chen. Biometrics, 1991, 47(1): 139-146
A method is proposed for classifying various experimental outcomes associated with statistically significant trend tests, using a scheme of sequential testing within a family of one-sided tests that is closed under intersections. The intent of the procedure is to characterize the general shape of implied dose-response relationships, taking care neither to inflate the false-positive (Type I) error rate by overtesting, nor to sacrifice power by overadjusting for multiple comparisons.

10.
We study a two-type, age-dependent branching process in which the branching probabilities of one of the types may vary with time. Specifically, this modification of the Bellman-Harris process starts with a Type I particle which may either die or change to a Type II particle, depending upon a time-varying probability. A Type II particle may either die or reproduce with fixed probabilities, but may not return to Type I. In this way the process models the lag phenomenon observed in microbe growth subsequent to transfer to a new culture medium, while the organism is adapting to its new environment. We show that if the mean reproduction rate of Type II particles exceeds 1, then the population size grows exponentially. Further, the extinction probability for this process is related to that of the Bellman-Harris process. Finally, the governing equations are solved for several choices of the growth parameters and the solutions are displayed graphically, showing that a wide variety of behavior can be modeled by this process.
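The sketch below (Python/NumPy) simulates a simplified version of this process to illustrate the lag-then-growth behaviour. Exponential lifetimes and the particular time-varying conversion probability 1 - exp(-adapt_rate*t) are assumptions made for the sketch, not the paper's general age-dependent formulation.

import heapq
import numpy as np

def simulate_two_type(t_max=20.0, p_split=0.6, mean_life=1.0, adapt_rate=0.5, seed=0):
    # Event-driven simulation: a Type I particle, at the end of its life,
    # either dies or converts to Type II with probability 1 - exp(-adapt_rate*t)
    # (a time-varying "adaptation" probability); a Type II particle either dies
    # or splits into two Type II particles with fixed probability p_split.
    # Exponential lifetimes are used here for simplicity.
    rng = np.random.default_rng(seed)
    events = [(rng.exponential(mean_life), 1)]   # (event time, particle type)
    times, sizes, alive = [0.0], [1], 1
    while events:
        t, ptype = heapq.heappop(events)
        if t > t_max:
            break
        alive -= 1                                # this particle's life ends
        if ptype == 1 and rng.random() < 1.0 - np.exp(-adapt_rate * t):
            heapq.heappush(events, (t + rng.exponential(mean_life), 2))
            alive += 1                            # adapted: now a Type II particle
        elif ptype == 2 and rng.random() < p_split:
            for _ in range(2):                    # reproduce: two Type II offspring
                heapq.heappush(events, (t + rng.exponential(mean_life), 2))
            alive += 2
        times.append(t)
        sizes.append(alive)
    return times, sizes

# With p_split > 0.5 the mean number of Type II offspring (2 * p_split) exceeds 1,
# so after an initial lag the population size typically grows exponentially
# (or the lineage goes extinct early, illustrating the extinction probability).
t, n = simulate_two_type()
print(n[-1])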

11.
An alternative to frequentist approaches to multiple comparisons is Duncan's k-ratio Bayes rule approach. The purpose of this paper is to compile key results on k-ratio Bayes rules for a number of multiple comparison problems that heretofore have been available only in separate papers or doctoral dissertations. Among other problems, multiple comparisons for means in one-way, two-way, and treatments-vs.-control structures will be reviewed. In the k-ratio approach, the optimal joint rule for a multiple comparisons problem is derived under the assumptions of additive losses and prior exchangeability for the component comparisons. In the component loss function for a comparison, a balance is achieved between the decision losses due to Type I and Type II errors by assuming that their ratio is k. The component loss is also linear in the magnitude of the error. Under the assumption of additive losses, the joint Bayes rule for the component comparisons applies to each comparison the Bayes test for that comparison considered alone. That is, a comparisonwise approach is optimal. However, under prior exchangeability of the comparisons, the component test critical regions adapt to omnibus patterns in the data. For example, for a balanced one-way array of normally distributed means, the Bayes critical t value for a difference between means is inversely related to the F ratio measuring heterogeneity among the means, resembling a continuous version of Fisher's F-protected least significant difference rule. For more complicated treatment structures, the Bayes critical t value for a difference depends intuitively on multiple F ratios and marginal difference(s) (if applicable), such that the critical t value warranted for the difference can range from being as conservative as that given by a familywise rule to actually being anti-conservative relative to that given by the unadjusted 5%-level Student's t test.

12.
Two common goals when choosing a method for performing all pairwise comparisons of J independent groups are controlling experimentwise Type I error and maximizing power. Typically, groups are compared in terms of their means, but it has been known for over 30 years that the power of these methods becomes highly unsatisfactory under slight departures from normality toward heavy-tailed distributions. An approach to this problem, well known in the statistical literature, is to replace the sample mean with a measure of location having a standard error that is relatively unaffected by heavy tails and outliers. One possibility is to use the trimmed mean. This paper describes three such multiple comparison procedures and compares them to two methods for comparing means.

13.
Genotypes produced from samples collected non-invasively in harsh field conditions often lack the full complement of data from the selected microsatellite loci. Their application to genetic mark-recapture methodology in wildlife species can therefore be prone to misidentifications, leading both to 'true non-recaptures' being falsely accepted as recaptures (Type I errors) and to 'true recaptures' being undetected (Type II errors). Here we present a new likelihood method that allows every pairwise genotype comparison to be evaluated independently. We apply this method to determine the total number of recaptures by estimating and optimising the balance between Type I errors and Type II errors. We show through simulation that the standard error of recapture estimates can be minimised through our algorithms. Interestingly, the precision of our recapture estimates actually improved when we included individuals with missing genotypes, as this increased the number of pairwise comparisons, potentially uncovering more recaptures. Simulations suggest that the method is tolerant to error rates of up to 5% per locus and can theoretically work in datasets with as little as 60% of loci genotyped. Our methods can be implemented in datasets where standard mismatch analyses fail to distinguish recaptures. Finally, we show that by assigning a low Type I error rate to our matching algorithms we can generate a dataset of individuals of known capture histories that is suitable for downstream analysis with traditional mark-recapture methods.

14.
This paper presents a look at the underused procedure of testing for Type II errors when "negative" results are encountered during research. It recommends setting a statistical alternative hypothesis based on anthropologically derived information and calculating the probability of committing this type of error. In this manner, the process is similar to that used for testing Type I errors, which is clarified by examples from the literature. It is hoped that researchers will use the information presented here as a means of attaching levels of probability to acceptance of null hypotheses.

15.
EST clustering error evaluation and correction
MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in these analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering with the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error rate in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct the ISO error and provide more accurate estimates of the true gene cluster profile.
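The sketch below (Python) shows one simple way to tabulate the two error types from a clustering result, given the true gene of origin for each EST. The per-gene/per-cluster definitions used here are an illustrative operationalization, not necessarily the exact quantities computed in the paper, and the toy data are invented.

from collections import defaultdict

def clustering_error_rates(true_gene, cluster):
    # Given parallel lists mapping each EST to its true gene and to its
    # assigned cluster, return (type_i_rate, type_ii_rate):
    #   Type I  - fraction of genes whose ESTs are split over more than one cluster;
    #   Type II - fraction of clusters containing ESTs from more than one gene.
    gene_to_clusters = defaultdict(set)
    cluster_to_genes = defaultdict(set)
    for g, c in zip(true_gene, cluster):
        gene_to_clusters[g].add(c)
        cluster_to_genes[c].add(g)
    type_i = sum(len(cs) > 1 for cs in gene_to_clusters.values()) / len(gene_to_clusters)
    type_ii = sum(len(gs) > 1 for gs in cluster_to_genes.values()) / len(cluster_to_genes)
    return type_i, type_ii

# Toy example: gene B is split over two clusters (Type I);
# cluster c3 mixes genes C and D (Type II).
genes    = ["A", "A", "B", "B", "C", "D"]
clusters = ["c1", "c1", "c2", "c4", "c3", "c3"]
print(clustering_error_rates(genes, clusters))   # (0.25, 0.25)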

16.
This paper proposes a procedure for testing and classifying data with multiple factors. A two-way analysis of covariance is used to classify the differences among the batches as well as another factor, such as package type and/or product strength. In the test procedure, slopes and intercepts of the main effects are tested using a combination of simultaneous and sequential F-tests. Based on the test procedure results, the data are classified into one of four different groups, and shelf life can be calculated accordingly for each group. We examine whether the procedure provides satisfactory control of the Type I error probability and adequate power for detecting differences in degradation rates and intercepts at different nominal levels. The method is evaluated with a Monte Carlo simulation study, and the proposed procedure is compared with the current FDA procedure using real data.

17.
Community assembly rules are often inferred from patterns in presence-absence matrices. A challenging problem in the analysis of presence-absence matrices has been to devise a null model algorithm to produce random matrices with fixed row and column sums. Previous studies by Roberts and Stone [(1990) Oecologia 83:560-567] and Manly [(1995) Ecology 76:1109-1115] used a "Sequential Swap" algorithm in which submatrices are repeatedly swapped to produce null matrices. Sanderson et al. [(1998) Oecologia 116:275-283] introduced a "Knight's Tour" algorithm that fills an empty matrix one cell at a time. In an analysis of the presence-absence matrix for birds of the Vanuatu islands, Sanderson et al. obtained different results from Roberts and Stone and concluded that "results from previous studies are generally flawed". However, Sanderson et al. did not investigate the statistical properties of their algorithm. Using simple probability calculations, we demonstrate that their Knight's Tour is biased and does not sample all unique matrices with equal frequency. The bias in the Knight's Tour arises because the algorithm samples exhaustively at each step before retreating in sequence. We introduce an unbiased Random Knight's Tour that tests only a small number of cells and retreats by removing a filled cell from anywhere in the matrix. This algorithm appears to sample unique matrices with equal frequency. The Random Knight's Tour and Sequential Swap algorithms generate very similar results for the large Vanuatu matrix, and for other presence-absence matrices we tested. As a further test of the Sequential Swap, we constructed a set of 100 random matrices derived from the Vanuatu matrix, analyzed them with the Sequential Swap, and found no evidence that the algorithm is prone to Type I errors (rejecting the null hypothesis too frequently). These results support the original conclusions of Roberts and Stone and are consistent with Gotelli's [(2000) Ecology 81:2606-2621] Type I and Type II error tests for the Sequential Swap. In summary, Sanderson et al.'s Knight's Tour generates large variances and does not sample matrices equiprobably. In contrast, the Sequential Swap generates results that are very similar to those of an unbiased Random Knight's Tour, and is not overly prone to Type I or Type II errors. We suggest that the statistical properties of proposed null model algorithms be examined carefully, and that their performance be judged by comparisons with artificial data sets of known structure. In this way, Type I and Type II error frequencies can be quantified, and different algorithms and indices can be compared meaningfully.
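A minimal sketch of the Sequential Swap algorithm itself is given below (Python/NumPy). The number of swaps is an illustrative choice (in practice the burn-in and thinning scheme matter), and the toy presence-absence matrix is invented.

import numpy as np

def sequential_swap(matrix, n_swaps=30000, seed=0):
    # Sequential Swap null model: repeatedly pick a random 2x2 submatrix and,
    # if it is a checkerboard (diagonal 1s and off-diagonal 0s, or vice versa),
    # flip it. Row and column sums of the presence-absence matrix are preserved.
    rng = np.random.default_rng(seed)
    m = np.array(matrix, dtype=int, copy=True)
    n_rows, n_cols = m.shape
    for _ in range(n_swaps):
        r = rng.choice(n_rows, size=2, replace=False)
        c = rng.choice(n_cols, size=2, replace=False)
        sub = m[np.ix_(r, c)]
        if sub[0, 0] == sub[1, 1] and sub[0, 1] == sub[1, 0] and sub[0, 0] != sub[0, 1]:
            m[np.ix_(r, c)] = 1 - sub        # flip the checkerboard
    return m

# Example: the swapped matrix keeps the original row and column totals.
pa = np.array([[1, 1, 0, 0],
               [1, 0, 1, 0],
               [0, 1, 0, 1],
               [0, 0, 1, 1]])
null = sequential_swap(pa)
print(null.sum(axis=1), null.sum(axis=0))   # same totals as pa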

18.
THE POWER OF SENSORY DISCRIMINATION METHODS
Difference testing methods are extensively used in a variety of applications from small sensory evaluation tests to large scale consumer tests. A central issue in the use of these tests is their statistical power, or the probability that if a specified difference exists it will be demonstrated as a significant difference in a difference test. A general equation for the power of any discrimination method is given. A general equation for the sample size required to meet Type I and Type II error specifications is also given. Sample size tables for the 2-alternative forced choice (2-AFC), 3-AFC, the duo-trio and the triangular methods are given. Tables of the psychometric functions for the 2-AFC, 3-AFC, triangular and duo-trio methods are also given.
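The sketch below (Python with SciPy) reproduces the standard exact-binomial calculation behind such power and sample-size tables for the 2-AFC case, where the guessing probability is 1/2 (1/3 would apply to the 3-AFC and triangle methods). The alternative proportion in the example is illustrative; the paper's own general equations and psychometric functions are not reproduced here.

from scipy import stats

def power_2afc(n, p_alt, alpha=0.05, p_guess=0.5):
    # Exact power of a one-sided binomial difference test (e.g., 2-AFC).
    # c is the smallest number of correct responses significant at level alpha.
    c = stats.binom.isf(alpha, n, p_guess) + 1
    return stats.binom.sf(c - 1, n, p_alt)

def sample_size_2afc(p_alt, alpha=0.05, beta=0.10, p_guess=0.5, n_max=2000):
    # Smallest n meeting the Type I (alpha) and Type II (beta) specifications.
    for n in range(5, n_max):
        if power_2afc(n, p_alt, alpha, p_guess) >= 1.0 - beta:
            return n
    return None

# n needed to detect a true proportion of 65% correct in a 2-AFC test.
print(sample_size_2afc(p_alt=0.65))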

19.
Taxon sampling, correlated evolution, and independent contrasts
Independent contrasts are widely used to incorporate phylogenetic information into studies of continuous traits, particularly analyses of evolutionary trait correlations, but the effects of taxon sampling on these analyses have received little attention. In this paper, simulations were used to investigate the effects of taxon sampling patterns and alternative branch length assignments on the statistical performance of correlation coefficients and sign tests; "full-tree" analyses based on contrasts at all nodes and "paired-comparison" analyses based only on contrasts of terminal taxon pairs were also compared. The simulations showed that random samples, with respect to the traits under consideration, provide statistically robust estimates of trait correlations. However, exact significance tests are highly dependent on appropriate branch length information; equal branch lengths maintain lower Type I error than alternative topological approaches, and adjusted critical values of the independent contrast correlation coefficient are provided for use with equal branch lengths. Nonrandom samples, with respect to univariate or bivariate trait distributions, introduce discrepancies between interspecific and phylogenetically structured analyses and bias estimates of underlying evolutionary correlations. Examples of nonrandom sampling processes may include community assembly processes, convergent evolution under local adaptive pressures, selection of a nonrandom sample of species from a habitat or life-history group, or investigator bias. Correlation analyses based on species-pair comparisons, while ignoring deeper relationships, entail significant loss of statistical power and, as a result, provide a conservative test of trait associations. Paired comparisons in which species differ by a large amount in one trait, a method introduced in comparative plant ecology, have appropriate Type I error rates and high statistical power, but do not correctly estimate the magnitude of trait correlations. Sign tests, based on full-tree or paired-comparison approaches, are highly reliable across a wide range of sampling scenarios in terms of Type I error rates, but have very low power. These results provide guidance for selecting species and applying comparative methods to optimize the performance of statistical tests of trait associations.
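A minimal sketch of the independent-contrasts computation (Felsenstein's pruning algorithm) and of a correlation of contrasts forced through the origin is given below (Python/NumPy). The nested-tuple tree representation and the four-taxon example are invented for illustration.

import numpy as np

def contrasts(tree, trait):
    # Felsenstein's independent contrasts for trait values at the tips of a
    # binary tree. A tip is (name, branch_length); an internal node is
    # ((left_child, right_child), branch_length). Returns one standardized
    # contrast per internal node.
    out = []

    def prune(node):
        children, v = node
        if isinstance(children, str):              # tip
            return trait[children], v
        (x1, v1), (x2, v2) = prune(children[0]), prune(children[1])
        out.append((x1 - x2) / np.sqrt(v1 + v2))   # standardized contrast
        x_anc = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
        return x_anc, v + v1 * v2 / (v1 + v2)      # lengthen branch for estimation error

    prune(tree)
    return out

def contrast_correlation(cx, cy):
    # Correlation of two sets of contrasts, forced through the origin.
    cx, cy = np.asarray(cx), np.asarray(cy)
    return np.sum(cx * cy) / np.sqrt(np.sum(cx ** 2) * np.sum(cy ** 2))

# Four-taxon example with equal (unit) branch lengths.
node_ab = ((("a", 1.0), ("b", 1.0)), 1.0)
node_cd = ((("c", 1.0), ("d", 1.0)), 1.0)
root = ((node_ab, node_cd), 0.0)
trait_x = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 6.0}
trait_y = {"a": 0.5, "b": 1.0, "c": 2.5, "d": 3.5}
print(contrast_correlation(contrasts(root, trait_x), contrasts(root, trait_y)))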

20.
Brookmeyer R, You X. Biometrics, 2006, 62(1): 61-65
The objective of this article is to develop a hypothesis-testing procedure to determine whether a common-source outbreak has ended. We consider the case when neither the calendar date of exposure to the pathogen nor the exact incubation period distribution is known. The hypothesis-testing procedure is based on the spacings between ordered calendar dates of disease onset of the cases. A simulation study was performed to evaluate the robustness of the methods to various models for the incubation period of infectious diseases. We investigated the impact of multiple testing on the overall outbreak-wise Type I error probability. We derive expressions for the outbreak-wise Type I error probability and show that multiple testing has minimal effect on inflating that error probability. The results are discussed in the context of the 2001 U.S. anthrax outbreak.
