Similar Literature
Found 20 similar documents (search time: 21 ms)
1.
The two-sided Simes test is known to control the type I error rate with bivariate normal test statistics. For one-sided hypotheses, control of the type I error rate requires that the correlation between the bivariate normal test statistics is non-negative. In this article, we introduce a trimmed version of the one-sided weighted Simes test for two hypotheses which rejects if (i) the one-sided weighted Simes test rejects and (ii) both p-values are below one minus the respective weighted Bonferroni adjusted level. We show that the trimmed version controls the type I error rate at nominal significance level α if (i) the common distribution of test statistics is point symmetric and (ii) the two-sided weighted Simes test at level 2α controls the level. These assumptions apply, for instance, to bivariate normal test statistics with arbitrary correlation. In a simulation study, we compare the power of the trimmed weighted Simes test with the power of the weighted Bonferroni test and the untrimmed weighted Simes test. An additional result of this article ensures type I error rate control of the usual weighted Simes test under a weak version of the positive regression dependence condition for the case of two hypotheses. This condition is shown to apply to the two-sided p-values of one- or two-sample t-tests for bivariate normal endpoints with arbitrary correlation and to the corresponding one-sided p-values if the correlation is non-negative. The Simes test for such types of bivariate t-tests has not been considered before. According to our main result, the trimmed version of the weighted Simes test then also applies to the one-sided bivariate t-test with arbitrary correlation.
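The two-part rejection rule described in this abstract can be sketched directly. This is a minimal illustration, not the authors' code; it assumes two one-sided p-values with weights w1 + w2 = 1:

```python
def trimmed_weighted_simes(p1, p2, w1, w2, alpha=0.05):
    """Trimmed one-sided weighted Simes test for two hypotheses (sketch)."""
    # (i) weighted Simes: sort p-values, carrying their weights along
    (q1, v1), (q2, v2) = sorted([(p1, w1), (p2, w2)])
    simes_reject = q1 <= v1 * alpha or q2 <= (v1 + v2) * alpha
    # (ii) trimming: both p-values must lie below one minus the
    # respective weighted Bonferroni-adjusted level
    trimmed = p1 <= 1 - w1 * alpha and p2 <= 1 - w2 * alpha
    return simes_reject and trimmed
```

For example, with equal weights and α = 0.05, the pair (0.01, 0.5) is rejected, but (0.01, 0.99) is not: the trimming condition fails because 0.99 exceeds 1 − 0.025.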

2.
The classical normal-theory tests for testing the null hypothesis of common variance and the classical estimates of scale have long been known to be quite nonrobust to even mild deviations from normality assumptions for moderate sample sizes. Levene (1960) suggested a one-way ANOVA type statistic as a robust test. Brown and Forsythe (1974) considered a modified version of Levene's test by replacing the sample means with sample medians as estimates of population locations, and their test is computationally the simplest among the three tests recommended by Conover, Johnson, and Johnson (1981) in terms of robustness and power. In this paper a new robust and powerful test for homogeneity of variances is proposed based on a modification of Levene's test using the weighted likelihood estimates (Markatou, Basu, and Lindsay, 1996) of the population means. For two and three populations the proposed test using the Hellinger distance based weighted likelihood estimates is observed to achieve better empirical level and power than the Brown-Forsythe test in symmetric distributions having a thicker tail than the normal, and higher empirical power in skewed distributions under the use of F distribution critical values.
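The Brown-Forsythe variant mentioned above is simply a one-way ANOVA F statistic computed on absolute deviations from group medians; a minimal sketch:

```python
from statistics import median

def brown_forsythe(*groups):
    """Brown-Forsythe statistic: one-way ANOVA F computed on absolute
    deviations from each group's median (a sketch of the test above)."""
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    means = [sum(g) / len(g) for g in z]
    grand = sum(sum(g) for g in z) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(z, means))
    # Large values are evidence against variance homogeneity; the statistic
    # is referred to an F(k - 1, n - k) distribution
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Two identical groups give F = 0, while a group with a visibly larger spread pushes the statistic up.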

3.
After variable selection, standard inferential procedures for regression parameters may not be uniformly valid; there is no finite-sample size at which a standard test is guaranteed to approximately attain its nominal size. This problem is exacerbated in high-dimensional settings, where variable selection becomes unavoidable. This has prompted a flurry of activity in developing uniformly valid hypothesis tests for a low-dimensional regression parameter (e.g., the causal effect of an exposure A on an outcome Y) in high-dimensional models. So far there has been limited focus on model misspecification, although this is inevitable in high-dimensional settings. We propose tests of the null that are uniformly valid under sparsity conditions weaker than those typically invoked in the literature, assuming working models for the exposure and outcome are both correctly specified. When one of the models is misspecified, by amending the procedure for estimating the nuisance parameters, our tests continue to be valid; hence, they are doubly robust. Our proposals are straightforward to implement using existing software for penalized maximum likelihood estimation and do not require sample splitting. We illustrate them in simulations and an analysis of data obtained from the Ghent University intensive care unit.

4.
Yuanjia Wang & Huaihou Chen. Biometrics 2012, 68(4):1113-1125
Summary: We examine a generalized F-test of a nonparametric function through penalized splines and a linear mixed effects model representation. With a mixed effects model representation of penalized splines, we embed the test of an unspecified function into a test of some fixed effects and a variance component in a linear mixed effects model with nuisance variance components under the null. The procedure can be used to test a nonparametric function or varying coefficient with clustered data, compare two spline functions, test the significance of an unspecified function in an additive model with multiple components, and test a row or a column effect in a two-way analysis of variance model. Through a spectral decomposition of the residual sum of squares, we provide a fast algorithm for computing the null distribution of the test, which significantly improves the computational efficiency over the bootstrap. The spectral representation reveals a connection between the likelihood ratio test (LRT) in a multiple variance components model and a single component model. We examine our methods through simulations, where we show that the power of the generalized F-test may be higher than that of the LRT, depending on the hypothesis of interest and the true model under the alternative. We apply these methods to compute the genome-wide critical value and p-value of a genetic association test in a genome-wide association study (GWAS), where the usual bootstrap is computationally intensive (up to 10^8 simulations) and the asymptotic approximation may be unreliable and conservative.

5.
6.
Suppose it is desired to determine whether there is an association between any pair of p random variables. A common approach is to test H0: R = I, where R is the usual population correlation matrix. A closely related approach is to test H0: Rpb = I, where Rpb is the matrix of percentage bend correlations. In so far as type I errors are a concern, at a minimum any test of H0 should have a type I error probability close to the nominal level when all pairs of random variables are independent. Currently, the Gupta-Rathie method is relatively successful at controlling the probability of a type I error when testing H0: R = I, but as illustrated in this paper, it can fail when sampling from nonnormal distributions. The main goal in this paper is to describe a new test of H0: Rpb = I that continues to give reasonable control over the probability of a type I error in the situations where the Gupta-Rathie method fails. Even under normality, the new method has advantages when the sample size is small relative to p. Moreover, when there is dependence, but all correlations are equal to zero, the new method continues to give good control over the probability of a type I error while the Gupta-Rathie method does not. The paper also reports simulation results on a bootstrap confidence interval for the percentage bend correlation.
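The percentage bend correlation referred to above can be computed for a single pair of variables roughly as follows. This is a simplified sketch following Wilcox's construction with bending constant β = 0.2; the exact cutoffs and constants should be checked against the original references, and the code assumes the data are not so heavily tied that the robust scale estimate is zero:

```python
from statistics import median

def pbcor(x, y, beta=0.2):
    """Percentage bend correlation between two samples (sketch)."""
    def scores(v):
        n = len(v)
        med = median(v)
        w = sorted(abs(vi - med) for vi in v)
        m = int((1 - beta) * n + 0.5)
        omega = w[m - 1]                       # robust scale estimate
        u = [(vi - med) / omega for vi in v]
        i1 = sum(ui < -1 for ui in u)          # points bent on the left
        i2 = sum(ui > 1 for ui in u)           # points bent on the right
        s = sum(vi for vi, ui in zip(v, u) if abs(ui) <= 1)
        phi = (omega * (i2 - i1) + s) / (n - i1 - i2)  # pb location
        # psi clips standardized residuals to [-1, 1]
        return [max(-1.0, min(1.0, (vi - phi) / omega)) for vi in v]
    a, b = scores(x), scores(y)
    num = sum(ai * bi for ai, bi in zip(a, b))
    den = (sum(ai * ai for ai in a) * sum(bi * bi for bi in b)) ** 0.5
    return num / den
```

Like the ordinary correlation, it equals 1 for a perfectly increasing relationship and -1 for a perfectly decreasing one, but outlying points are clipped before they enter the cross-product.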

7.
Overdispersion is a common phenomenon in Poisson modeling, and the negative binomial (NB) model is frequently used to account for it. Testing approaches (the Wald test, likelihood ratio test (LRT), and score test) for overdispersion in Poisson regression versus the NB model are available. Because the generalized Poisson (GP) model is similar to the NB model, we consider the former as an alternative model for overdispersed count data. The score test has an advantage over the LRT and the Wald test in that it only requires the parameter of interest to be estimated under the null hypothesis. This paper proposes a score test for overdispersion based on the GP model and compares the power of the test with that of the LRT and Wald tests. A simulation study indicates that the score test based on the asymptotic standard normal distribution is more appropriate in practical applications because of its higher empirical power; however, it underestimates the nominal significance level, especially in small-sample situations. Examples illustrate the results of comparing the candidate tests between the Poisson and GP models. A bootstrap test is also proposed to adjust for the underestimation of the nominal level in the score statistic when the sample size is small. The simulation study indicates that the bootstrap test has a significance level closer to the nominal size and uniformly greater power than the score test based on the asymptotic standard normal distribution. From a practical perspective, we suggest that if the score test gives even a weak indication that the Poisson model is inappropriate, say at the 0.10 significance level, the more accurate bootstrap procedure is a better test for deciding whether the GP model is more appropriate than the Poisson model. Finally, the Vuong test is illustrated to choose between the GP and NB2 models for the same dataset.
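The parametric bootstrap logic described above can be sketched with a deliberately simple statistic. The variance-to-mean ratio used here stands in for the paper's GP score statistic, which is considerably more involved; the bootstrap scaffold is the same: fit the null (Poisson) model, resimulate from it, and compare the observed statistic with the bootstrap distribution:

```python
import math
import random

def bootstrap_dispersion_test(y, n_boot=199, seed=1):
    """Parametric bootstrap p-value for overdispersion (sketch)."""
    rng = random.Random(seed)

    def stat(v):
        # variance-to-mean ratio; ~1 under a Poisson model
        m = sum(v) / len(v)
        return sum((x - m) ** 2 for x in v) / ((len(v) - 1) * m)

    def rpois(mu):
        # Knuth's multiplicative Poisson sampler (fine for small mu)
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    mu_hat = sum(y) / len(y)                  # Poisson MLE under the null
    t_obs = stat(y)
    t_boot = [stat([rpois(mu_hat) for _ in y]) for _ in range(n_boot)]
    return (1 + sum(t >= t_obs for t in t_boot)) / (n_boot + 1)
```

Strongly overdispersed counts yield a small p-value, while counts consistent with a Poisson model do not.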

8.
Although standard statistical tests (such as contingency chi-square or G tests) are not well suited to the analysis of temporal changes in allele frequencies, they continue to be used routinely in this context. Because the null hypothesis stipulated by the test is violated if samples are temporally spaced, the true probability of a significant test statistic will not equal the nominal α level, and conclusions drawn on the basis of such tests can be misleading. A generalized method, applicable to a wide variety of organisms and sampling schemes, is developed here to estimate the probability of a significant test statistic if the only forces acting on allele frequencies are stochastic ones (i.e., sampling error and genetic drift). Results from analyses and simulations indicate that the rate at which this probability increases with time is determined primarily by the ratio of sample size to effective population size. Because this ratio differs considerably among species, the seriousness of the error in using the standard test will also differ. Bias is particularly strong in cases in which a high percentage of the total population can be sampled (for example, endangered species). The model used here is also applicable to the analysis of parent-offspring data and to comparisons of replicate samples from the same generation. A generalized test of the hypothesis that observed changes in allele frequency can be satisfactorily explained by drift follows directly from the model, and simulation results indicate that the true α level of this adjusted test is close to the nominal one under most conditions.
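The inflation described above is easy to reproduce by Monte Carlo: simulate Wright-Fisher drift between two sampling occasions and count how often a standard 2x2 chi-square test rejects at its nominal level. This is an illustrative sketch, not the paper's generalized method, and the parameter names and values are our own:

```python
import random

def drift_chi2_rejection_rate(ne, s, t, n_rep=300, p0=0.5, seed=7):
    """Realized type I error of a standard chi-square test applied to
    allele samples taken t generations apart, under pure drift (sketch).
    ne = effective population size, s = alleles sampled per occasion."""
    rng = random.Random(seed)

    def binom(n, p):
        return sum(rng.random() < p for _ in range(n))

    def chi2_2x2(a, b, c, d):
        n = a + b + c + d
        den = (a + b) * (c + d) * (a + c) * (b + d)
        return n * (a * d - b * c) ** 2 / den if den else 0.0

    reject = 0
    for _ in range(n_rep):
        p = p0
        x1 = binom(s, p)                     # sample of s alleles, generation 0
        for _ in range(t):                   # Wright-Fisher drift on 2*ne genes
            p = binom(2 * ne, p) / (2 * ne)
        x2 = binom(s, p)                     # sample of s alleles, generation t
        if chi2_2x2(x1, s - x1, x2, s - x2) > 3.841:   # nominal alpha = 0.05
            reject += 1
    return reject / n_rep
```

With a large sample-size-to-Ne ratio the realized rejection rate far exceeds 0.05, while for a large population it stays close to nominal, matching the abstract's main point.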

9.
This report explores how heterogeneity of variances affects randomization tests used to evaluate differences in the asymptotic population growth rate, λ. The probability of Type I error was calculated in four scenarios for populations with identical λ but different variance of λ: (1) The populations have different projection matrices: the same λ may be obtained from different sets of vital rates, which leaves room for different variances of λ. (2) The populations have identical projection matrices, but reproductive schemes differ and fecundity in one of the populations has a larger associated variance. The two other scenarios evaluate a sampling artifact as the source of heterogeneity of variances: the same population is sampled twice, (3) with the same sampling design, or (4) with different sampling effort for different stages. Randomization tests were done with increasing differences in sample size between the two populations, which implies additional differences in the variance of λ. The probability of Type I error stays at the nominal significance level (α = .05) in Scenario 3 and with identical sample sizes in the others. Tests were too liberal, or too conservative, under a combination of variance heterogeneity and different sample sizes. Increased differences in sample size exacerbated the gap between the observed Type I error and the nominal significance level. Type I error increases or decreases depending on which population, the one with the smallest or the largest variance, has the larger sample size. However, on its own, sample size is not responsible for changes in Type I error.

10.
We consider the problem of testing for heterogeneity of K proportions when K is not small and the binomial sample sizes may not be large. We assume that the binomial proportions are normally distributed with variance σ². The asymptotic relative efficiency (ARE) of the usual chi-square test is found relative to the likelihood-based tests of σ² = 0. The chi-square test is found to have ARE = 1 when the binomial sample sizes are all equal and high relative efficiency in other cases. The efficiency is low only in cases where there is insufficient data to use the chi-square test.
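The "usual chi-square test" whose efficiency is studied above compares the K sample proportions with their pooled value; a minimal sketch of the statistic (referred to a chi-square distribution on K - 1 degrees of freedom):

```python
def chi2_homogeneity(successes, totals):
    """Chi-square statistic for heterogeneity of K binomial proportions:
    sum of n_i * (p_i - p_bar)^2 / (p_bar * (1 - p_bar))."""
    p_bar = sum(successes) / sum(totals)
    return sum(n * (x / n - p_bar) ** 2
               for x, n in zip(successes, totals)) / (p_bar * (1 - p_bar))
```

For two groups this reduces to the familiar 2x2 contingency chi-square: for example, 2/10 versus 8/10 successes gives a statistic of 7.2.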

11.
For the analysis of combinations of 2×2 non-contingency tables as obtained from density follow-up studies (relating a number of events to a number of person-years of follow-up), an analogue of the Mantel-Haenszel test for 2×2 contingency tables is widely used. In this paper the small-sample properties of this test, both with and without continuity correction, are evaluated. The improvement of the test statistic by using the first four cumulants via the Edgeworth expansion was also studied. Results on the continuity correction agree with similar studies on the Mantel-Haenszel statistic for 2×2 contingency tables: the continuity correction gives a p-value which approximates the exact p-value better than the p-value obtained without this correction; both the exact test and its approximations show considerable conservatism in small samples; the uncorrected Mantel-Haenszel test statistic gives a p-value that agrees more closely with the nominal significance level, but can be anti-conservative. The p-value based on the first four cumulants gives a better approximation of the exact p-value than the continuity-corrected test, especially when the distribution has marked skewness.

12.
Monte-Carlo simulation methods are commonly used for assessing the performance of statistical tests under finite-sample scenarios. They help us ascertain the nominal level for tests with approximate level, e.g. asymptotic tests. Additionally, a simulation can assess the quality of a test under the alternative. The latter can be used to compare new tests with established tests under certain assumptions in order to determine a preferable test given the characteristics of the data. The key problem for such investigations is the choice of a goodness criterion. We extend the expected p-value, as considered by Sackrowitz and Samuel-Cahn (1999), to the context of univariate equivalence tests. This presents an effective tool for evaluating new proposals for equivalence testing because it is independent of the distribution of the test statistic under the null hypothesis. It helps to avoid the often tedious search for the null distribution of test statistics which have no considerable advantage over already available methods. To demonstrate its usefulness in biometry, a comparison of established equivalence tests with a nonparametric approach is conducted in a simulation study under three distributional assumptions.
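The expected p-value is simply the average p-value a test produces over repeated samples drawn under a fixed alternative; smaller is better. A toy Monte-Carlo sketch for a one-sided z-test (known unit variance, normal data — our illustrative choices, not the paper's equivalence-testing setting):

```python
import math
import random

def expected_p_value(delta, n, n_sim=2000, seed=3):
    """Monte-Carlo estimate of the expected p-value (Sackrowitz and
    Samuel-Cahn, 1999) of a one-sided z-test under a mean shift delta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sim):
        # z-statistic: sum of n observations divided by sqrt(n)
        z = sum(rng.gauss(delta, 1.0) for _ in range(n)) / math.sqrt(n)
        total += 0.5 * math.erfc(z / math.sqrt(2))   # p = P(N(0,1) >= z)
    return total / n_sim
```

At delta = 0 the p-value is uniform, so the expected p-value is 0.5; under a genuine shift it drops well below, quantifying the test's power without ever needing the exact null distribution of a competing statistic.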

13.
Heterosis is the phenomenon in which hybrid progeny exhibit superior traits in comparison with those of their parents. Genomic variation between the two parental genomes may generate epistatic interactions, which is one of the genetic hypotheses explaining heterosis. We postulate that protein–protein interactions specific to F1 hybrids (F1-specific PPIs) may occur when two parental genomes combine, as the proteome of each parent may supply novel interacting partners. To test our assumption, an inter-subspecies hybrid interactome was simulated by in silico PPI prediction between rice japonica (cultivar Nipponbare) and indica (cultivar 9311). A total of 4612 F1-specific PPIs, accounting for 20.5% of the PPIs in the hybrid interactome, were found. Genes participating in F1-specific PPIs tend to encode metabolic enzymes and are generally localized in genomic regions harboring metabolic gene clusters. To test the genetic effect of F1-specific PPIs on heterosis, genomic selection analysis was performed for trait prediction with additive, dominant, and epistatic effects considered separately in the model. We found that the removal of single nucleotide polymorphisms associated with F1-specific PPIs reduced prediction accuracy when epistatic effects were considered in the model, but no significant changes were observed when additive or dominant effects were considered. In summary, the genomic divergence widely dispersed between japonica and indica rice may generate F1-specific PPIs, part of which may cumulatively contribute to heterosis according to our computational analysis. These candidate F1-specific PPIs, especially those involved in metabolic biosynthesis pathways, are worthy of experimental validation when large-scale protein interactome datasets are generated in hybrid rice in the future.

14.
Evolutionary diversification of a phenotypic trait reflects the tempo and mode of trait evolution, as well as the phylogenetic topology and branch lengths. Comparisons of trait variance between sister groups provide a powerful approach to test for differences in rates of diversification, controlling for differences in clade age. We used simulation analyses under constant rate Brownian motion to develop phylogenetically based F-tests of the ratio of trait variances between sister groups. Random phylogenies were used for a generalized evolutionary null model, so that detailed internal phylogenies are not required, and both gradual and speciational models of evolution were considered. In general, phylogenetically structured tests were more conservative than corresponding parametric statistics (i.e., larger variance ratios are required to achieve significance). The only exception was for comparisons under a speciational evolutionary model when the group with higher variance has very low sample size (number of species). The methods were applied to a large data set on seed size for 1976 species of California flowering plants. Seven of 37 sister-group comparisons were significant for the phylogenetically structured tests (compared to 12 of 37 for the parametric F-test). Groups with higher diversification of seed size generally had a greater diversity of fruit types, life form, or life history as well. The F-test for trait variances provides a simple, phylogenetically structured approach to test for differences in rates of phenotypic diversification and could also provide a valuable tool in the study of adaptive radiations.
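The simulation-based null distribution described above can be sketched for the speciational model: on a random topology, each speciation event adds an independent N(0, 1) change to both daughter lineages, and the ratio of tip-trait variances between two simulated sister clades forms the null. This is our simplified reading of the approach, not the authors' exact procedure:

```python
import random

def bm_variance_ratio_null(n1, n2, n_sim=400, seed=11):
    """Null distribution of the between-clade trait-variance ratio under
    a speciational Brownian-motion model on random phylogenies (sketch)."""
    rng = random.Random(seed)

    def clade_variance(n):
        tips = [0.0]
        while len(tips) < n:
            # pick a random lineage to speciate; each daughter inherits the
            # parent value plus an independent unit-variance change
            parent = tips.pop(rng.randrange(len(tips)))
            tips += [parent + rng.gauss(0, 1), parent + rng.gauss(0, 1)]
        m = sum(tips) / n
        return sum((x - m) ** 2 for x in tips) / (n - 1)

    return sorted(clade_variance(n1) / clade_variance(n2)
                  for _ in range(n_sim))

def phylo_variance_ratio_pvalue(obs_ratio, null_ratios):
    """One-sided p-value: fraction of simulated null ratios >= observed."""
    return sum(r >= obs_ratio for r in null_ratios) / len(null_ratios)
```

Because shared ancestry correlates tip values, this null is heavier-tailed than the parametric F distribution, which is why the phylogenetically structured test demands larger variance ratios for significance.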

15.
The discriminating capacity (i.e. ability to correctly classify presences and absences) of species distribution models (SDMs) is commonly evaluated with metrics such as the area under the receiver operating characteristic curve (AUC), the Kappa statistic and the true skill statistic (TSS). AUC and Kappa have been repeatedly criticized, but TSS has fared relatively well since its introduction, mainly because it has been considered as independent of prevalence. In addition, discrimination metrics have been contested because they should be calculated on presence-absence data, but are often used on presence-only or presence-background data. Here, we investigate TSS and an alternative set of metrics — similarity indices, also known as F-measures. We first show that even in ideal conditions (i.e. perfectly random presence-absence sampling), TSS can be misleading because of its dependence on prevalence, whereas similarity/F-measures provide adequate estimations of model discrimination capacity. Second, we show that in real-world situations where sample prevalence is different from true species prevalence (i.e. biased sampling or presence-pseudoabsence), no discrimination capacity metric provides adequate estimation of model discrimination capacity, including metrics specifically designed for modelling with presence-pseudoabsence data. Our conclusions are twofold. First, they unequivocally impel SDM users to understand the potential shortcomings of discrimination metrics when quality presence-absence data are lacking, and we recommend obtaining such data. Second, in the specific case of virtual species, which are increasingly used to develop and test SDM methodologies, we strongly recommend the use of similarity/F-measures, which were not biased by prevalence, contrary to TSS.
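Both families of metrics compared above are simple functions of the confusion-matrix cells; a minimal sketch, where tp, fp, fn, tn are the presence-absence confusion counts:

```python
def tss(tp, fp, fn, tn):
    """True skill statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1

def f_measure(tp, fp, fn):
    """Similarity index (F-measure): harmonic mean of precision and
    recall; note that true negatives do not enter the formula."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The absence of tn from the F-measure is one reason these similarity indices behave differently from TSS when the absence (or pseudoabsence) pool is unrepresentative.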

16.
Jung BC, Jhun M & Lee JW. Biometrics 2005, 61(2):626-628
Ridout, Hinde, and Demétrio (2001, Biometrics 57, 219-223) derived a score test for testing a zero-inflated Poisson (ZIP) regression model against zero-inflated negative binomial (ZINB) alternatives. They noted that the score test using the normal approximation might underestimate the nominal significance level, particularly in small-sample cases. To remedy this problem, a parametric bootstrap method is proposed. It is shown that the bootstrap method keeps the significance level close to the nominal one and has uniformly greater power than the existing normal approximation for testing the hypothesis.

17.
The three-way nested ANOVA model y_ijk = μ + α_i + B_ij + γ_k + (αγ)_ik + ε_ijk, with α (treatment or group effects) and γ (time) both fixed effects and B (the individual effects) random and nested within α, is introduced and explored. The problems associated with the usual approach are explained. The alternative model is developed and a method of evaluation via linear contrasts is recommended. The test statistic has the distribution of a convolution of F-distributions. Further, a method of investigating the assumptions of the model is offered, and a further generalization using path spaces (of dimension K) is developed. Here again the appropriate test statistic is distributed as a convolution of F-distributions. Combined with the method of linear contrasts, this offers an elegant solution to the Behrens-Fisher problem for the model y_ijk = f_ik + B_ij + ε_ijk.

18.
The paper considers methods for testing H0: β1 = … = βp = 0, where β1, … , βp are the slope parameters in a linear regression model with an emphasis on p = 2. It is known that even when the usual error term is normal, but heteroscedastic, control over the probability of a type I error can be poor when using the conventional F test in conjunction with the least squares estimator. When the error term is nonnormal, the situation gets worse. Another practical problem is that power can be poor under even slight departures from normality. Liu and Singh (1997) describe a general bootstrap method for making inferences about parameters in a multivariate setting that is based on the general notion of depth. This paper studies the small-sample properties of their method when applied to the problem at hand. It is found that there is a practical advantage to using Tukey's depth versus the Mahalanobis depth when using a particular robust estimator. When using the ordinary least squares estimator, the method improves upon the conventional F test, but practical problems remain when the sample size is less than 60. In simulations, using Tukey's depth with the robust estimator gave the best results, in terms of type I errors, among the five methods studied.

19.
In this paper the first two moments of the test criterion (Treat. S.S.)/(Treat. S.S. + Error S.S.) are derived assuming that one observation, corresponding to the first plot of the first block, which is under treatment 'm' (say), is missing in a randomized block design with v treatments and r blocks. To keep the analysis simple, the case of one missing observation has been considered. It is concluded that in general the design is not unbiased in Yates' (1951) sense, and the usual F-test is satisfactory iff /(vr - r - 1) = (S - S1)/((v - 1)(r - 1)), the block errors are homogeneous, and (vr - r - 1) is large. The analysis of three uniformity trial data sets indicates that the first condition is the most important for the F-test to be satisfactory. However, if one observation is missing at random from some plot of some block, the F-test is unbiased. If the block errors are homogeneous and (vr - r - 1) > 96, the F-test also provides a good approximation to the corresponding randomization test in this case.

20.
Growth traits, such as body weight and carcass body length, directly affect productivity and economic efficiency in the livestock industry. We performed a genome-wide linkage analysis to detect the quantitative trait loci (QTL) that affect body weight, growth curve parameters and carcass body length in an F2 intercross between Landrace and Korean native pigs. Eight phenotypes related to growth were measured in approximately 1000 F2 progeny. All experimental animals were subjected to genotypic analysis using 173 microsatellite markers located throughout the pig genome. The least squares regression approach was used to conduct the QTL analysis. For body weight traits, we mapped 16 genome-wide significant QTL on SSC1, 3, 5, 6, 8, 9 and 12 as well as 22 suggestive QTL on SSC2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 16 and 17. On SSC12, we identified a major QTL affecting body weight at 140 days of age that accounted for 4.3% of the phenotypic variance, which was the highest test statistic (F-ratio = 45.6 under the additive model, nominal p = 2.4 × 10⁻¹¹) observed in this study. We also showed that there were significant QTL on SSC2, 5, 7, 8, 9 and 12 affecting carcass body length and growth curve parameters. Interestingly, the QTL on SSC2, 3, 5, 6, 8, 9, 10, 12 and 17 influencing the growth-related traits showed an obvious trend for co-localization. In conclusion, the identified QTL may play an important role in investigating the genetic structure underlying the phenotypic variation of growth in pigs.
