Similar Articles
20 similar articles found (search time: 15 ms)
1.
Ryman N, Jorde PE. Molecular Ecology 2001, 10(10):2361–2373
A variety of statistical procedures are commonly employed when testing for genetic differentiation. In a typical situation two or more samples of individuals have been genotyped at several gene loci by molecular or biochemical means, and in a first step a statistical test for allele frequency homogeneity is performed at each locus separately, using, e.g., the contingency chi-square test, Fisher's exact test, or some modification thereof. In a second step the results from the separate tests are combined for evaluation of the joint null hypothesis that there is no allele frequency difference at any locus, corresponding to the important case where the samples would be regarded as drawn from the same statistical and, hence, biological population. Presently, there are two conceptually different strategies in use for testing the joint null hypothesis of no difference at any locus. One approach is based on the summation of chi-square statistics over loci. Another method is employed by investigators applying the Bonferroni technique (adjusting the P-value required for rejection to account for the elevated alpha errors when performing multiple tests simultaneously) to test if the heterogeneity observed at any particular locus can be regarded as significant when considered separately. Under this approach the joint null hypothesis is rejected if one or more of the component single-locus tests is considered significant under the Bonferroni criterion. We used computer simulations to evaluate the statistical power and realized alpha errors of these strategies when evaluating the joint hypothesis after scoring multiple loci. We find that the 'extended' Bonferroni approach generally is associated with low statistical power and should not be applied in the current setting. Further, and contrary to what might be expected, we find that 'exact' tests typically behave poorly when combined in existing procedures for joint hypothesis testing.
Thus, while exact tests are generally to be preferred over approximate ones when testing each particular locus, approximate tests such as the traditional chi-square seem preferable when addressing the joint hypothesis.
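The two joint-testing strategies compared above can be sketched as follows. The allele-count tables, number of loci, and alpha level are hypothetical; this is a minimal illustration of the two strategies, not the authors' simulation code.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Hypothetical two-sample allele-count tables at three loci
# (rows = samples, columns = alleles).
tables = [
    np.array([[30, 20], [25, 25]]),
    np.array([[40, 10], [32, 18]]),
    np.array([[22, 28], [27, 23]]),
]

# Strategy 1: summation over loci -- add the chi-square statistics and
# degrees of freedom, then take a single overall P-value.
stats, dfs, pvals = [], [], []
for t in tables:
    s, p, df, _ = chi2_contingency(t, correction=False)
    stats.append(s); dfs.append(df); pvals.append(p)
joint_p = chi2.sf(sum(stats), sum(dfs))

# Strategy 2: 'extended' Bonferroni -- reject the joint null if any
# single-locus P-value falls below alpha / (number of loci).
alpha = 0.05
bonferroni_reject = min(pvals) < alpha / len(tables)

print(round(joint_p, 4), bonferroni_reject)
```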

2.
Zhang K, Traskin M, Small DS. Biometrics 2012, 68(1):75–84
For group-randomized trials, randomization inference based on rank statistics provides robust, exact inference against nonnormal distributions. However, in a matched-pair design, the currently available rank-based statistics lose significant power compared to normal linear mixed model (LMM) test statistics when the LMM is true. In this article, we investigate and develop an optimal test statistic over all statistics in the form of the weighted sum of signed Mann-Whitney-Wilcoxon statistics under certain assumptions. This test is almost as powerful as the LMM even when the LMM is true, but it is much more powerful for heavy-tailed distributions. A simulation study is conducted to examine the power.
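A statistic of the form studied here, a weighted sum of centered Mann-Whitney-Wilcoxon statistics over matched pairs of clusters, can be sketched as below. The pair count, cluster sizes, heavy-tailed outcome distribution, effect size, and the trivial equal weights are all hypothetical; the paper derives optimal weights, which are not reproduced here.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
# Hypothetical matched-pair group-randomized trial: 5 pairs of clusters,
# 8 subjects per cluster, heavy-tailed (t with 3 df) outcomes.
z, weights = [], []
for _ in range(5):
    treat = rng.standard_t(df=3, size=8) + 0.8   # treated cluster in the pair
    ctrl = rng.standard_t(df=3, size=8)          # control cluster in the pair
    u = mannwhitneyu(treat, ctrl, alternative="two-sided").statistic
    z.append(u - 8 * 8 / 2)   # center the MWW statistic at its null mean
    weights.append(1.0)       # equal weights; the paper optimizes these

stat = float(np.dot(weights, z))
print(stat)
```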

3.
Exact inference for growth curves with intraclass correlation structure (total citations: 2; self-citations: 0; by others: 2)
Weerahandi S, Berger VW. Biometrics 1999, 55(3):921–924
We consider repeated observations taken over time for each of several subjects. For example, one might consider the growth curve of a cohort of babies over time. We assume a simple linear growth curve model. Exact results based on sufficient statistics (exact tests of the null hypothesis that a coefficient is zero, or exact confidence intervals for coefficients) are not available to make inference on regression coefficients when an intraclass correlation structure is assumed. This paper will demonstrate that such exact inference is possible using generalized inference.

4.
For testing for treatment effects with time-to-event data, the logrank test is the most popular choice and has some optimality properties under proportional hazards alternatives. It may also be combined with other tests when a range of nonproportional alternatives are entertained. We introduce some versatile tests that use adaptively weighted logrank statistics. The adaptive weights utilize the hazard ratio obtained by fitting the model of Yang and Prentice (2005, Biometrika 92, 1–17). Extensive numerical studies have been performed under proportional and nonproportional alternatives, with a wide range of hazard ratio patterns. These studies show that the new tests typically improve upon the tests they are designed to modify. In particular, the adaptively weighted logrank test maintains optimality at the proportional alternatives, while improving the power over a wide range of nonproportional alternatives. The new tests are illustrated in several real data examples.
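The unweighted logrank statistic that these adaptive tests build on can be computed directly. The toy survival data below are hypothetical, and the adaptive Yang–Prentice weighting of the paper is not reproduced; this is a plain two-group logrank sketch.

```python
import numpy as np

def logrank_stat(time, event, group):
    """Unweighted two-group logrank statistic (chi-square, 1 df)."""
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                    # total at risk just before t
        n1 = (at_risk & (group == 1)).sum()  # at risk in group 1
        d = ((time == t) & (event == 1)).sum()            # events at t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var

# Toy data: group 1 fails early, group 0 fails late.
time = np.array([1, 2, 3, 4, 5, 6])
event = np.array([1, 1, 1, 1, 1, 1])
group = np.array([1, 1, 1, 0, 0, 0])
stat = logrank_stat(time, event, group)
print(round(stat, 3))
```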

5.
MOTIVATION: A primary objective of microarray studies is to determine genes which are differentially expressed under various conditions. Parametric tests, such as two-sample t-tests, may be used to identify differentially expressed genes, but they require some assumptions that are not realistic for many practical problems. Non-parametric tests, such as empirical Bayes methods and mixture normal approaches, have been proposed, but the inferences are complicated and the tests may not have as much power as parametric models. RESULTS: We propose a weakly parametric method to model the distributions of summary statistics that are used to detect differentially expressed genes. Standard maximum likelihood methods can be employed to make inferences. For illustration purposes the proposed method is applied to the leukemia data (training part) discussed elsewhere. A simulation study is conducted to evaluate the performance of the proposed method.
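For contrast with the weakly parametric approach, the baseline per-gene two-sample t-test screen mentioned above might look like this. The expression matrix, group sizes, spiked shift, and significance cutoff are all simulated assumptions, not data from the study.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
# Hypothetical expression matrix: 100 genes x (5 control + 5 treated arrays).
control = rng.normal(0.0, 1.0, size=(100, 5))
treated = rng.normal(0.0, 1.0, size=(100, 5))
treated[:10] += 2.0   # spike a mean shift into the first 10 genes

# Two-sample t-test per gene (axis=1 tests each row separately).
t, p = ttest_ind(control, treated, axis=1)
flagged = np.flatnonzero(p < 0.01)
print(len(flagged))
```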

6.
Ferretti L, Raineri E, Ramos-Onsins S. Genetics 2012, 191(4):1397–1401
Missing data are common in DNA sequences obtained through high-throughput sequencing. Furthermore, samples of low quality or problems in the experimental protocol often cause a loss of data even with traditional sequencing technologies. Here we propose modified estimators of variability and neutrality tests that can be naturally applied to sequences with missing data, without the need to remove bases or individuals from the analysis. Modified statistics include the Watterson estimator θ(W), Tajima's D, Fay and Wu's H, and HKA. We develop a general framework to take missing data into account in frequency spectrum-based neutrality tests and we derive the exact expression for the variance of these statistics under the neutral model. The neutrality tests proposed here can also be used as summary statistics to describe the information contained in other classes of data like DNA microarrays.
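The modified Watterson estimator described here can be sketched by weighting each segregating site by the harmonic number of its non-missing sample size. The tiny alignment and the 'N' missing-data code are hypothetical, and this is a sketch of the idea, not the authors' implementation.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i, the Watterson denominator."""
    return sum(1.0 / i for i in range(1, n))

def watterson_missing(alignment):
    """Modified Watterson estimator: each segregating site contributes
    1/a_{n_s}, where n_s counts the non-missing alleles at that site.
    'N' marks missing data."""
    theta = 0.0
    for site in zip(*alignment):
        obs = [b for b in site if b != 'N']
        if len(obs) >= 2 and len(set(obs)) > 1:   # segregating, enough data
            theta += 1.0 / harmonic(len(obs))
    return theta

# Hypothetical 4-sequence, 5-site alignment with missing bases.
seqs = ["ACGTA",
        "ACGTT",
        "ANGCA",
        "ACGNA"]
print(watterson_missing(seqs))
```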

7.
Inverse sampling is considered a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio, and conditional score statistics. Three methods (the asymptotic, conditional exact, and mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, the asymptotic score and conditional score tests are preferable because their actual type I error rates are well controlled around the pre-chosen nominal level and their powers are comparatively the largest. The exact version of the Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is expected and fluctuation of sizes around the pre-chosen nominal level is allowed, then the mid-P version of the Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study.

8.
Rohlfs RV, Weir BS. Genetics 2008, 180(3):1609–1616
It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy–Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy–Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case–control association studies and Hardy–Weinberg equilibrium (HWE) testing for data quality control.
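An exact HWE test conditions on the observed allele counts and sums the probabilities of all heterozygote counts no more likely than the observed one, under the standard Levene/Haldane conditional distribution. The genotype counts below are made up; this is a sketch of the exact test, not the P-value-distribution analysis of the paper.

```python
from math import comb, factorial

def hwe_exact_p(n_AA, n_AB, n_aa):
    """Exact Hardy-Weinberg test: conditional on the allele counts, sum the
    probabilities of heterozygote counts no more likely than observed."""
    n = n_AA + n_AB + n_aa
    nA = 2 * n_AA + n_AB                     # number of A alleles

    def prob(h):                             # P(heterozygotes = h | n, nA)
        if (nA - h) % 2 or h < 0 or h > min(nA, 2 * n - nA):
            return 0.0
        aa = (nA - h) // 2
        bb = n - aa - h
        multinom = factorial(n) // (factorial(aa) * factorial(h) * factorial(bb))
        return multinom * 2 ** h / comb(2 * n, nA)

    p_obs = prob(n_AB)
    return sum(p for h in range(nA % 2, min(nA, 2 * n - nA) + 1, 2)
               if (p := prob(h)) <= p_obs + 1e-12)

print(round(hwe_exact_p(10, 5, 10), 4))
```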

9.
Hirji KF. Biometrics 1991, 47(2):487–496
A recently developed algorithm for generating the distribution of sufficient statistics for conditional logistic models can be put to a twofold use. First, it provides an avenue for performing inference for matched case-control studies that does not rely on the assumption of a large sample size. Second, joint distributions generated by this algorithm can be used to make comparisons of various inferential procedures that are free from Monte Carlo sampling errors. In this paper, these two features of the algorithm are utilized to compare small-sample properties of the exact, mid-P value, and score tests for a conditional logistic model with two unmatched binary covariates. Both uniparametric and multiparametric tests, performed at a nominal significance level of .05, were studied. It was found that the actual significance levels of the mid-P test tend to be closer to the nominal level when compared with those of the other two tests.
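The mid-P idea, counting only half the probability of the observed outcome, is easiest to see in a one-sided binomial setting. This generic sketch does not reproduce the conditional-logistic algorithm of the paper; the counts and null probability are hypothetical.

```python
from scipy.stats import binom

def exact_and_midp(x, n, p0):
    """One-sided exact and mid-P values for testing p = p0 in Binomial(n, p).
    The mid-P rule counts only half the probability of the observed outcome."""
    exact = binom.sf(x - 1, n, p0)                        # P(X >= x)
    midp = binom.sf(x, n, p0) + 0.5 * binom.pmf(x, n, p0)
    return exact, midp

exact, midp = exact_and_midp(8, 10, 0.5)
print(round(exact, 4), round(midp, 4))
```

The mid-P value is always smaller than the exact P-value, which is why it is less conservative.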

10.
The probability of tumor cure in a homogeneous population of tumors exposed to fractionated radiotherapy was modeled using numerical simulations and compared with the predictions of Poisson statistics, assuming exact knowledge of the relevant tumor parameters (clonogen number, radiosensitivity, and growth kinetics). The results show that although Poisson statistics (based on exact knowledge of all parameters) accurately describes the probability of tumor cure when no proliferation occurs during treatment, it underestimates the cure rate when proliferation does occur. In practice, however, the inaccuracy is not likely to be more than about 10%. When the tumor parameters are unknown and are estimated by fitting an empirical Poisson model to tumor-cure data from a homogeneous population of proliferative tumors, the resulting estimates of tumor growth rate and radiosensitivity accurately reflect the true values, but the estimate of initial clonogen number is biased downward. A new formula that is more accurate than Poisson statistics in predicting the probability of tumor cure when proliferation occurs during treatment is discussed.
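The Poisson cure model with exponential clonogen proliferation between fractions can be sketched as below. The clonogen number, per-fraction surviving fraction, fraction count, and doubling times are illustrative values, not those of the study, and the new formula the paper discusses is not reproduced.

```python
import numpy as np

def poisson_tcp(n0, sf_per_fraction, fractions, doubling_days, interval_days=1.0):
    """Poisson tumour-control probability, TCP = exp(-expected survivors),
    with exponential regrowth between daily fractions."""
    growth = 2.0 ** (interval_days * (fractions - 1) / doubling_days)
    expected_survivors = n0 * sf_per_fraction ** fractions * growth
    return np.exp(-expected_survivors)

# No effective proliferation (huge doubling time) vs. 5-day doubling time.
tcp_static = poisson_tcp(1e7, 0.5, 30, doubling_days=1e9)
tcp_prolif = poisson_tcp(1e7, 0.5, 30, doubling_days=5.0)
print(round(tcp_static, 3), round(tcp_prolif, 3))
```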

11.
Evaluating the likelihood function of parameters in highly-structured population genetic models from extant deoxyribonucleic acid (DNA) sequences is computationally prohibitive. In such cases, one may approximately infer the parameters from summary statistics of the data such as the site-frequency-spectrum (SFS) or its linear combinations. Such methods are known as approximate likelihood or Bayesian computations. Using a controlled lumped Markov chain and computational commutative algebraic methods, we compute the exact likelihood of the SFS and many classical linear combinations of it at a non-recombining locus that is neutrally evolving under the infinitely-many-sites mutation model. Using a partially ordered graph of coalescent experiments around the SFS, we provide a decision-theoretic framework for approximate sufficiency. We also extend a family of classical hypothesis tests of standard neutrality at a non-recombining locus based on the SFS to a more powerful version that conditions on the topological information provided by the SFS.

12.
Methods for performing multiple tests of paired proportions are described. A broadly applicable method using McNemar's exact test and the exact distributions of all test statistics is developed; the method controls the familywise error rate in the strong sense under minimal assumptions. A closed-form (not simulation-based) algorithm for carrying out the method is provided. A bootstrap alternative is developed to account for correlation structures. Operating characteristics of these and other methods are evaluated via a simulation study. Applications to multiple comparisons of predictive models for disease classification and to postmarket surveillance of adverse events are given.
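As a simplified stand-in for the exact-distribution method, one might combine exact McNemar P-values with a Holm step-down adjustment, which also controls the familywise error rate, though more conservatively than the method of the paper. The discordant-pair counts for the three paired comparisons are hypothetical.

```python
from scipy.stats import binomtest

def mcnemar_exact_p(b, c):
    """Exact McNemar P-value: two-sided binomial test on the discordant
    pairs (b = yes/no, c = no/yes) with success probability 1/2."""
    return binomtest(b, b + c, 0.5).pvalue

# Hypothetical discordant-pair counts for three paired comparisons.
pairs = [(2, 12), (5, 7), (1, 9)]
pvals = sorted(mcnemar_exact_p(b, c) for b, c in pairs)

# Holm step-down over the ordered P-values.
alpha = 0.05
m = len(pvals)
rejected = 0
for i, p in enumerate(pvals):
    if p <= alpha / (m - i):
        rejected += 1
    else:
        break
print(pvals, rejected)
```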

13.
Two statistics are proposed for testing the hypothesis of equality of the means of a bivariate normal distribution with unknown common variance and correlation coefficient when observations are missing on both variates. One of the statistics reduces to the one proposed by Bhoj (1978, 1984) when the unpaired observations on the variates are equal. The distributions of the statistics are approximated by well known distributions under the null hypothesis. The empirical powers of the tests are computed and compared with those of some known statistics. The comparison supports the use of one of the statistics proposed in this paper.

14.
Computer fitting of binding data is discussed and it is concluded that the main problem is the choice of starting estimates and internal scaling parameters, not the optimization software. Solving linear overdetermined systems of equations for starting estimates is investigated. A function, Q, is introduced to study model discrimination with binding isotherms and the behaviour of Q as a function of model parameters is calculated for the case of 2 and 3 sites. The power function of the F test is estimated for models with 2 to 5 binding sites and necessary constraints on parameters for correct model discrimination are given. The sampling distribution of F test statistics is compared to an exact F distribution using the Chi-squared and Kolmogorov-Smirnov tests. For low order models (n less than 3) the F test statistics are approximately F distributed but for higher order models the test statistics are skewed to the left of the F distribution. The parameter covariance matrix obtained by inverting the Hessian matrix of the objective function is shown to be a good approximation to the estimate obtained by Monte Carlo sampling for low order models (n less than 3). It is concluded that analysis of up to 2 or 3 binding sites presents few problems and linear, normal statistical results are valid. To identify correctly 4 sites is much more difficult, requiring very precise data and extreme parameter values. Discrimination of 5 from 4 sites is an upper limit to the usefulness of the F test.
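Nested binding models are typically compared with the extra-sum-of-squares F test evaluated here. The residual sums of squares, parameter counts, and sample size below are hypothetical, standing in for fits of 1-site and 2-site models to the same data.

```python
from scipy.stats import f as f_dist

def extra_ss_f_test(ss1, p1, ss2, p2, n):
    """Extra-sum-of-squares F test between nested models: model 1 has p1
    parameters and residual SS ss1; model 2 has p2 > p1 parameters and ss2."""
    F = ((ss1 - ss2) / (p2 - p1)) / (ss2 / (n - p2))
    return F, f_dist.sf(F, p2 - p1, n - p2)

# Hypothetical fits of 1-site vs. 2-site binding models to 20 data points.
F, p = extra_ss_f_test(ss1=4.8, p1=2, ss2=2.0, p2=4, n=20)
print(round(F, 3), round(p, 4))
```

A small P-value here favours the larger (2-site) model; the paper's point is that this logic becomes unreliable beyond 4 or 5 sites.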

15.
Exact tests for one-sample correlated binary data (total citations: 1; self-citations: 0; by others: 1)
In this paper we develop exact tests for one-sample correlated binary data whose cluster sizes are at most two. Although significant progress has been made in the development and implementation of exact tests for uncorrelated data, exact tests for correlated data are rare. Lack of a tractable likelihood function has made it difficult to develop exact tests for correlated binary data. However, when cluster sizes of binary data are at most two, only three parameters are needed to characterize the problem. One parameter is fixed under the null hypothesis, while the other two parameters can be removed by conditional and unconditional approaches, respectively, to construct exact tests. We compare the exact and asymptotic p-values in several cases. The proposed method is applied to real-life data.

16.
The mixed-model factorial analysis of variance has been used in many recent studies in evolutionary quantitative genetics. Two competing formulations of the mixed-model ANOVA are commonly used, the “Scheffe” model and the “SAS” model; these models differ in both their assumptions and in the way in which variance components due to the main effect of random factors are defined. The biological meanings of the two variance component definitions have often been unappreciated, however. A full understanding of these meanings leads to the conclusion that the mixed-model ANOVA could have been used to much greater effect by many recent authors. The variance component due to the random main effect under the two-way SAS model is the covariance in true means associated with a level of the random factor (e.g., families) across levels of the fixed factor (e.g., environments). Therefore the SAS model has a natural application for estimating the genetic correlation between a character expressed in different environments and testing whether it differs from zero. The variance component due to the random main effect under the two-way Scheffe model is the variance in marginal means (i.e., means over levels of the fixed factor) among levels of the random factor. Therefore the Scheffe model has a natural application for estimating genetic variances and heritabilities in populations using a defined mixture of environments. Procedures and assumptions necessary for these applications of the models are discussed. While exact significance tests under the SAS model require balanced data and the assumptions that family effects are normally distributed with equal variances in the different environments, the model can be useful even when these conditions are not met (e.g., for providing an unbiased estimate of the across-environment genetic covariance). 
Contrary to statements in a recent paper, exact significance tests regarding the variance in marginal means as well as unbiased estimates can be readily obtained from unbalanced designs with no restrictive assumptions about the distributions or variance-covariance structure of family effects.

17.
We consider sample size determination for ordered categorical data when the alternative assumption is the proportional odds model. In this paper the sample size formula proposed by Whitehead (Statistics in Medicine, 12, 2257–2271, 1993) is compared with methods based on exact and asymptotic linear rank tests with Wilcoxon and trend scores. We show that Whitehead's formula, which is based on a normal approximation, works well when the sample size is moderate to large, but recommend the exact method with Wilcoxon scores for small sample sizes. The consequences of model misspecification are also investigated.
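Whitehead's normal-approximation sample size for the proportional-odds model (1:1 allocation) is commonly written in the form sketched below; treat the formula as an assumption about the commonly cited version, and the anticipated category proportions and odds ratio as hypothetical design inputs.

```python
from math import log
from scipy.stats import norm

def whitehead_n(pbar, odds_ratio, alpha=0.05, power=0.9):
    """Total sample size under Whitehead's (1993) proportional-odds formula,
    1:1 allocation; pbar holds the anticipated mean category proportions
    averaged across the two groups."""
    za = norm.ppf(1 - alpha / 2)
    zb = norm.ppf(power)
    return 6 * (za + zb) ** 2 / (log(odds_ratio) ** 2
                                 * (1 - sum(p ** 3 for p in pbar)))

# Hypothetical 4-category outcome, anticipated odds ratio 2.5.
n = whitehead_n([0.2, 0.3, 0.3, 0.2], odds_ratio=2.5)
print(round(n))
```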

18.
The McNemar test is commonly used to test for marginal homogeneity in 2 × 2 contingency tables. It is an asymptotic test based either on the standard normal distribution or on the chi-square distribution. When the total sample size is small, an exact version of the McNemar test is available based on binomial probabilities. The example in this paper comes from a clinical study investigating the effect of epidermal growth factor in children who had microvillus inclusion disease. Only six observations were available, and the test results differ between the exact test and the asymptotic test. It is a common belief that with such a small sample size the exact test should be used. However, we claim that the McNemar test performs better than the exact test even when the sample size is small. To investigate the performance of the McNemar test and the exact test, we identify the parameters that affect the test results and then perform a sensitivity analysis. In addition, through Monte Carlo simulation studies we compare the empirical sizes and powers of these tests as well as other asymptotic tests such as the Wald test and the likelihood ratio test.
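The asymptotic and exact versions compared in the paper differ only in how the discordant pairs are referred to a reference distribution. The counts below are hypothetical (the paper's actual six observations are not reproduced); this sketches both P-values side by side.

```python
from scipy.stats import binomtest, chi2

def mcnemar_both(b, c):
    """Asymptotic (uncorrected chi-square) and exact (binomial) McNemar
    P-values computed from the discordant-pair counts b and c."""
    stat = (b - c) ** 2 / (b + c)
    asym = chi2.sf(stat, 1)
    exact = binomtest(min(b, c), b + c, 0.5).pvalue
    return asym, exact

# Toy example with very few discordant pairs.
asym, exact = mcnemar_both(5, 1)
print(round(asym, 4), round(exact, 4))
```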

19.
Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. 
Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
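A pairwise G-test on spectral counts for a single protein across two conditions can be set up as a 2 × 2 table of protein vs. remaining spectra. The counts below are hypothetical, and the multi-condition generalized G-test of the paper is not reproduced; this is the basic two-condition case.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical spectral counts: one protein vs. all remaining spectra
# in two growth conditions.
table = np.array([[25, 9975],      # condition A: protein, other spectra
                  [60, 9940]])     # condition B
g, p, df, _ = chi2_contingency(table, lambda_="log-likelihood",
                               correction=False)
print(round(g, 2), round(p, 5))
```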

20.
Hirotsu C, Aoki S, Inada T, Kitao Y. Biometrics 2001, 57(3):769–778
The association analysis between the disease and genetic alleles is one of the simple methods for localizing the susceptibility locus in the genes. For revealing the association, several statistical tests have been proposed without discussing explicitly the alternative hypotheses. We therefore specify two types of alternative hypotheses (i.e., there is only one susceptibility allele in the locus, and there is an extension or shortening of alleles associated with the disease) and derive exact tests for the respective hypotheses. We also propose to combine these two tests when the prior knowledge is not sufficient to specify one of these two hypotheses. In particular, these ideas are extended to the haplotype analysis of three-way association between the disease and bivariate allele frequencies at two closely linked loci. As a by-product, a factorization of the probability distribution of the three-way cell frequencies under the null hypothesis of no three-way interaction is obtained.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号