Similar Literature (20 results)
1.
MOTIVATION: The desire to compare molecular phylogenies has stimulated the design of numerous tests. Most of these tests are formulated in a frequentist framework, and it is not known how they compare with Bayes procedures. I propose here two new Bayes tests that either compare pairs of trees (Bayes hypothesis test, BHT) or test each tree against an average of the trees included in the analysis (Bayes significance test, BST). RESULTS: The algorithm, based on a standard Metropolis-Hastings sampler, integrates out nuisance parameters and estimates the probability of the data under each topology. These quantities are used to estimate Bayes factors for composite vs. composite hypotheses. On two data sets, the BHT and BST are shown to construct confidence sets similar to those of the bootstrap and the Shimodaira-Hasegawa test, respectively. This suggests that the known differences among previous tests are mainly due to the null hypothesis considered.

2.
Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the parametric Swofford-Olsen-Waddell-Hillis (SOWH) test, and Bayesian posterior probabilities were explored for five data sets for which all the phylogenetic relationships are known with a very high degree of certainty. These results are consistent with previous simulation studies that have indicated a tendency for the SOWH test to be prone to generating Type 1 errors because of model misspecification coupled with branch length heterogeneity. These results also suggest that the SOWH test may accord overconfidence in the true topology when the null hypothesis is in fact correct. In contrast, the SH test was observed to be much more conservative, even under high substitution rates and branch length heterogeneity. For some of those data sets where the SOWH test proved misleading, the Bayesian posterior probabilities were also misleading. The results of all tests were strongly influenced by the exact substitution model assumptions. Simple models, especially those that assume rate homogeneity among sites, had a higher Type 1 error rate and were more likely to generate misleading posterior probabilities. For some of these data sets, the commonly used substitution models appear to be inadequate for estimating appropriate levels of uncertainty with the SOWH test and Bayesian methods. Reasons for the differences in statistical power between the two maximum likelihood tests are discussed and are contrasted with the Bayesian approach.

3.
We present a new procedure for assessing the statistical significance of the most likely unrooted dichotomous topology inferable from four DNA sequences. The procedure calculates directly a P-value for the support given to this topology by the informative sites congruent with it, assuming the most likely star topology as the null hypothesis. Informative sites are crucial in the determination of the maximum likelihood dichotomous topology and are therefore an obvious target for a statistical test of phylogenies. Our P-value is the probability of producing, through parallel substitutions on the branches of the star topology, at least as much support as that given to the maximum likelihood dichotomous topology by the aforementioned informative sites, for any of the three possible dichotomous topologies. The degree of statistical significance is simply the complement of this P-value. Ours is therefore an a posteriori testing approach, in which no dichotomous topology is specified in advance. We implement the test for the case in which all sites behave identically and the substitution model has a single parameter. Under these conditions, the P-value can be easily calculated from the probabilities of change on the branches of the most likely star topology, because under these assumptions each site can become informative independently of every other site; accordingly, the total number of informative sites of each kind is binomially distributed. We explore the test's type I error by applying it to data produced in star topologies having all branches equally long, or having two short and two long branches, and various degrees of homoplasy.
The test is conservative, but we demonstrate, by means of a discreteness correction and progressively assumption-free calculations of the P-values, that (1) the conservativeness is mostly due to the discrete nature of informative sites and (2) the empirically calculated P-values are, moreover, mostly quite accurate in absolute terms. Applying the test to data produced in dichotomous topologies with increasing internal branch length shows that, despite the test's "conservativeness," its power is much higher than that of the bootstrap, especially when the relevant informative sites are few.

4.
Phylogenetic inference and the evaluation of support for inferred relationships are at the core of many studies testing evolutionary hypotheses. Despite the popularity of nonparametric bootstrap frequencies and Bayesian posterior probabilities, the interpretation of these measures of tree branch support remains a source of discussion. Furthermore, both methods are computationally expensive and become prohibitive for large data sets. Recent fast approximate likelihood-based measures of branch support (approximate likelihood ratio test [aLRT] and Shimodaira-Hasegawa [SH]-aLRT) provide a compelling alternative to these slower conventional methods, offering not only speed advantages but also excellent levels of accuracy and power. Here we propose an additional method: a Bayesian-like transformation of aLRT (aBayes). Considering both probabilistic and frequentist frameworks, we compare the performance of the three fast likelihood-based methods with the standard bootstrap (SBS), the Bayesian approach, and the recently introduced rapid bootstrap. Our simulations and real data analyses show that with moderate model violations, all tests are sufficiently accurate, but aLRT and aBayes offer the highest statistical power and are very fast. With severe model violations, aLRT, aBayes and Bayesian posteriors can produce elevated false-positive rates. With data sets for which such violation can be detected, we recommend using SH-aLRT, the nonparametric version of aLRT based on a procedure similar to the Shimodaira-Hasegawa tree selection. In general, the SBS seems to be excessively conservative and is much slower than our approximate likelihood-based methods.
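A minimal sketch of how the two fast branch-support measures in entry 4 can be computed once the log-likelihoods of the three nearest-neighbour-interchange (NNI) configurations around one internal branch are known. It assumes, as a hedged reading of the method, that the aLRT statistic is twice the log-likelihood gain of the best configuration over the better alternative, and that aBayes normalises the three likelihoods under equal priors. The helper name `alrt_and_abayes` and the example log-likelihoods are hypothetical; a real analysis would take them from a maximum-likelihood program.

```python
import math

def alrt_and_abayes(loglik_best, loglik_alt1, loglik_alt2):
    """Hypothetical helper: branch support from the log-likelihoods of the
    three NNI configurations around one internal branch.

    Assumes loglik_best belongs to the configuration in the inferred tree.
    """
    # aLRT statistic (assumed form): twice the log-likelihood gain of the best
    # configuration over the better of the two alternative configurations.
    alrt = 2.0 * (loglik_best - max(loglik_alt1, loglik_alt2))
    # aBayes-style support (assumed form): equal priors on the three
    # configurations, likelihoods normalised via log-sum-exp to avoid underflow.
    logs = [loglik_best, loglik_alt1, loglik_alt2]
    top = max(logs)
    denom = sum(math.exp(value - top) for value in logs)
    abayes = math.exp(loglik_best - top) / denom
    return alrt, abayes

# Illustrative (made-up) log-likelihoods for one branch.
print(alrt_and_abayes(-10234.7, -10241.2, -10245.9))
```

Working on the log scale matters here: raw likelihoods of real alignments are far below the smallest representable float, so exponentiating them directly would give 0/0.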

5.
In this paper, several procedures for constructing confidence regions for the true evolutionary tree are evaluated in terms of both coverage and size, without considering model misspecification. The regions are constructed from hypothesis tests using six existing tests: Shimodaira-Hasegawa (SH), SOWH, star form of SOWH (SSOWH), approximately unbiased (AU), likelihood weight (LW), and generalized least squares, plus two new tests proposed in this paper: single distribution nonparametric bootstrap (SDNB) and single distribution parametric bootstrap (SDPB). The procedures are evaluated on simulated trees with both small and large numbers of taxa. Overall, the SH, SSOWH, AU, and LW tests led to regions with higher coverage than the nominal level at the price of including large numbers of trees. Under the specified model, the SOWH test gives accurate coverage and relatively small regions. The SDNB and SDPB tests led to small regions with occasional undercoverage. These two procedures have a substantial computational advantage over the SOWH test. Finally, the cutoff levels for the SDNB test are shown to be more variable than those for the SDPB test.

6.
The accessory gene regulator (agr) locus influences the expression of many virulence genes in the human pathogen Staphylococcus aureus. Four allelic groups of agr, which generally inhibit each other's regulatory activity, have been identified within the species. Interference in virulence gene expression caused by different agr groups has been suggested to be a mechanism for isolating bacterial populations and a fundamental basis for subdividing the species. To test the hypothesis that the species is phylogenetically structured according to agr groups, we mapped agr groups onto a clone phylogeny inferred from partial sequences of 14 genes from 27 genetically diverse strains. Shimodaira-Hasegawa and parametric bootstrap tests rejected the hypotheses that the species is subdivided into three or five monophyletic agr groups but failed to reject the hypothesis that the species is subdivided into two groups that each consist of multiple clonal complexes and multiple agr groups. Additional evidence for agr recombination is found from clustered polymorphisms in complete agr sequences. However, agr recombination has not occurred frequently or randomly through time, because the topology and branch lengths of the clone phylogeny are reflected within each agr group. To account for these observations, we propose a new evolutionary model that involves a genetically polymorphic ancestral population of S. aureus that horizontally transferred agr groups between two subspecies groups near the time that these subspecies groups diverged.

7.
Bootstrap method of interior-branch test for phylogenetic trees (total citations: 7; self-citations: 2; citations by others: 5)
Statistical properties of the bootstrap test of interior branch lengths of phylogenetic trees have been studied and compared with those of the standard interior-branch test in computer simulations. Examination of the properties of the tests under the null hypothesis showed that both tests for an interior branch of a predetermined topology are quite reliable when the distribution of the branch length estimate approaches a normal distribution. Unlike the standard interior-branch test, the bootstrap test appears to retain this property even when the substitution rate varies among sites. In this case, the distribution of the branch length estimate deviates from a normal distribution, and the standard interior-branch test gives conservative confidence probability values. A simple correction method was developed for both interior-branch tests so that they can be applied to testing the reliability of tree topologies estimated from sequence data. This correction for the standard interior-branch test appears to be as effective as that obtained in our previous study, though it is much simpler. The bootstrap and standard interior-branch tests for estimated topologies become conservative as the number of sequence groups in a star-like tree increases.

8.
In this paper, the detection of rare-variant associations with continuous phenotypes of interest is investigated via the likelihood-ratio-based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component lies on the boundary of its parameter space. In this situation, the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution, we resort to the bootstrap method because of its generic applicability and ease of implementation. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are conducted to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test, we propose an effective mixture approximation to the bootstrap null distribution. The GAW17 data are used to illustrate the proposed test.

9.
The assumption of Hardy-Weinberg equilibrium (HWE) is generally required for association analysis using a case-control design on autosomes; otherwise, the test size may be inflated. There has been increasing interest in exploring the association between diseases and markers on the X chromosome, and in the effect of departure from HWE on association analysis on the X chromosome. Note that there are two hypotheses of interest regarding the X chromosome: (i) the frequencies of the same allele at a locus are equal in males and females and (ii) the inbreeding coefficient in females is zero (no excess homozygosity). Thus, excess homozygosity and significantly different minor allele frequencies between males and females are used to filter X-linked variants. Two existing methods test (i) and (ii), respectively. However, their sizes and powers have not yet been studied. Furthermore, no existing method tests both hypotheses simultaneously. Therefore, in this article, we propose a novel likelihood ratio test for both (i) and (ii) on the X chromosome. To further investigate why the null hypothesis is rejected, we also develop two likelihood ratio tests for detecting (i) and (ii), respectively. Moreover, we explore the effect of population stratification on the proposed tests. In our simulation study, the size of the test for (i) is close to the nominal significance level. However, the excess homozygosity test and the joint test for (i) and (ii) are conservative. We therefore propose parametric bootstrap techniques and evaluate their validity and performance. Simulation results show that the proposed methods with bootstrap techniques control the size well under the respective null hypotheses. Power comparisons demonstrate that the methods with bootstrap techniques are more powerful than those without the bootstrap procedure and than the existing methods.
The application of the proposed methods to a rheumatoid arthritis dataset indicates their utility.

10.
An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection to obtain the confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434; 1996), but the new method provides higher-order accuracy with a simpler implementation. The AU test, like the Shimodaira-Hasegawa (SH) test, adjusts for the selection bias overlooked in the standard use of the bootstrap probability and Kishino-Hasegawa tests. The selection bias comes from comparing many trees at the same time and often leads to overconfidence in the wrong trees. The SH test, though safe to use, may exhibit another type of bias that makes it appear conservative. Here I show that the AU test is less biased than other methods in typical cases of tree selection. These points are illustrated in a simulation study as well as in the analysis of mammalian mitochondrial protein sequences. The theoretical argument provides a simple formula that covers the bootstrap probability test, the Kishino-Hasegawa test, the AU test, and the Zharkikh-Li test. A practical suggestion is provided as to which test should be used under particular circumstances.
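A rough sketch of the multiscale-bootstrap idea behind the AU test. It assumes the commonly cited regression model in which the bootstrap probability BP at relative bootstrap sample size r = n'/n satisfies Φ⁻¹(1 − BP) ≈ v·√r + c/√r, with the AU p-value taken as 1 − Φ(v − c) and the ordinary BP as 1 − Φ(v + c). These formulas are stated here as assumptions rather than quoted from the paper; the published method and CONSEL use weighted least squares and further diagnostics, whereas this sketch uses plain least squares. `au_from_multiscale` is a hypothetical name.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def au_from_multiscale(ratios, bps):
    """Hypothetical helper: AU-style p-value from multiscale bootstrap probabilities.

    ratios: r = n'/n, bootstrap sample size relative to the data size.
    bps:    bootstrap probability of the hypothesis (e.g. a tree) at each r.
    Assumed model: Phi^{-1}(1 - BP_r) = v*sqrt(r) + c/sqrt(r).
    """
    # Ordinary least squares, no intercept, regressors sqrt(r) and 1/sqrt(r).
    s11 = s12 = s22 = t1 = t2 = 0.0
    for r, bp in zip(ratios, bps):
        z = _STD_NORMAL.inv_cdf(1.0 - bp)
        x1, x2 = math.sqrt(r), 1.0 / math.sqrt(r)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * z
        t2 += x2 * z
    det = s11 * s22 - s12 * s12  # non-zero as long as the ratios differ
    v = (s22 * t1 - s12 * t2) / det  # signed-distance term
    c = (s11 * t2 - s12 * t1) / det  # curvature term
    au = 1.0 - _STD_NORMAL.cdf(v - c)        # approximately unbiased p-value
    bp_at_1 = 1.0 - _STD_NORMAL.cdf(v + c)   # fitted ordinary BP at r = 1
    return v, c, au, bp_at_1
```

In practice the BP values would come from bootstrap resampling of site-wise log-likelihoods at several sample sizes (for example r between 0.5 and 1.4); with a positive curvature term c the fitted AU p-value exceeds the ordinary BP, which is the bias correction the abstract refers to.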

11.
A confidence region for topologies is a data-dependent set of topologies that, with high probability, can be expected to contain the true topology. Because of the connection between confidence regions and hypothesis tests, implicitly or explicitly, the construction of confidence regions for topologies is a component of many phylogenetic studies. Existing methods for constructing confidence regions, however, often give conflicting results. The Shimodaira-Hasegawa test seems too conservative, including too many topologies, whereas the other commonly used method, the Swofford-Olsen-Waddell-Hillis test, tends to give confidence regions with too few topologies. Confidence regions are constructed here based on a generalized least squares test statistic. The methodology described is computationally inexpensive and broadly applicable to maximum likelihood distances. Assuming the model used to construct the distances is correct, the coverage probabilities are correct with large numbers of sites.

12.
Many studies aim to assess whether a therapy has a beneficial effect on multiple outcomes simultaneously relative to a control. Often the joint null hypothesis of no difference for the set of outcomes is tested using separate tests with a correction for multiple tests, or using a multivariate T²-like MANOVA or global test. However, a more powerful test in this case is a multivariate one-sided or one-directional test directed at detecting a simultaneous beneficial treatment effect on each outcome, though not necessarily of the same magnitude. The Wei-Lachin test is a 1-df test obtained from a simple sum of the component statistics; it was originally described in the context of a multivariate rank analysis. Under mild conditions this test provides a maximin efficient test of the null hypothesis of no difference between treatment groups for all outcomes versus the alternative hypothesis that the experimental treatment is better than control for some or all of the component outcomes, and not worse for any. Herein, applications are described to a simultaneous test for multiple differences in means, proportions or lifetimes, and combinations thereof, all on potentially different scales. The evaluation of sample size and power for such analyses is also described. For a test of means of two outcomes with a common unit variance and correlation 0.5, the sample size needed to provide 90% power for two separate one-sided tests at the 0.025 level is 64% greater than that needed for the single Wei-Lachin multivariate one-directional test at the 0.05 level. Thus, a Wei-Lachin test with these operating characteristics is 39% more efficient than two separate tests. Likewise, compared to a T²-like omnibus test on 2 df, the Wei-Lachin test is 32% more efficient. An example is provided in which the Wei-Lachin test of multiple components has superior power to a test of a composite outcome.
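The 64% and 39% figures in entry 12 can be reproduced with a short normal-approximation calculation. This sketch assumes an equal standardised effect on both outcomes, reads "90% power for two separate tests" as each separate test having 90% power, and takes the Wei-Lachin statistic as T = (Z₁ + Z₂)/√(2 + 2ρ). Those readings are assumptions on our part, and the function name `sample_size_ratio` is hypothetical. The 32% comparison against the 2-df T² test needs a noncentral chi-square and is not attempted here.

```python
from statistics import NormalDist

def sample_size_ratio(rho=0.5, power=0.90, alpha_separate=0.025, alpha_wl=0.05):
    """Hypothetical helper: n(separate tests) / n(Wei-Lachin test).

    Assumes the same standardised effect on both outcomes; common factors
    (effect size, group allocation) cancel in the ratio.
    """
    z = NormalDist().inv_cdf
    z_power = z(power)
    # One-sided z-test on a single outcome: n is proportional to
    # (z_{1-alpha} + z_power)^2.
    n_separate = (z(1 - alpha_separate) + z_power) ** 2
    # Wei-Lachin T = (Z1 + Z2) / sqrt(2 + 2*rho). With equal effects its mean is
    # 2 / sqrt(2 + 2*rho) times the mean of one Z, so n scales by (2 + 2*rho) / 4.
    n_wl = (z(1 - alpha_wl) + z_power) ** 2 * (2 + 2 * rho) / 4
    return n_separate / n_wl

ratio = sample_size_ratio()
print(f"separate tests need {100 * (ratio - 1):.0f}% more subjects")
print(f"Wei-Lachin efficiency gain: {100 * (1 - 1 / ratio):.0f}%")
```

Under these assumptions the ratio is about 1.64, matching the abstract's "64% greater"; its reciprocal (about 0.61) gives the "39% more efficient" figure.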

13.
The most commonly used method in evolutionary biology for combining information across multiple tests of the same null hypothesis is Fisher's combined probability test. This note shows that an alternative method called the weighted Z-test has more power and more precision than does Fisher's test. Furthermore, in contrast to some statements in the literature, the weighted Z-method is superior to the unweighted Z-transform approach. The results in this note show that, when combining P-values from multiple tests of the same hypothesis, the weighted Z-method should be preferred.
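A minimal sketch of the two combination rules compared in entry 13, assuming independent one-sided P-values. Fisher's statistic X = −2 Σ ln pᵢ is referred to a chi-square distribution with 2k degrees of freedom, whose survival function has a closed form for even degrees of freedom; the weighted Z-method converts each pᵢ to Zᵢ = Φ⁻¹(1 − pᵢ) and combines them as Σ wᵢZᵢ / √(Σ wᵢ²). The weights are left to the caller (how to choose them is not specified in the abstract), and both function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def fisher_combined(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi-square(2k) under H0."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    # Chi-square survival function for even df 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

def weighted_z(pvalues, weights):
    """Weighted Z-method for one-sided P-values with caller-supplied weights."""
    zs = [_STD_NORMAL.inv_cdf(1.0 - p) for p in pvalues]
    numerator = sum(w * z for w, z in zip(weights, zs))
    z_combined = numerator / math.sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z_combined)

# Three studies of the same hypothesis; weights here are purely illustrative.
ps = [0.04, 0.20, 0.11]
print(fisher_combined(ps), weighted_z(ps, [1.0, 2.0, 1.5]))
```

With a single P-value both rules return that P-value unchanged, which is a quick sanity check on the closed-form chi-square tail.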

14.
Decady YJ, Thomas DR. Biometrics. 2000;56(3):893-896.
Loughin and Scherer (1998, Biometrics 54, 630-637) investigated tests of association in two-way tables when one of the categorical variables allows for multiple-category responses from individual respondents. Standard chi-squared tests are invalid in this case, and they developed a bootstrap test procedure that provides good control of test levels under the null hypothesis. This procedure and some others that have been proposed are computationally involved and are based on techniques that are relatively unfamiliar to many practitioners. In this paper, the methods introduced by Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) for analyzing complex survey data are used to develop a simple test based on a corrected chi-squared statistic.

15.
Simulation models are widely used to represent the dynamics of ecological systems. A common question with such models is how changes to a parameter value or functional form in the model alter the results. Some authors have chosen to answer that question using frequentist statistical hypothesis tests (e.g. ANOVA). This is inappropriate for two reasons. First, p‐values are determined by statistical power (i.e. replication), which can be arbitrarily high in a simulation context, producing minuscule p‐values regardless of the effect size. Second, the null hypothesis of no difference between treatments (e.g. parameter values) is known a priori to be false, invalidating the premise of the test. Use of p‐values is troublesome (rather than simply irrelevant) because small p‐values lend a false sense of importance to observed differences. We argue that modelers should abandon this practice and focus on evaluating the magnitude of differences between simulations.

Synthesis

Researchers analyzing field or lab data often test ecological hypotheses using frequentist statistics (t‐tests, ANOVA, etc.) that focus on p‐values. Field and lab data usually have limited sample sizes, and p‐values are valuable for quantifying the probability of making incorrect inferences in that situation. However, modern ecologists increasingly rely on simulation models to address complex questions, and those who were trained in frequentist statistics often apply the hypothesis‐testing approach inappropriately to their simulation results. Our paper explains why p‐values are not informative for interpreting simulation models, and suggests better ways to evaluate the ecological significance of model results.

16.
Goss PJE, Lewontin RC. Genetics. 1996;143(1):589-602.
Regions of differing constraint, mutation rate or recombination along a sequence of DNA or amino acids lead to a nonuniform distribution of polymorphisms within species or of fixed differences between species. The power of five tests to reject the null hypothesis of a uniform distribution is studied for four classes of alternative hypotheses. The tests explored are the variance of interval lengths; a modified variance test, which includes covariance between neighboring intervals; the length of the longest interval; the length of the shortest third-order interval; and a composite test. Although no test is uniformly most powerful over the range of alternative hypotheses tested, the variance and modified variance tests usually have the highest power. Therefore, we recommend that one of these two tests be used to test departure from uniformity in all circumstances. Tables of critical values for the variance and modified variance tests are given. The critical values depend on both the number of events and the number of positions in the sequence. A computer program is available on request that calculates both the critical values for a specified number of events and number of positions and the significance level of a given data set.
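The variance-of-interval-lengths test in entry 16 can be sketched with a Monte Carlo null distribution in place of the paper's tabulated critical values (that swap is ours, not the authors'). Assumptions: events occupy distinct sites 1..L, the two end segments count as intervals, and only excess variance (clustering) is treated as evidence against uniformity. Both helper names are hypothetical.

```python
import random
from statistics import pvariance

def interval_lengths(positions, length):
    """Gaps (in sites) between consecutive events on sites 1..length,
    including the segments before the first and after the last event."""
    edges = [0] + sorted(positions) + [length + 1]
    return [b - a - 1 for a, b in zip(edges, edges[1:])]

def variance_test(positions, length, n_sim=999, seed=1):
    """Monte Carlo p-value for excess variance of interval lengths,
    relative to events placed uniformly at random on distinct sites."""
    rng = random.Random(seed)
    m = len(positions)
    observed = pvariance(interval_lengths(positions, length))
    exceed = 0
    for _ in range(n_sim):
        simulated = rng.sample(range(1, length + 1), m)
        if pvariance(interval_lengths(simulated, length)) >= observed:
            exceed += 1
    # Count the observed configuration as one draw from the null distribution.
    return (exceed + 1) / (n_sim + 1)

clustered = list(range(1, 11))        # ten events bunched at the start
spread = list(range(50, 1000, 100))   # ten evenly spaced events
print(variance_test(clustered, 1000), variance_test(spread, 1000))
```

Clustered events leave one very long gap and many short ones, inflating the variance. Evenly spaced events give a variance well below what uniform placement typically produces, so that p-value sits near 1.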

17.
Widely used in testing statistical hypotheses, the Bonferroni multiple test has rather low power, which entails a high risk of falsely accepting the overall null hypothesis and therefore of failing to detect effects that really exist. We suggest that when the partial test statistics are statistically independent, this risk can be reduced by using binomial modifications of the Bonferroni test. Instead of rejecting the null hypothesis when at least one of n partial null hypotheses is rejected at a very high level of significance (say, 0.005 in the case of n = 10), as the Bonferroni test prescribes, the binomial tests recommend rejecting the null hypothesis when at least k partial null hypotheses (say, k = [n/2]) are rejected at a much lower level (up to 30-50%). We show that the power of such binomial tests is substantially higher than that of the original Bonferroni test and some modified Bonferroni tests. In addition, this approach allows tests to be combined when their results are known only for a fixed significance level. The paper contains tables and a computer program for determining (retrieving from a table or computing) the necessary binomial test parameters, i.e. either the partial significance level (when k is fixed) or the value of k (when the partial significance level is fixed).
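A minimal sketch of the "k fixed, find the partial level" calculation for the binomial modification of Bonferroni in entry 17. Under independence and all partial nulls true, the number of partial rejections at level a is Binomial(n, a), so the overall level is P(X ≥ k). Bisection finds the largest a keeping that tail at or below the overall α. The function names are hypothetical, and this is our reconstruction of the calculation, not the authors' program.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def partial_level(n, k, alpha=0.05, tol=1e-10):
    """Largest partial level a such that P(at least k of n independent
    partial tests reject at level a | all nulls true) <= alpha.

    Bisection works because the binomial tail is increasing in a.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail(n, k, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo

# n = 10 partial tests: require one rejection (k = 1) versus half of them (k = 5).
print(partial_level(10, 1), partial_level(10, 5))
```

With k = 1 this reduces to the Šidák level 1 − (1 − α)^(1/n), about 0.005 for n = 10, in line with the Bonferroni figure quoted above. Raising k to n/2 allows each partial test a far more lenient level.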

18.
CONSEL: for assessing the confidence of phylogenetic tree selection. (total citations: 10; self-citations: 0; citations by others: 10)
CONSEL is a program to assess the confidence of tree selection by giving p-values for the trees. The main thrust of the program is to calculate the p-value of the Approximately Unbiased (AU) test using the multiscale bootstrap technique. This p-value is less biased than other conventional p-values such as the Bootstrap Probability (BP), the Kishino-Hasegawa (KH) test, the Shimodaira-Hasegawa (SH) test, and the Weighted Shimodaira-Hasegawa (WSH) test. CONSEL calculates all these p-values from the output of phylogeny program packages such as Molphy, PAML, and PAUP*. Furthermore, CONSEL is applicable to a wide class of problems where BPs are available. AVAILABILITY: The programs are written in C. The source code for Unix and the executable binary for DOS are found at http://www.ism.ac.jp/~shimo/ CONTACT: shimo@ism.ac.jp

19.
We present two tests for seasonal trend in monthly incidence data. The first approach uses a penalized likelihood to choose the number of harmonic terms to include in a parametric harmonic model (which includes time trends and autoregression as well as seasonal harmonic terms) and then tests for seasonality using a parametric bootstrap test. The second approach uses a semiparametric regression model to test for seasonal trend. In the semiparametric model, the seasonal pattern is modeled nonparametrically, parametric terms are included for autoregressive effects and a linear time trend, and a parametric bootstrap test is used to test for seasonality. For both procedures, a null distribution is generated under a null Poisson model with time trends and autoregression parameters. We apply the methods to skin melanoma incidence rates collected by the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute, and perform simulation studies to evaluate the type I error rate and power of the two procedures. These simulations suggest that both procedures are alpha-level procedures. In addition, the harmonic model/bootstrap test had similar or larger power than the semiparametric model/bootstrap test for a wide range of alternatives, and the harmonic model/bootstrap test is much easier to implement. Thus, we recommend the harmonic model/bootstrap test for the analysis of seasonal incidence data.

20.

Background

The theory has been put forward that if a null hypothesis is true, P-values should follow a Uniform distribution. This can be used to check the validity of randomisation.

Method

The theory was tested by simulation for two-sample t tests on data from a Normal distribution and a Lognormal distribution, for two-sample t tests that are not independent, and for chi-squared and Fisher's exact tests using small and large samples.

Results

For the two-sample t test with Normal data, the distribution of P-values was very close to the Uniform. When using Lognormal data this was no longer true, and the distribution had a pronounced mode. For correlated tests, even using data from a Normal distribution, the distribution of P-values varied from simulation run to simulation run, but did not look close to Uniform in any realisation. For binary data in a small sample, only a few P-values were possible and the distribution was very uneven. With a sample of two groups of 1,000 observations, there was great unevenness in the histogram and a poor fit to the Uniform.

Conclusions

The notion that P-values for comparisons of groups using baseline data in randomised clinical trials should follow a Uniform distribution if the randomisation is valid has been found to be true only in the context of independent variables which follow a Normal distribution, not for Lognormal data, correlated variables, or binary data using either chi-squared or Fisher's exact tests. This should not be used as a check for valid randomisation.
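Entry 20's point that binary data in a small sample admit only a few possible P-values can be checked by enumeration. With group sizes n₁, n₂ and s total events fixed, every possible 2×2 table is listed, and a two-sided Fisher P-value is assigned to each. The two-sided convention used here, summing the probabilities of all tables no more likely than the observed one, is one common choice and is our assumption. `fisher_two_sided_pvalues` is a hypothetical helper; exact fractions avoid floating-point ties.

```python
from fractions import Fraction
from math import comb

def fisher_two_sided_pvalues(n1, n2, s):
    """Two-sided Fisher exact P-value for every table with margins (n1, n2, s).

    Keys are x = number of events in group 1; values are exact Fractions.
    """
    total = comb(n1 + n2, s)
    xs = range(max(0, s - n2), min(n1, s) + 1)
    # Hypergeometric weights: P(X = x) = weight[x] / total.
    weight = {x: comb(n1, x) * comb(n2, s - x) for x in xs}
    # Sum the probabilities of all tables no more probable than the observed one.
    return {
        x: Fraction(sum(w for w in weight.values() if w <= weight[x]), total)
        for x in xs
    }

pvals = fisher_two_sided_pvalues(10, 10, 5)
for x, p in sorted(pvals.items()):
    print(x, float(p))
print("distinct P-values:", sorted({float(p) for p in pvals.values()}))
```

With two groups of 10 and 5 events in total, only six tables are possible and, by symmetry, only three distinct P-values. A histogram of P-values from repeated randomisations therefore cannot look Uniform, whatever the validity of the randomisation.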
