We revisit statistical tests for branches of evolutionary trees reconstructed from molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the $\chi_0^2 + \chi_1^2$ distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as $2(l_1 - l_2)$, i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value $l_2$ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
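The statistic and its reference distribution described in this abstract can be sketched in a few lines. This is only an illustration, not the PHYML implementation: the function names are hypothetical, the mixture is assumed to weight the point mass at zero and the 1-df chi-square equally, and the three variables are assumed independent so that the tail probability of their maximum is 1 - F(x)^3.

```python
import math

def chi2_1_cdf(x):
    """CDF of a chi-square variable with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

def mixture_cdf(x, w0=0.5):
    """CDF of the assumed mixture: weight w0 on a point mass at 0,
    weight 1 - w0 on chi-square(1). The 50:50 weighting is an assumption."""
    if x < 0:
        return 0.0
    return w0 + (1.0 - w0) * chi2_1_cdf(x)

def alrt_statistic(l1, l2):
    """2(l1 - l2): l1 is the log-likelihood of the best tree, l2 that of the
    second-best arrangement around the branch of interest."""
    return 2.0 * (l1 - l2)

def alrt_p_value(stat, n_vars=3):
    """Tail probability of the maximum of n_vars independent mixture draws."""
    return 1.0 - mixture_cdf(stat) ** n_vars

# Hypothetical log-likelihoods, for illustration only.
stat = alrt_statistic(-1000.0, -1003.0)
p = alrt_p_value(stat)
```

A larger gap between the two log-likelihoods gives a larger statistic and a smaller p-value, which is read as stronger support for the branch.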

Metrics of phylogenetic tree reliability, such as parametric bootstrap percentages or Bayesian posterior probabilities, represent internal measures of the topological reproducibility of a phylogenetic tree, while the recently introduced aLRT (approximate likelihood ratio test) assesses the likelihood that a branch exists on a maximum-likelihood tree. Although those values are often equated with phylogenetic tree accuracy, they do not necessarily estimate how well a reconstructed phylogeny represents cladistic relationships that actually exist in nature. The authors have therefore attempted to quantify how well bootstrap percentages, posterior probabilities, and aLRT measures reflect the probability that a deduced phylogenetic clade is present in a known phylogeny. The authors simulated the evolution of bacterial genes of varying lengths under biologically realistic conditions, and reconstructed those known phylogenies using both maximum likelihood and Bayesian methods. Then, they measured how frequently clades in the reconstructed trees exhibiting particular bootstrap percentages, aLRT values, or posterior probabilities were found in the true trees. The authors have observed that none of these values correlate with the probability that a given clade is present in the known phylogeny. The major conclusion is that none of the measures provide any information about the likelihood that an individual clade actually exists. It is also found that the mean of all clade support values on a tree closely reflects the average proportion of all clades that have been assigned correctly, and is thus a good representation of the overall accuracy of a phylogenetic tree.

The Bayesian method for estimating species phylogenies from molecular sequence data provides an attractive alternative to maximum likelihood with nonparametric bootstrap due to the easy interpretation of posterior probabilities for trees and to availability of efficient computational algorithms. However, for many data sets it produces extremely high posterior probabilities, sometimes for apparently incorrect clades. Here we use both computer simulation and empirical data analysis to examine the effect of the prior model for internal branch lengths. We found that posterior probabilities for trees and clades are sensitive to the prior for internal branch lengths, and priors assuming long internal branches cause high posterior probabilities for trees. In particular, uniform priors with high upper bounds bias Bayesian clade probabilities in favor of extreme values. We discuss possible remedies to the problem, including empirical and full Bayesian methods and subjective procedures suggested in Bayesian hypothesis testing. Our results also suggest that the bootstrap proportion and Bayesian posterior probability are different measures of accuracy, and that the bootstrap proportion, if interpreted as the probability that the clade is true, can be either too liberal or too conservative.

In recent years, the emphasis of theoretical work on phylogenetic inference has shifted from the development of new tree inference methods to the development of methods to measure the statistical support for the topologies. This paper reviews 3 approaches to assign support values to branches in trees obtained in the analysis of molecular sequences: the bootstrap, the Bayesian posterior probabilities for clades, and the interior branch tests. In some circumstances, these methods give different answers. This should not be surprising, because their assumptions are different. The interior branch tests assume that a given topology is true and consider only whether a particular branch length is greater than zero. If a tree is incorrect, a wrong branch (a low bootstrap or Bayesian support may be an indication) may have a non-zero length. If the substitution model is oversimplified, the length of a branch may be overestimated, and the Bayesian support for the branch may be inflated. The bootstrap, on the other hand, approximates the variance of the data under the real model of sequence evolution, because it involves direct resampling from the data. Thus the discrepancy between the Bayesian support and the bootstrap support may signal model inaccuracy. In practical application, use of all 3 methods is recommended, and if discrepancies are observed, then a careful analysis of their potential origins should be made.

Probabilistic tests of topology offer a powerful means of evaluating competing phylogenetic hypotheses. The performance of the nonparametric Shimodaira-Hasegawa (SH) test, the parametric Swofford-Olsen-Waddell-Hillis (SOWH) test, and Bayesian posterior probabilities was explored for five data sets for which all the phylogenetic relationships are known with a very high degree of certainty. These results are consistent with previous simulation studies that have indicated a tendency for the SOWH test to be prone to generating Type 1 errors because of model misspecification coupled with branch length heterogeneity. These results also suggest that the SOWH test may accord overconfidence in the true topology when the null hypothesis is in fact correct. In contrast, the SH test was observed to be much more conservative, even under high substitution rates and branch length heterogeneity. For some of those data sets where the SOWH test proved misleading, the Bayesian posterior probabilities were also misleading. The results of all tests were strongly influenced by the exact substitution model assumptions. Simple models, especially those that assume rate homogeneity among sites, had a higher Type 1 error rate and were more likely to generate misleading posterior probabilities. For some of these data sets, the commonly used substitution models appear to be inadequate for estimating appropriate levels of uncertainty with the SOWH test and Bayesian methods. Reasons for the differences in statistical power between the two maximum likelihood tests are discussed and are contrasted with the Bayesian approach.

Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, the computation time burden limits the application of methods such as the maximum likelihood nonparametric bootstrap to assess the reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and was also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.
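The nonparametric bootstrap mentioned in this abstract resamples alignment columns with replacement and repeats the tree inference on each replicate. The sketch below illustrates that resampling step and the resulting support frequencies; the function names are hypothetical, and `infer_clades` stands in for whatever inference method (maximum likelihood, Bayesian, or other) is supplied by the caller.

```python
import random
from collections import Counter

def resample_columns(alignment, rng):
    """Return one bootstrap replicate: sites drawn with replacement.
    `alignment` is a list of equal-length sequence strings."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]

def bootstrap_support(alignment, infer_clades, n_reps=100, seed=0):
    """Fraction of replicates in which each clade is recovered.
    `infer_clades` is a caller-supplied (hypothetical) function mapping an
    alignment to an iterable of hashable clades, e.g. frozensets of taxa."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_reps):
        replicate = resample_columns(alignment, rng)
        counts.update(set(infer_clades(replicate)))
    return {clade: c / n_reps for clade, c in counts.items()}
```

Passing a Bayesian consensus routine as `infer_clades` would correspond loosely to estimating node support on bootstrapped character matrices, although the abstract does not specify how the authors combined the two.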

Most phylogeographic studies have used maximum likelihood or maximum parsimony to infer phylogeny and bootstrap analysis to evaluate support for trees. Recently, Bayesian methods using Markov chain Monte Carlo to search tree space and simultaneously estimate tree support have become popular due to their fast search speed and ability to create a posterior distribution of parameters of interest. Here, I present a study that utilizes Bayesian methods to infer phylogenetic relationships of the cornsnake (Elaphe guttata) complex using cytochrome b sequences. Examination of the posterior probability distributions confirms the existence of three geographic lineages. Additionally, there is no support for the monophyly of the subspecies of E. guttata. Results suggest the three geographic lineages partially conform to the ranges of previously defined subspecies, although Shimodaira-Hasegawa tests suggest that subspecies-constrained trees produce significantly poorer likelihood estimates than the most likely trees reflecting the evolution of three geographic assemblages. Based on molecular support, these three geographic assemblages are recognized as species using evolutionary species criteria: E. guttata, Elaphe slowinskii, and Elaphe emoryi [phylogeographic, maximum likelihood, maximum parsimony, bootstrap, Bayesian, Markov chain Monte Carlo, cornsnake, Cytochrome b, geographic lineages, E. guttata, E. slowinskii, and E. emoryi].

Newton MA, Lee Y. Biometrics, 2000, 56(4): 1088-1097
Cancerous tumor growth creates cells with abnormal DNA. Allelic-loss experiments identify genomic deletions in cancer cells, but sources of variation and intrinsic dependencies complicate inference about the location and effect of suppressor genes; such genes are the target of these experiments and are thought to be involved in tumor development. We investigate properties of an instability-selection model of allelic-loss data, including likelihood-based parameter estimation and hypothesis testing. By considering a special complete-data case, we derive an approximate calibration method for hypothesis tests of sporadic deletion. Parametric bootstrap and Bayesian computations are also developed. Data from three allelic-loss studies are reanalyzed to illustrate the methods.

Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values.

In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected. [Bayesian credible intervals; DistR method; multigene phylogeny; PHYML; rate heterogeneity.]
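The AIC comparison mentioned in this abstract follows the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below is illustrative only: the function names are hypothetical and the numbers are made up, not taken from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def delta_aic(lnl_a, k_a, lnl_b, k_b):
    """AIC of model A minus AIC of model B; positive values favor model B."""
    return aic(lnl_a, k_a) - aic(lnl_b, k_b)

# Hypothetical values: a concatenated model without per-protein rates (A)
# versus a model with per-protein rate multipliers (B).
difference = delta_aic(-12500.0, 40, -12450.0, 52)
```

In this made-up example the extra rate parameters cost 24 AIC units but gain 100 units of fit, so the richer model would be preferred.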

Many empirical studies have revealed considerable differences between nonparametric bootstrapping and Bayesian posterior probabilities in terms of the support values for branches, despite claimed predictions about their approximate equivalence. We investigated this problem by simulating data, which were then analyzed by maximum likelihood bootstrapping and Bayesian phylogenetic analysis using identical models and reoptimization of parameter values. We show that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often. These errors are strongly accentuated when the models used for analyses are underparameterized. When data are analyzed under the correct model, nonparametric bootstrapping is conservative. Bayesian posterior probabilities are also conservative in this respect, but less so.

Despite the importance of molecular phylogenetics, few of its assumptions have been tested with real data. It is commonly assumed that nonparametric bootstrap values are an underestimate of the actual support, Bayesian posterior probabilities are an overestimate of the actual support, and among-gene phylogenetic conflict is low. We directly tested these assumptions by using a well-supported yeast reference tree. We found that bootstrap values were not significantly different from accuracy. Bayesian support values were, however, significant overestimates of accuracy but still had low false-positive error rates (0% to 2.8%) at the highest values (>99%). Although we found evidence for a branch-length bias contributing to conflict, there was little evidence for widespread, strongly supported among-gene conflict from bootstraps. The results demonstrate that caution is warranted concerning conclusions of conflict based on the assumption of underestimation for support values in real data.

The use of parameter-rich substitution models in molecular phylogenetics has been criticized on the basis that these models can cause a reduction both in accuracy and in the ability to discriminate among competing topologies. We have explored the relationship between nucleotide substitution model complexity and nonparametric bootstrap support under maximum likelihood (ML) for six data sets for which the true relationships are known with a high degree of certainty. We also performed equally weighted maximum parsimony analyses in order to assess the effects of ignoring branch length information during tree selection. We observed that maximum parsimony gave the lowest mean estimate of bootstrap support for the correct set of nodes relative to the ML models for every data set except one. For several data sets, we established that the exact distribution used to model among-site rate variation was critical for a successful phylogenetic analysis. Site-specific rate models were shown to perform very poorly relative to gamma and invariable sites models for several of the data sets most likely because of the gross underestimation of branch lengths. The invariable sites model also performed poorly for several data sets where this model had a poor fit to the data, suggesting that addition of the gamma distribution can be critical. Estimates of bootstrap support for the correct nodes often increased under gamma and invariable sites models relative to equal rates models. Our observations are contrary to the prediction that such models cause reduced confidence in phylogenetic hypotheses. Our results raise several issues regarding the process of model selection, and we briefly discuss model selection uncertainty and the role of sensitivity analyses in molecular phylogenetics.

In this paper, we investigate the phylogenetic placement of Pleospora gaudefroyi using partial SSU as well as ITS ribosomal DNA sequences. Both SSU and ITS data sets agreed in the placement of P. gaudefroyi. Parsimony and neighbor-joining analyses of each data set placed P. gaudefroyi within the Pleosporaceae with 100% bootstrap support. Pleospora gaudefroyi was sister taxon in the Pleosporaceae represented by Alternaria alternata, Cochliobolus sativus, Pleospora herbarum, Pyrenophora tritici-repentis and Setosphaeria rostrata. Pleospora gaudefroyi was separated from other genera in the Pleosporaceae in 94% of the bootstrap replicates in parsimony and neighbor-joining analyses. When P. gaudefroyi was constrained to monophyly with P. herbarum, all resulting trees were significantly worse than the optimal tree in both Kishino-Hasegawa and Shimodaira-Hasegawa tests. Pleospora gaudefroyi was therefore excluded from Pleospora, and transferred to the new genus Decorospora placed in the Pleosporaceae. Decorospora (Dothideomycetes) has characteristic ascospores enclosed in a sheath with 4-5 apical extensions. The distribution and substrate types for D. gaudefroyi are summarized and updated based on additional collections.

In this study, we used an empirical example based on 100 mitochondrial genomes from higher teleost fishes to compare the accuracy of parsimony-based jackknife values with Bayesian support values. Phylogenetic analyses of 366 partitions, using differential taxon and character sampling from the entire data matrix of 100 taxa and 7,990 characters, were performed for both phylogenetic methods. The tree topology and branch-support values from each partition were compared with the tree inferred from all taxa and characters. Using this approach, we quantified the accuracy of the branch-support values assigned by the jackknife and Bayesian methods, with respect to each of 15 basal clades. In comparing the jackknife and Bayesian methods, we found that (1) both measures of support differ significantly from an ideal support index; (2) the jackknife underestimated support values; (3) the Bayesian method consistently overestimated support; (4) the magnitude by which Bayesian values overestimate support exceeds the magnitude by which the jackknife underestimates support; and (5) both methods performed poorly when taxon sampling was increased and character sampling was not increased. These results indicate that (1) the higher Bayesian support values are inappropriate (in magnitude), and (2) Bayesian support values should not be interpreted as probabilities that clades are correctly resolved. We advocate the continued use of the relatively conservative bootstrap and jackknife approaches to estimating branch support rather than the more extreme overestimates provided by the Markov Chain Monte Carlo-based Bayesian methods.

Martialinae are pale, eyeless and probably hypogaeic predatory ants. Morphological character sets suggest a close relationship to the ant subfamily Leptanillinae. Recent analyses based on molecular sequence data suggest that Martialinae are the sister group to all extant ants. However, by comparing molecular studies and different reconstruction methods, the position of Martialinae remains ambiguous. While this sister group relationship was well supported by Bayesian partitioned analyses, Maximum Likelihood approaches could not unequivocally resolve the position of Martialinae. By re-analysing a previously published molecular data set, we show that the Maximum Likelihood approach is highly appropriate to resolve deep ant relationships, especially between Leptanillinae, Martialinae and the remaining ant subfamilies. Based on improved alignments, alignment masking, and tree reconstructions with a sufficient number of bootstrap replicates, our results strongly reject a placement of Martialinae at the first split within the ant tree of life. Instead, we suggest that Leptanillinae are a sister group to all other extant ant subfamilies, whereas Martialinae branch off as a second lineage. This assumption is backed by approximately unbiased (AU) tests, additional Bayesian analyses and split networks. Our results demonstrate clear effects of improved alignment approaches, alignment masking and data partitioning. We hope that our study illustrates the importance of thorough, comprehensible phylogenetic analyses using the example of ant relationships.

The field of phylogenetic tree estimation has been dominated by three broad classes of methods: distance-based approaches, parsimony and likelihood-based methods (including maximum likelihood (ML) and Bayesian approaches). Here we introduce two new approaches to tree inference: pairwise likelihood estimation and a distance-based method that estimates the number of substitutions along the paths through the tree. Our results include the derivation of the formulae for the probability that two leaves will be identical at a site given a number of substitutions along the path connecting them. We also derive the posterior probability of the number of substitutions along a path between two sequences. The calculations for the posterior probabilities are exact for group-based, symmetric models of character evolution, but are only approximate for more general models.

Using a four-taxon example under a simple model of evolution, we show that the methods of maximum likelihood and maximum posterior probability (which is a Bayesian method of inference) may not arrive at the same optimal tree topology. Some patterns that are separately uninformative under the maximum likelihood method are separately informative under the Bayesian method. We also show that this difference has impact on the bootstrap frequencies and the posterior probabilities of topologies, which therefore are not necessarily approximately equal. Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434, 1996) stated that bootstrap frequencies can, under certain circumstances, be interpreted as posterior probabilities. This is true only if one includes a non-informative prior distribution of the possible data patterns, and most often the prior distributions are instead specified in terms of topology and branch lengths. [Bayesian inference; maximum likelihood method; Phylogeny; support.]

Estimating the species accumulation curve using mixtures (cited 3 times: 0 self-citations, 3 by others)
Mao CX, Colwell RK, Chang J. Biometrics, 2005, 61(2): 433-441
As a significant tool in ecological studies, the species accumulation curve or the collector's curve is the graph of the expected number of detected species as a function of sampling effort. The problem of estimating the species accumulation curve based on an empirical data set arising from quadrat sampling is studied in a nonparametric binomial mixture model. It will be shown that estimating the species accumulation curve not only is independent of the unknown number of species but also includes estimating the number of species as a limiting case. For the purpose of interpolation, moment-based estimators, associated with asymptotic confidence intervals, are developed from several points of view. A likelihood-based procedure is developed for the purpose of extrapolation, associated with bootstrap confidence intervals. The proposed methods are illustrated by ecological data sets.
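To make the idea of an interpolated species accumulation curve concrete, the sketch below uses the classical sample-based rarefaction formula for quadrat incidence data: with T quadrats and species i observed in y_i of them, the expected number of species in t randomly chosen quadrats is S_obs minus the sum over species of C(T - y_i, t) / C(T, t). This is offered as a familiar illustration and is not claimed to be the moment-based estimator developed in the paper; the function name and the toy data are hypothetical.

```python
from math import comb

def expected_species(incidence_counts, n_quadrats, t):
    """Expected number of species detected in t of n_quadrats quadrats.
    incidence_counts[i] is the number of quadrats containing species i."""
    if not 0 <= t <= n_quadrats:
        raise ValueError("t must lie between 0 and n_quadrats")
    total = comb(n_quadrats, t)
    # Probability that species i is missed by all t sampled quadrats.
    missed = sum(comb(n_quadrats - y, t) / total for y in incidence_counts)
    return len(incidence_counts) - missed

# Toy data: three species found in 1, 2 and 3 of 3 quadrats.
curve = [expected_species([1, 2, 3], 3, t) for t in range(4)]
```

The curve rises from zero species at zero effort to the observed species count at full effort, which is the interpolation range; extrapolation beyond the sampled effort needs a model such as the mixture approach described in the abstract.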

Several tests of molecular phylogenies have been proposed over the last decades, but most of them lead to strikingly different P-values. I propose that such discrepancies are principally due to different forms of null hypotheses. To support this hypothesis, two new tests are described. Both consider the composite null hypothesis that all the topologies are equidistant from the true but unknown topology. This composite hypothesis can either be reduced to the simple hypothesis at the least favorable distribution (frequentist significance test [FST]) or to the maximum likelihood topology (frequentist hypothesis test [FHT]). In both cases, the reduced null hypothesis is tested against each topology included in the analysis. The tests proposed have an information-theoretic justification, and the distribution of their test statistic is estimated by a nonparametric bootstrap, adjusting P-values for multiple comparisons. I applied the new tests to the reanalysis of two chloroplast genes, psaA and psbB, and compared the results with those of previously described tests. As expected, the FST and the FHT behaved approximately like the Shimodaira-Hasegawa test and the bootstrap, respectively. Although the tests give overconfidence in a wrong tree when an overly simple nucleotide substitution model is assumed, more complex models incorporating heterogeneity among codon positions resolve some conflicts. To further investigate the influence of the null hypothesis, a power study was conducted. Simulations showed that FST and the Shimodaira-Hasegawa test are the least powerful and FHT is the most powerful across the parameter space. Although the size of all the tests is affected by misspecification, the two new tests appear more robust against misspecification of the model of evolution and consistently supported the hypothesis that the Gnetales are nested within gymnosperms.
