首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Owing to the exponential growth of genome databases, phylogenetic trees are now widely used to test a variety of evolutionary hypotheses. Nevertheless, computation time burden limits the application of methods such as maximum likelihood nonparametric bootstrap to assess reliability of evolutionary trees. As an alternative, the much faster Bayesian inference of phylogeny, which expresses branch support as posterior probabilities, has been introduced. However, marked discrepancies exist between nonparametric bootstrap proportions and Bayesian posterior probabilities, leading to difficulties in the interpretation of sometimes strongly conflicting results. As an attempt to reconcile these two indices of node reliability, we apply the nonparametric bootstrap resampling procedure to the Bayesian approach. The correlation between posterior probabilities, bootstrap maximum likelihood percentages, and bootstrapped posterior probabilities was studied for eight highly diverse empirical data sets and were also investigated using experimental simulation. Our results show that the relation between posterior probabilities and bootstrapped maximum likelihood percentages is highly variable but that very strong correlations always exist when Bayesian node support is estimated on bootstrapped character matrices. Moreover, simulations corroborate empirical observations in suggesting that, being more conservative, the bootstrap approach might be less prone to strongly supporting a false phylogenetic hypothesis. Thus, apparent conflicts in topology recovered by the Bayesian approach were reduced after bootstrapping. Both posterior probabilities and bootstrap supports are of great interest to phylogeny as potential upper and lower bounds of node reliability, but they are surely not interchangeable and cannot be directly compared.  相似文献   

2.
What does the posterior probability of a phylogenetic tree mean?This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees.  相似文献   

3.
The Bayesian method for estimating species phylogenies from molecular sequence data provides an attractive alternative to maximum likelihood with nonparametric bootstrap due to the easy interpretation of posterior probabilities for trees and to availability of efficient computational algorithms. However, for many data sets it produces extremely high posterior probabilities, sometimes for apparently incorrect clades. Here we use both computer simulation and empirical data analysis to examine the effect of the prior model for internal branch lengths. We found that posterior probabilities for trees and clades are sensitive to the prior for internal branch lengths, and priors assuming long internal branches cause high posterior probabilities for trees. In particular, uniform priors with high upper bounds bias Bayesian clade probabilities in favor of extreme values. We discuss possible remedies to the problem, including empirical and full Bayesian methods and subjective procedures suggested in Bayesian hypothesis testing. Our results also suggest that the bootstrap proportion and Bayesian posterior probability are different measures of accuracy, and that the bootstrap proportion, if interpreted as the probability that the clade is true, can be either too liberal or too conservative.  相似文献   

4.
Murphy and colleagues reported that the mammalian phylogeny was resolved by Bayesian phylogenetics. However, the DNA sequences they used had many alignment gaps and undetermined nucleotide sites. We therefore reanalyzed their data by minimizing unshared nucleotide sites and retaining as many species as possible (13 species). In constructing phylogenetic trees, we used the Bayesian, maximum likelihood (ML), maximum parsimony (MP), and neighbor-joining (NJ) methods with different substitution models. These trees were constructed by using both protein and DNA sequences. The results showed that the posterior probabilities for Bayesian trees were generally much higher than the bootstrap values for ML, MP, and NJ trees. Two different Bayesian topologies for the same set of species were sometimes supported by high posterior probabilities, implying that two different topologies can be judged to be correct by Bayesian phylogenetics. This suggests that the posterior probability in Bayesian analysis can be excessively high as an indication of statistical confidence and therefore Murphy et al.'s tree, which largely depends on Bayesian posterior probability, may not be correct.  相似文献   

5.
Assessment of the reliability of a given phylogenetic hypothesis is an important step in phylogenetic analysis. Historically, the nonparametric bootstrap procedure has been the most frequently used method for assessing the support for specific phylogenetic relationships. The recent employment of Bayesian methods for phylogenetic inference problems has resulted in clade support being expressed in terms of posterior probabilities. We used simulated data and the four-taxon case to explore the relationship between nonparametric bootstrap values (as inferred by maximum likelihood) and posterior probabilities (as inferred by Bayesian analysis). The results suggest a complex association between the two measures. Three general regions of tree space can be identified: (1) the neutral zone, where differences between mean bootstrap and mean posterior probability values are not significant, (2) near the two-branch corner, and (3) deep in the two-branch corner. In the last two regions, significant differences occur between mean bootstrap and mean posterior probability values. Whether bootstrap or posterior probability values are higher depends on the data in support of alternative topologies. Examination of star topologies revealed that both bootstrap and posterior probability values differ significantly from theoretical expectations; in particular, there are more posterior probability values in the range 0.85-1 than expected by theory. Therefore, our results corroborate the findings of others that posterior probability values are excessively high. Our results also suggest that extrapolations from single topology branch-length studies are unlikely to provide any general conclusions regarding the relationship between bootstrap and posterior probability values.  相似文献   

6.
Phylogenetic inference and evaluating support for inferred relationships is at the core of many studies testing evolutionary hypotheses. Despite the popularity of nonparametric bootstrap frequencies and Bayesian posterior probabilities, the interpretation of these measures of tree branch support remains a source of discussion. Furthermore, both methods are computationally expensive and become prohibitive for large data sets. Recent fast approximate likelihood-based measures of branch supports (approximate likelihood ratio test [aLRT] and Shimodaira-Hasegawa [SH]-aLRT) provide a compelling alternative to these slower conventional methods, offering not only speed advantages but also excellent levels of accuracy and power. Here we propose an additional method: a Bayesian-like transformation of aLRT (aBayes). Considering both probabilistic and frequentist frameworks, we compare the performance of the three fast likelihood-based methods with the standard bootstrap (SBS), the Bayesian approach, and the recently introduced rapid bootstrap. Our simulations and real data analyses show that with moderate model violations, all tests are sufficiently accurate, but aLRT and aBayes offer the highest statistical power and are very fast. With severe model violations aLRT, aBayes and Bayesian posteriors can produce elevated false-positive rates. With data sets for which such violation can be detected, we recommend using SH-aLRT, the nonparametric version of aLRT based on a procedure similar to the Shimodaira-Hasegawa tree selection. In general, the SBS seems to be excessively conservative and is much slower than our approximate likelihood-based methods.  相似文献   

7.
距离矩阵邻接法、最大简约法和最大似然法是重建生物系统关系的3种主要方法。普遍认为最大似然法在原理上优于前二种方法,但其计算复杂费时。由于现行计算机的能力尚达不到其要求而实用性差,特别是在处理大数据集样本(即大于25个分类单元)时,用此方法几乎不可能。新近提出的贝叶斯法(Bayesianmethod)既保留了最大似然法的基本原理,又引进了马尔科夫链的蒙特卡洛方法,并使计算时间大大缩短。本文用贝叶斯法对硬蜱属(Ixodes)19个种的线粒体16S rDNA片段进行了系统进化分析。从总体上看,分析结果与现有的基于形态学的分类体系基本吻合。但与现存的假说相反,莱姆病的主要宿主蓖籽硬蜱复合种组并非单系。通过比较贝叶斯法与其它三种方法的结果,我们认为贝叶斯法是一种系统进化分析的好方法,它既能根据分子进化的现有理论和各种模型用概率重建系统进化关系,又克服了最大似然法计算速度慢、不适用于大数据集样本的缺陷。贝叶斯法根据后验概率直观地表示系统进化关系的分析结果,不需要用自引导法进行检验。可以预料,贝叶斯法将会被广泛地应用到系统进化分析上[动物学报49(3):380—388,2003]。  相似文献   

8.
Many phylogenetic methods produce large collections of trees as opposed to a single tree, which allows the exploration of support for various evolutionary hypotheses. However, to be useful, the information contained in large collections of trees should be summarized; frequently this is achieved by constructing a consensus tree. Consensus trees display only those signals that are present in a large proportion of the trees. However, by their very nature consensus trees require that any conflicts between the trees are necessarily disregarded. We present a method that extends the notion of consensus trees to allow the visualization of conflicting hypotheses in a consensus network. We demonstrate the utility of this method in highlighting differences amongst maximum likelihood bootstrap values and Bayesian posterior probabilities in the placental mammal phylogeny, and also in comparing the phylogenetic signal contained in amino acid versus nucleotide characters for hexapod monophyly.  相似文献   

9.
Most phylogeographic studies have used maximum likelihood or maximum parsimony to infer phylogeny and bootstrap analysis to evaluate support for trees. Recently, Bayesian methods using Marlov chain Monte Carlo to search tree space and simultaneously estimate tree support have become popular due to its fast search speed and ability to create a posterior distribution of parameters of interest. Here, I present a study that utilizes Bayesian methods to infer phylogenetic relationships of the cornsnake (Elaphe guttata) complex using cytochrome b sequences. Examination of the posterior probability distributions confirms the existence of three geographic lineages. Additionally, there is no support for the monophyly of the subspecies of E. guttata. Results suggest the three geographic lineages partially conform to the ranges of previously defined subspecies, although Shimodaira-Hasegawa tests suggest that subspecies-constrained trees produce significantly poorer likelihood estimates than the most likely trees reflecting the evolution of three geographic assemblages. Based on molecular support, these three geographic assemblages are recognized as species using evolutionary species criteria: E. guttata, Elaphe slowinskii, and Elaphe emoryi [phylogeographic, maximum likelihood, maximum parsimony, bootstrap, Bayesian, Markov chain Monte Carlo, cornsnake, Cytochrome b, geographic lineages, E. guttta, E. slowinskii, and E. emoryi].  相似文献   

10.
The clade size effect refers to a bias that causes middle‐sized clades to be less supported than small or large‐sized clades. This bias is present in resampling measures of support calculated under maximum likelihood and maximum parsimony and in Bayesian posterior probabilities. Previous analyses indicated that the clade size effect is worst in maximum parsimony, followed by maximum likelihood, while Bayesian inference is the least affected. Homoplasy was interpreted as the main cause of the effect. In this study, we explored the presence of the clade size effect in alternative measures of branch support under maximum parsimony: Bremer support and symmetric resampling, expressed as absolute frequencies and frequency differences. Analyses were performed using 50 molecular and morphological matrices. Symmetric resampling showed the same tendency that bootstrap and jackknife did for maximum parsimony and maximum likelihood. Few matrices showed a significant bias using Bremer support, presenting a better performance than resampling measures of support and comparable to Bayesian posterior probabilities. Our results indicate that the problem is not maximum parsimony, but resampling measures of support. We corroborated the role of homoplasy as a possible cause of the clade size effect, increasing the number of random trees during the resampling, which together with the higher chances that medium‐sized clades have of being contradicted generates the bias during the perturbation of the original matrix, making it stronger in resampling measures of support.  相似文献   

11.
We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the chi(0)2 + chi(1)2 distribution. The new aLRT of interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as 2(l1- l2), i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value l2 is computed by optimizing only over the branch of interest and the four adjacent branches, whereas other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).  相似文献   

12.
While Bayesian analysis has become common in phylogenetics, the effects of topological prior probabilities on tree inference have not been investigated. In Bayesian analyses, the prior probability of topologies is almost always considered equal for all possible trees, and clade support is calculated from the majority rule consensus of the approximated posterior distribution of topologies. These uniform priors on tree topologies imply non-uniform prior probabilities of clades, which are dependent on the number of taxa in a clade as well as the number of taxa in the analysis. As such, uniform topological priors do not model ignorance with respect to clades. Here, we demonstrate that Bayesian clade support, bootstrap support, and jackknife support from 17 empirical studies are significantly and positively correlated with non-uniform clade priors resulting from uniform topological priors. Further, we demonstrate that this effect disappears for bootstrap and jackknife when data sets are free from character conflict, but remains pronounced for Bayesian clade supports, regardless of tree shape. Finally, we propose the use of a Bayes factor to account for the fact that uniform topological priors do not model ignorance with respect to clade probability.  相似文献   

13.
Polytomies and Bayesian phylogenetic inference   总被引:16,自引:0,他引:16  
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.  相似文献   

14.
A new method is presented for inferring evolutionary trees using nucleotide sequence data. The birth-death process is used as a model of speciation and extinction to specify the prior distribution of phylogenies and branching times. Nucleotide substitution is modeled by a continuous-time Markov process. Parameters of the branching model and the substitution model are estimated by maximum likelihood. The posterior probabilities of different phylogenies are calculated and the phylogeny with the highest posterior probability is chosen as the best estimate of the evolutionary relationship among species. We refer to this as the maximum posterior probability (MAP) tree. The posterior probability provides a natural measure of the reliability of the estimated phylogeny. Two example data sets are analyzed to infer the phylogenetic relationship of human, chimpanzee, gorilla, and orangutan. The best trees estimated by the new method are the same as those from the maximum likelihood analysis of separate topologies, but the posterior probabilities are quite different from the bootstrap proportions. The results of the method are found to be insensitive to changes in the rate parameter of the branching process. Correspondence to: Z. Yang  相似文献   

15.
In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability--the probability that a tree or clade is true given the data, evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors--either flat or exponential distributions with moderate to large means--provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.  相似文献   

16.
Bayesian inference of phylogeny is unique among phylogenetic reconstruction methods in that it produces a posterior distribution of trees rather than a point estimate of the best tree. The most common way to summarize this distribution is to report the majority-rule consensus tree annotated with the marginal posterior probabilities of each partition. Reporting a single tree discards information contained in the full underlying distribution and reduces the Bayesian analysis to simply another method for finding a point estimate of the tree. Even when a point estimate of the phylogeny is desired, the majority-rule consensus tree is only one possible method, and there may be others that are more appropriate for the given data set and application. We present a method for summarizing the distribution of trees that is based on identifying agreement subtrees that are frequently present in the posterior distribution. This method provides fully resolved binary trees for subsets of taxa with high marginal posterior probability on the entire tree and includes additional information about the spread of the distribution.  相似文献   

17.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no supoort for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than 3.4 x 10(-7).  相似文献   

18.
To set the stage for historical analyses of the ecology and behavior of tree swallows and their allies (genus Tachycineta), we reconstructed the phylogeny of the nine Tachycineta species by comparing DNA sequences of six mitochondrial genes: Cytochrome b (990 base pairs), the second subunit of nicotinamide adenine dinucleotide dehydrogenase (839 base pairs), cytochrome oxidase II (85 base pairs), ATPase 8 (158 base pairs), tRNA-lysine (73 base pairs), and tRNA-methionine (25 base pairs). The phylogeny consisted of two main clades: South and Central American species ((T. stolzmanni, T. albilinea, T. albiventris), (T. leucorrhoa, T. meyeni)), and North American and Caribbean species (T. bicolor, (T. thalassina, T. euchrysea, T. cyaneoviridis)). The genetic distances among the species suggested that Tachycineta is a relatively old group compared to other New World swallow genera. One interesting biogeographic discovery was the close relationship between Caribbean and western North American taxa. This historical connection occurs in other groups of swallows and swifts as well. To reconstruct the phylogeny, we employed Bayesian as well as traditional maximum-likelihood methods. The Bayesian approach provided probability values for trees produced from the different genes and gene combinations, as well as probabilities of branches within those trees. We compared Bayesian and maximum-likelihood bootstrap branch support and found that all branches with Bayesian probabilities > or = 95% received bootstrap support >70%.  相似文献   

19.
Many empirical studies have revealed considerable differences between nonparametric bootstrapping and Bayesian posterior probabilities in terms of the support values for branches, despite claimed predictions about their approximate equivalence. We investigated this problem by simulating data, which were then analyzed by maximum likelihood bootstrapping and Bayesian phylogenetic analysis using identical models and reoptimization of parameter values. We show that Bayesian posterior probabilities are significantly higher than corresponding nonparametric bootstrap frequencies for true clades, but also that erroneous conclusions will be made more often. These errors are strongly accentuated when the models used for analyses are underparameterized. When data are analyzed under the correct model, nonparametric bootstrapping is conservative. Bayesian posterior probabilities are also conservative in this respect, but less so.  相似文献   

20.
刘涛  李晓贤 《广西植物》2010,30(6):796-804
应用最大似然法(ML)、贝叶斯推论(BI)、邻接法(NJ)和似然比检验(hLRTs)进行泽泻目分子系统学研究。所用的rbcL基因序列代表了泽泻目14科46属以及作为外类群的6相关属。研究结果表明,*等级制似然比检验表明泽泻目rbcL序列最适合的DNA进化模型为GTR+I+G,最大似然法、贝叶斯法和邻接法构建的系统发育树拓扑结构相似,没有显著的差异,但贝叶斯树支持率较高;泽泻目为一单系类群,由两个主要谱系分支构成,深层分布格局由5个主要分支构成。基于分子系统发育树,文中对泽泻目科间、水鳖科+茨藻科、泽泻科+花蔺科+黄花蔺科、和"Cymodoeaceae complex"的系统发育关系进行了讨论。研究结果还表明,泽泻目系统发育关系可能还需要更多的证据进一步的澄清。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号