Similar articles
1.
Most phylogeographic studies have used maximum likelihood or maximum parsimony to infer phylogeny and bootstrap analysis to evaluate support for trees. Recently, Bayesian methods that use Markov chain Monte Carlo to search tree space and simultaneously estimate tree support have become popular because of their fast search speed and their ability to create a posterior distribution of parameters of interest. Here, I present a study that uses Bayesian methods to infer phylogenetic relationships of the cornsnake (Elaphe guttata) complex from cytochrome b sequences. Examination of the posterior probability distributions confirms the existence of three geographic lineages. Additionally, there is no support for the monophyly of the subspecies of E. guttata. Results suggest that the three geographic lineages partially conform to the ranges of previously defined subspecies, although Shimodaira-Hasegawa tests indicate that subspecies-constrained trees have significantly poorer likelihoods than the most likely trees, which reflect the evolution of three geographic assemblages. Based on molecular support, these three geographic assemblages are recognized as species under evolutionary species criteria: E. guttata, Elaphe slowinskii, and Elaphe emoryi [phylogeographic, maximum likelihood, maximum parsimony, bootstrap, Bayesian, Markov chain Monte Carlo, cornsnake, Cytochrome b, geographic lineages, E. guttata, E. slowinskii, and E. emoryi].
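
The nonparametric bootstrap mentioned above resamples alignment columns with replacement and re-infers a tree from each pseudo-replicate; support for a clade is the fraction of replicate trees that contain it. A minimal Python sketch of the resampling step only, assuming the alignment is a dict of equal-length sequences (the function name and toy data are hypothetical):

```python
import random
from typing import Dict


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Return one nonparametric bootstrap pseudo-replicate of an alignment.

    Columns (sites) are drawn with replacement; every taxon gets the same
    column indices, so resampled sites stay aligned across taxa.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Hypothetical toy alignment, not data from the study.
aln = {"taxonA": "ACGTACGT", "taxonB": "ACGTTCGT", "taxonC": "ACCTTCGA"}
rng = random.Random(1)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
```

Each replicate would then be analyzed with the same tree-search method (maximum likelihood or parsimony) used on the original data; that inference step is not shown.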

2.
An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.
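
When an MCMC sampler visits topologies in proportion to their posterior probabilities, each topology's posterior probability can be estimated by its frequency in the post-burn-in sample, and the most frequent topology estimates the MAP tree. The sketch below assumes every sampled tree has already been written as a canonical string, so the same topology always yields the same string; that canonicalization, the function name, and the sample are hypothetical and are not the integration scheme described in the abstract:

```python
from collections import Counter
from typing import List, Tuple


def topology_posteriors(samples: List[str], burn_in: int = 0) -> List[Tuple[str, float]]:
    """Estimate topology posterior probabilities from MCMC output.

    samples: canonical topology strings in sampling order (assumed canonical).
    burn_in: number of initial samples to discard.
    Returns (topology, estimated probability) pairs, most probable first.
    """
    kept = samples[burn_in:]
    if not kept:
        raise ValueError("no samples left after burn-in")
    counts = Counter(kept)
    return [(topo, k / len(kept)) for topo, k in counts.most_common()]


# Hypothetical chain output: 10 burn-in samples, then 95 + 5 retained samples.
chain = ["((A,C),(B,D))"] * 10 + ["((A,B),(C,D))"] * 95 + ["((A,C),(B,D))"] * 5
estimates = topology_posteriors(chain, burn_in=10)
map_topology, map_prob = estimates[0]
```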

3.
We modified the phylogenetic program MrBayes 3.1.2 to incorporate the compound Dirichlet priors for branch lengths proposed recently by Rannala, Zhu, and Yang (2012. Tail paradox, partial identifiability and influential priors in Bayesian branch length inference. Mol. Biol. Evol. 29:325-335.) as a solution to the problem of branch-length overestimation in Bayesian phylogenetic inference. The compound Dirichlet prior specifies a fairly diffuse prior on the tree length (the sum of branch lengths) and uses a Dirichlet distribution to partition the tree length into branch lengths. Six problematic data sets originally analyzed by Brown, Hedtke, Lemmon, and Lemmon (2010. When trees grow too long: investigating the causes of highly inaccurate Bayesian branch-length estimates. Syst. Biol. 59:145-161) are reanalyzed using the modified version of MrBayes to investigate properties of Bayesian branch-length estimation using the new priors. While the default exponential priors for branch lengths produced extremely long trees, the compound Dirichlet priors produced posterior estimates that are much closer to the maximum likelihood estimates. Furthermore, the posterior tree lengths were quite robust to changes in the parameter values in the compound Dirichlet priors, for example, when the prior mean of tree length changed over several orders of magnitude. Our results suggest that the compound Dirichlet priors may be useful for correcting branch-length overestimation in phylogenetic analyses of empirical data sets.
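
The compound Dirichlet prior described above can be sampled in two stages: draw the tree length from a diffuse gamma distribution, then split it among branches with a Dirichlet draw. A minimal Python sketch, assuming a symmetric Dirichlet over all branches (the published prior also allows internal and external branches to be weighted differently, which is omitted here); the function name and parameter values are illustrative only:

```python
import random
from typing import List


def sample_branch_lengths(n_branches: int, alpha_t: float, beta_t: float,
                          dir_a: float, rng: random.Random) -> List[float]:
    """Draw one vector of branch lengths from a gamma-Dirichlet prior (sketch).

    Tree length T ~ Gamma(shape=alpha_t, rate=beta_t), so E[T] = alpha_t / beta_t.
    Proportions ~ Dirichlet(dir_a, ..., dir_a), obtained by normalizing
    independent Gamma(dir_a, 1) draws. Branch lengths are T times proportions.
    """
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    tree_length = rng.gammavariate(alpha_t, 1.0 / beta_t)
    raw = [rng.gammavariate(dir_a, 1.0) for _ in range(n_branches)]
    total = sum(raw)
    return [tree_length * x / total for x in raw]


rng = random.Random(42)
# An unrooted binary tree with 10 taxa has 2 * 10 - 3 = 17 branches.
draws = [sample_branch_lengths(17, 1.0, 0.1, 1.0, rng) for _ in range(20000)]
mean_tree_length = sum(sum(d) for d in draws) / len(draws)  # expected to be near 10
```

Because the prior is placed on tree length directly, the expected tree length stays at alpha_t / beta_t however many branches there are; under independent exponential priors on each branch it grows with the number of branches instead.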

4.
We tested whether it is beneficial for the accuracy of phylogenetic inference to sample characters that are evolving under different sets of parameters, using both Bayesian MCMC (Markov chain Monte Carlo) and parsimony approaches. We examined differential rates of evolution among characters, differential character-state frequencies and character-state space, and differential relative branch lengths among characters. We also compared the relative performance of parsimony and Bayesian analyses by progressively incorporating more of these heterogeneous parameters and progressively increasing the severity of this heterogeneity. Bayesian analyses performed better than parsimony when heterogeneous simulation parameters were incorporated into the substitution model. However, parsimony outperformed Bayesian MCMC when heterogeneous simulation parameters were not incorporated into the Bayesian substitution model. The higher the rate of evolution simulated, the better parsimony performed relative to Bayesian analyses. Bayesian and parsimony analyses converged in their performance as the number of simulated heterogeneous model parameters increased. Up to a point, rate heterogeneity among sites was generally advantageous for phylogenetic inference using both approaches. In contrast, branch-length heterogeneity was generally disadvantageous for phylogenetic inference using both parsimony and Bayesian approaches. Parsimony was found to be more conservative than Bayesian analyses, in that it resolved fewer incorrect clades.
© The Willi Hennig Society 2006.

5.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. 
In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and combined ITS + waxy sequences), no support was observed for the monophyly of the genus Ipomoea or of the tribe Argyreieae; the estimated probability of monophyly of these taxa was less than 3.4 × 10^-7.
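
One common way to compare independent runs, such as the 18 runs above, is the Gelman-Rubin potential scale reduction factor (PSRF), which compares between-run and within-run variance of a parameter. This is a swapped-in diagnostic offered as an illustration, not necessarily the procedure the authors used; a minimal Python sketch for one scalar parameter:

```python
from statistics import mean, variance
from typing import List


def psrf(chains: List[List[float]]) -> float:
    """Gelman-Rubin potential scale reduction factor for one scalar parameter.

    chains: m >= 2 post-burn-in runs, each of the same length n >= 2.
    Values close to 1.0 are consistent with the runs sampling the same
    distribution; clearly larger values suggest they have not converged.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    w = mean(variance(c) for c in chains)        # mean within-chain variance
    b = n * variance([mean(c) for c in chains])  # between-chain variance
    pooled = (n - 1) / n * w + b / n             # pooled estimate of posterior variance
    return (pooled / w) ** 0.5


agree = psrf([[1.0, 2.0, 3.0, 4.0]] * 3)
disagree = psrf([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

In practice it would be computed for each substitution-model parameter and for tree length; convergence in topology needs other summaries, such as comparing clade frequencies between runs.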

6.
In popular use of Bayesian phylogenetics, a default branch-length prior is almost universally applied without knowing how a different prior would have affected the outcome. We performed Bayesian and maximum likelihood (ML) inference of phylogeny based on empirical nucleotide sequence data from a family of lichenized ascomycetes, the Psoraceae, the morphological delimitation of which has been controversial. We specifically assessed the influence of the combination of Bayesian branch-length prior and likelihood model on the properties of the Markov chain Monte Carlo tree sample, including node support, branch lengths, and taxon stability. Data included two regions of the mitochondrial ribosomal RNA gene, the internal transcribed spacer region of the nuclear ribosomal RNA gene, and the protein-coding largest subunit of RNA polymerase II. Data partitioning was performed using Bayes' factors, whereas the best-fitting model of each partition was selected using the Bayesian information criterion (BIC). Given the data and model, short Bayesian branch-length priors generate higher numbers of strongly supported nodes as well as short and topologically similar trees sampled from parts of tree space that are largely unexplored by the ML bootstrap. Long branch-length priors generate fewer strongly supported nodes and longer and more dissimilar trees that are sampled mostly from inside the range of tree space sampled by the ML bootstrap. Priors near the ML distribution of branch lengths generate the best marginal likelihood and the highest frequency of "rogue" (unstable) taxa. The branch-length prior was shown to interact with the likelihood model. Trees inferred under complex partitioned models are more affected by the stretching effect of the branch-length prior. Fewer nodes are strongly supported under a complex model given the same branch-length prior. 
Irrespective of model, internal branches make up a larger proportion of total tree length under the shortest branch-length priors compared with longer priors. Relative effects on branch lengths caused by the branch-length prior can be problematic to downstream phylogenetic comparative methods making use of the branch lengths. Furthermore, given the same branch-length prior, trees are on average more dissimilar under a simple unpartitioned model than under a more complex partitioned model. The distribution of ML branch lengths was shown to better fit a gamma or Pareto distribution than an exponential one. Model adequacy tests indicate that the best-fitting model selected by the BIC is insufficient for describing data patterns in 5 of 8 partitions. More general substitution models are required to explain the data in three of these partitions, one of which also requires nonstationarity. The two mitochondrial ribosomal RNA gene partitions need heterotachous models. We found no significant correlations between, on the one hand, the amount of ambiguous data or the smallest branch-length distance to another taxon and, on the other hand, the topological stability of individual taxa. Integrating over several exponentially distributed means under the best-fitting model, node support for the family Psoraceae, including Psora, Protoblastenia, and the Micarea sylvicola group, is approximately 0.96. Support for the genus Psora is distinctly lower, but we found no evidence to contradict the current classification.
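
The abstract selects substitution models with the Bayesian information criterion, BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the sample size, and L the maximized likelihood; the model with the lowest BIC is preferred. A minimal Python sketch with hypothetical log-likelihoods, assuming alignment length is used as n (a common but not universal convention) and counting only substitution-model parameters, since branch lengths on a fixed tree add the same count to every candidate:

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion, k * ln(n) - 2 * ln(L); lower is better."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(fits: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the name with the lowest BIC; fits maps name -> (lnL, free params)."""
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_sites))


# Hypothetical maximized log-likelihoods for one 1000-site partition.
fits = {
    "JC69": (-2500.0, 0),   # no free substitution parameters
    "HKY85": (-2400.0, 4),  # kappa + 3 free base frequencies
    "GTR": (-2398.0, 8),    # 5 free exchangeabilities + 3 free base frequencies
}
chosen = best_model(fits, n_sites=1000)
```

Here GTR gains only 2 log-likelihood units for 4 extra parameters, which does not offset the additional 4 ln(1000) penalty, so HKY85 is chosen.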

7.
The generalized Gibbs sampler (GGS) is a recently developed Markov chain Monte Carlo (MCMC) technique that enables Gibbs-like sampling of state spaces that lack a convenient representation in terms of a fixed coordinate system. This paper describes a new sampler, called the tree sampler, which uses the GGS to sample from a state space consisting of phylogenetic trees. The tree sampler is useful for a wide range of phylogenetic applications, including Bayesian, maximum likelihood, and maximum parsimony methods. A fast new algorithm to search for a maximum parsimony phylogeny is presented, using the tree sampler in the context of simulated annealing. The mathematics underlying the algorithm is explained and its time complexity is analyzed. The method is tested on two large data sets consisting of 123 sequences and 500 sequences, respectively. The new algorithm is shown to compare very favorably in terms of speed and accuracy to the program DNAPARS from the PHYLIP package.
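
A simulated-annealing search for maximum parsimony needs a fast way to score each proposed tree. The sketch below is the standard Fitch small-parsimony count on a rooted binary tree, offered only to illustrate that scoring step; it is not the tree sampler or annealing schedule described above, and the nested-tuple tree encoding is an assumption:

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name; an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's small-parsimony pass for one site.

    Returns (candidate state set at this node, number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # union costs one change


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Total Fitch changes over all sites of an alignment (name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


aln = {"A": "AAG", "B": "AAT", "C": "CGT", "D": "CGT"}
score_ab_cd = parsimony_score((("A", "B"), ("C", "D")), aln)
score_ac_bd = parsimony_score((("A", "C"), ("B", "D")), aln)
```

An annealing search would propose a rearranged tree, compute its score, and accept a worse score with a probability that shrinks as the temperature is lowered.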

8.
Modeling compositional heterogeneity (cited 12 times: 0 self-citations, 12 by others)
Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, and that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, the model does not fit and incorrect trees are found; but when composition is accommodated, the model fits and the known correct phylogenies are obtained.
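
For contrast with the model-based tests above, a simple and widely used summary of compositional heterogeneity is a Pearson chi-square statistic on the taxon-by-base count table. It ignores the phylogenetic non-independence of taxa, which is one reason tree- and model-based tests are generally preferred. A minimal Python sketch (function name hypothetical; gaps and ambiguity codes are simply not counted):

```python
from typing import Dict


def composition_chi_square(alignment: Dict[str, str], alphabet: str = "ACGT") -> float:
    """Pearson chi-square statistic for a taxon-by-base count table (sketch).

    Characters outside `alphabet` (gaps, ambiguity codes) are not counted.
    Cells with zero expected count are skipped.
    """
    counts = {name: [seq.count(ch) for ch in alphabet] for name, seq in alignment.items()}
    row_totals = {name: sum(row) for name, row in counts.items()}
    col_totals = [sum(row[j] for row in counts.values()) for j in range(len(alphabet))]
    grand = sum(col_totals)
    stat = 0.0
    for name, row in counts.items():
        for j, observed in enumerate(row):
            expected = row_totals[name] * col_totals[j] / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


homogeneous = composition_chi_square({"t1": "ACGT", "t2": "TGCA"})
heterogeneous = composition_chi_square({"t1": "AAAA", "t2": "CCCC"})
```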

9.
The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon, known as heterotachy, can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our 'pattern-heterogeneity' mixture model, applying it to simulated data and five published datasets. We find that complex evolutionary signals of heterotachy are routinely present over and above variation in the rate or pattern of evolution across sites, that the reversible-jump method requires far fewer parameters than conventional mixture models to describe this signal, and that it identifies the regions of the tree in which heterotachy is most pronounced. The reversible-jump procedure also removes the need for a posteriori tests of 'significance' such as the Akaike or Bayesian information criterion tests, or Bayes factors. Heterotachy has important consequences for the correct reconstruction of phylogenies as well as for tests of hypotheses that rely on accurate branch-length information. These include molecular clocks, analyses of tempo and mode of evolution, comparative studies and ancestral state reconstruction. The model is available from the authors' website, and can be used for the analysis of both nucleotide and morphological data.
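
The core of the mixture model above is that each site's likelihood is a weighted sum over several branch-length sets on the same topology: L_site = sum_k w_k P(site | b_k). A minimal Python sketch of that arithmetic for just two sequences under the Jukes-Cantor (JC69) model, where each branch-length set collapses to a single pairwise distance. With only two sequences heterotachy cannot be distinguished from ordinary rate variation across sites, so this shows the mixture calculation only, not the authors' reversible-jump method; names, distances, and weights are hypothetical:

```python
import math
from typing import Sequence


def jc69_pair_prob(x: str, y: str, d: float) -> float:
    """Joint probability of bases x and y at distance d (substitutions/site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    return 0.25 * p  # stationary frequency of x times transition probability


def mixture_log_likelihood(seq1: str, seq2: str, distances: Sequence[float],
                           weights: Sequence[float]) -> float:
    """Sum over sites of log(sum_k w_k * P(site | branch-length set k))."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    total = 0.0
    for x, y in zip(seq1, seq2):
        site = sum(w * jc69_pair_prob(x, y, d) for d, w in zip(distances, weights))
        total += math.log(site)
    return total


s1 = "ACGTACGTAC"
s2 = "ACGTACGAAC"
one_set = mixture_log_likelihood(s1, s2, [0.1], [1.0])
two_sets = mixture_log_likelihood(s1, s2, [0.02, 0.5], [0.7, 0.3])
```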

10.
Recent studies have observed that Bayesian analyses of sequence data sets using the program MrBayes sometimes generate extremely large branch lengths, with posterior credibility intervals for the tree length (sum of branch lengths) excluding the maximum likelihood estimates. Suggested explanations for this phenomenon include the existence of multiple local peaks in the posterior, lack of convergence of the chain in the tail of the posterior, mixing problems, and misspecified priors on branch lengths. Here, we analyze the behavior of Bayesian Markov chain Monte Carlo algorithms when the chain is in the tail of the posterior distribution and note that all these phenomena can occur. In Bayesian phylogenetics, the likelihood function approaches a constant instead of zero when the branch lengths increase to infinity. The flat tail of the likelihood can cause poor mixing and undue influence of the prior. We suggest that the main cause of the extreme branch length estimates produced in many Bayesian analyses is the poor choice of a default prior on branch lengths in current Bayesian phylogenetic programs. The default prior in MrBayes assigns independent and identical distributions to branch lengths, imposing strong (and unreasonable) assumptions about the tree length. The problem is exacerbated by the strong correlation between the branch lengths and parameters in models of variable rates among sites or among site partitions. To resolve the problem, we suggest two multivariate priors for the branch lengths (called compound Dirichlet priors) that are fairly diffuse and demonstrate their utility in the special case of branch length estimation on a star phylogeny. Our analysis highlights the need for careful thought in the specification of high-dimensional priors in Bayesian analyses.
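
The flat tail can be seen in the simplest case, two sequences under JC69: as the distance d grows, every site pattern's probability tends to 1/16 (two independent, uniformly distributed bases), so the log-likelihood approaches the finite constant n ln(1/16) rather than minus infinity. A minimal Python sketch with hypothetical counts (100 sites, 40 differences):

```python
import math


def jc69_pair_log_likelihood(n_sites: int, n_diff: int, d: float) -> float:
    """Log-likelihood of two aligned sequences with n_diff differing sites at distance d (JC69)."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)       # one specific identical pair, e.g. A/A
    p_diff_each = 0.25 * (0.25 - 0.25 * e)  # one specific differing pair, e.g. A/G
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff_each)


limit = 100 * math.log(1.0 / 16.0)   # value approached as d grows without bound
d_hat = -0.75 * math.log(1.0 - 4.0 / 3.0 * 0.4)  # JC69 ML distance for p = 40/100
curve = [(d, jc69_pair_log_likelihood(100, 40, d)) for d in (d_hat, 2.0, 10.0, 50.0)]
```

Because the likelihood stops decreasing at large d, the posterior in that region has essentially the same shape as the prior, which is why the choice of branch-length prior matters there.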

11.
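Distance-matrix neighbor joining, maximum parsimony, and maximum likelihood are the three main methods for reconstructing phylogenetic relationships. Maximum likelihood is generally considered superior in principle to the other two methods, but its computations are complex and time-consuming. Because current computers cannot yet meet its demands, it is of limited practical use; with large data sets (i.e., more than 25 taxa) in particular, the method is almost impossible to apply. The recently proposed Bayesian method retains the basic principles of maximum likelihood while introducing Markov chain Monte Carlo methods, which greatly shortens computation time. In this paper, we used the Bayesian method to conduct a phylogenetic analysis of mitochondrial 16S rDNA fragments from 19 species of the hard tick genus Ixodes. Overall, the results largely agree with the existing morphology-based classification. Contrary to existing hypotheses, however, the Ixodes ricinus species complex, the principal vectors of Lyme disease, is not monophyletic. By comparing the results of the Bayesian method with those of the other three methods, we conclude that the Bayesian method is a good approach to phylogenetic analysis: it reconstructs phylogenetic relationships probabilistically on the basis of current theory and models of molecular evolution, and it overcomes the drawbacks of maximum likelihood, namely slow computation and unsuitability for large data sets. The Bayesian method presents the results of phylogenetic analysis intuitively as posterior probabilities and does not require bootstrap testing. We expect the Bayesian method to be widely applied in phylogenetic analysis [Acta Zoologica Sinica 49(3): 380-388, 2003].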

12.
We study the phylogeny of the placental mammals using molecular data from all mitochondrial tRNAs and rRNAs of 54 species. We use probabilistic substitution models specific to evolution in base paired regions of RNA. A number of these models have been implemented in a new phylogenetic inference software package for carrying out maximum likelihood and Bayesian phylogenetic inferences. We describe our Bayesian phylogenetic method which uses a Markov chain Monte Carlo algorithm to provide samples from the posterior distribution of tree topologies. Our results show support for four primary mammalian clades, in agreement with recent studies of much larger data sets mainly comprising nuclear DNA. We discuss some issues arising when using Bayesian techniques on RNA sequence data.
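
Substitution models for base-paired regions treat the two columns of each stem pair as a single evolving unit, so a first practical step is to split alignment columns into paired and unpaired sets using a consensus secondary structure. A minimal Python sketch, assuming the structure is supplied in dot-bracket notation (pseudoknots are not handled); the function name and example structure are hypothetical:

```python
from typing import List, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return sorted 0-based (i, j) column pairs from dot-bracket notation.

    '(' opens a pair, ')' closes the most recently opened one, '.' is unpaired.
    """
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


structure = "((..))..((.))"
pairs = dot_bracket_pairs(structure)
paired = {i for pair in pairs for i in pair}
unpaired = [i for i in range(len(structure)) if i not in paired]
```

Paired columns would then be analyzed under a paired-site (doublet) model and unpaired columns under an ordinary single-site model.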

13.
What does the posterior probability of a phylogenetic tree mean? This simulation study shows that Bayesian posterior probabilities have the meaning that is typically ascribed to them; the posterior probability of a tree is the probability that the tree is correct, assuming that the model is correct. At the same time, the Bayesian method can be sensitive to model misspecification, and the sensitivity of the Bayesian method appears to be greater than the sensitivity of the nonparametric bootstrap method (using maximum likelihood to estimate trees). Although the estimates of phylogeny obtained by use of the method of maximum likelihood or the Bayesian method are likely to be similar, the assessment of the uncertainty of inferred trees via either bootstrapping (for maximum likelihood estimates) or posterior probabilities (for Bayesian estimates) is not likely to be the same. We suggest that the Bayesian method be implemented with the most complex models of those currently available, as this should reduce the chance that the method will concentrate too much probability on too few trees.

14.
Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method both for estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequences on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct monophyletic and incorrect monophyletic groups, and we examined the effects of increasing character number on support value. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationships as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than were needed for the nonparametric bootstrap.
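
Both BMCMC-PP and bootstrap proportions are, in practice, clade frequencies over a collection of trees: post-burn-in MCMC samples for the former, trees inferred from bootstrap replicates for the latter. A minimal Python sketch of that tally, assuming every tree is a nested tuple of leaf names rooted the same way (for example at a fixed outgroup); on unrooted trees, complementary clades such as {A, B} and {C, D} describe the same split. Names and trees are hypothetical:

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name; an internal node is a tuple of children.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Leaf sets of every internal node except the root (which every tree shares)."""
    found: Set[FrozenSet[str]] = set()

    def collect(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(collect(child) for child in node))
        found.add(below)
        return below

    found.discard(collect(tree))
    return found


def clade_frequencies(trees: List[Tree]) -> Dict[FrozenSet[str], float]:
    """Fraction of trees containing each clade (posterior or bootstrap support)."""
    counts: Counter = Counter()
    for t in trees:
        counts.update(clades(t))
    return {clade: k / len(trees) for clade, k in counts.items()}


# Hypothetical post-burn-in sample, all rooted the same way.
sample = [(("A", "B"), ("C", "D"))] * 3 + [(("A", "C"), ("B", "D"))]
support = clade_frequencies(sample)
```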

15.
The importance of accommodating the phylogenetic history of a group when performing a comparative analysis is now widely recognized. The typical approaches either assume the tree is known without error, or they base inferences on a collection of well-supported trees or on a collection of trees generated under a stochastic model of cladogenesis. However, these approaches do not adequately account for the uncertainty of phylogenetic trees in a comparative analysis, especially when data relevant to the phylogeny of a group are available. Here, we develop a method for performing comparative analyses that is based on an extension of Felsenstein's independent contrasts method. Uncertainties in the phylogeny, branch lengths, and other parameters are accommodated by averaging over all possible trees, weighting each by the probability that the tree is correct. We do this in a Bayesian framework and use Markov chain Monte Carlo to perform the high-dimensional summations and integrations required by the analysis. We illustrate the method using comparative characters sampled from Anolis lizards.
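
The method above extends Felsenstein's independent contrasts, which on a single fixed tree turn tip trait values into standardized differences that are independent under Brownian motion. A minimal Python sketch of the fixed-tree calculation only, assuming a rooted binary tree with positive branch lengths in a hypothetical (children, branch_length) / (leaf_name, branch_length) encoding; the approach in the abstract instead averages over trees and branch lengths sampled by MCMC. Names and values are hypothetical:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical encoding: a leaf is (name, branch_length); an internal node is
# ((left, right), branch_length). branch_length is the length of the branch above.


def independent_contrasts(node, traits: Dict[str, float],
                          out: List[float]) -> Tuple[float, float]:
    """Felsenstein's independent contrasts on a rooted binary tree.

    Appends standardized contrasts to `out` (children before parents) and returns
    (estimated trait value at node, branch length above node after the standard
    lengthening that accounts for uncertainty in that estimate).
    """
    label, length = node
    if isinstance(label, str):
        return traits[label], length
    left, right = label
    x1, v1 = independent_contrasts(left, traits, out)
    x2, v2 = independent_contrasts(right, traits, out)
    out.append((x1 - x2) / math.sqrt(v1 + v2))
    # Weighted average of the two children, with weights 1/v1 and 1/v2.
    x = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
    return x, length + v1 * v2 / (v1 + v2)


ab = ((("A", 1.0), ("B", 1.0)), 1.0)
cd = ((("C", 1.0), ("D", 1.0)), 1.0)
tree = ((ab, cd), 0.0)
traits = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}
contrasts: List[float] = []
root_value, _ = independent_contrasts(tree, traits, contrasts)
```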

16.
A method for computing the likelihood of a set of sequences assuming a phylogenetic network as an evolutionary hypothesis is presented. The approach applies directed graphical models to sequence evolution on networks and is a natural generalization of earlier work by Felsenstein on evolutionary trees, including it as a special case. The likelihood computation involves several steps. First, the phylogenetic network is rooted to form a directed acyclic graph (DAG). Then, applying standard models for nucleotide/amino acid substitution, the DAG is converted into a Bayesian network from which the joint probability distribution involving all nodes of the network can be directly read. The joint probability is explicitly dependent on branch lengths and on recombination parameters (prior probability of a parent sequence). The likelihood of the data assuming no knowledge of hidden nodes is obtained by marginalization, i.e., by summing over all combinations of unknown states. As the number of terms increases exponentially with the number of hidden nodes, a Markov chain Monte Carlo procedure (Gibbs sampling) is used to accurately approximate the likelihood by summing over the most important states only. Investigating a human T-cell lymphotropic virus (HTLV) data set and optimizing both branch lengths and recombination parameters, we find that the likelihood of a corresponding phylogenetic network outperforms a set of competing evolutionary trees. In general, except for the case of a tree, the likelihood of a network will be dependent on the choice of the root, even if a reversible model of substitution is applied. Thus, the method also provides a way to root a phylogenetic network by choosing the node that produces the most likely network.
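
For the special case of a tree, the computation reduces to Felsenstein's pruning algorithm: conditional likelihoods are passed from the tips toward the root, and the root vector is averaged over the stationary base frequencies. A minimal Python sketch for one site under JC69, using a hypothetical (children, branch_length) / (leaf_name, branch_length) encoding; it does not handle network nodes with two parents, which is the extension the abstract describes:

```python
import itertools
import math
from typing import Dict, List

BASES = "ACGT"


def jc69_transition(x: str, y: str, t: float) -> float:
    """P(y at the end of a branch | x at its start) for branch length t under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def conditional_likelihoods(node, site: Dict[str, str]) -> List[float]:
    """L_node(x): probability of the tip states below node, given state x at node."""
    label, _ = node
    if isinstance(label, str):
        return [1.0 if b == site[label] else 0.0 for b in BASES]
    result = [1.0] * len(BASES)
    for child in label:
        child_l = conditional_likelihoods(child, site)
        t = child[1]  # length of the branch leading to this child
        for i, x in enumerate(BASES):
            result[i] *= sum(jc69_transition(x, y, t) * child_l[j] for j, y in enumerate(BASES))
    return result


def site_likelihood(tree, site: Dict[str, str]) -> float:
    """Average root conditional likelihoods over JC69 stationary frequencies (1/4 each)."""
    return sum(0.25 * lx for lx in conditional_likelihoods(tree, site))


ab = ((("A", 0.1), ("B", 0.2)), 0.05)
tree = ((ab, ("C", 0.3)), 0.0)
# Probabilities of all 64 possible site patterns should sum to 1.
total = sum(site_likelihood(tree, dict(zip("ABC", pattern)))
            for pattern in itertools.product(BASES, repeat=3))
```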

17.
We propose a Bayesian method for testing molecular clock hypotheses for use with aligned sequence data from multiple taxa. Our method utilizes a nonreversible nucleotide substitution model to avoid the necessity of specifying either a known tree relating the taxa or an outgroup for rooting the tree. We employ reversible jump Markov chain Monte Carlo to sample from the posterior distribution of the phylogenetic model parameters and conduct hypothesis testing using Bayes factors, the ratio of the posterior to prior odds of competing models. Here, the Bayes factors reflect the relative support of the sequence data for equal rates of evolutionary change between taxa versus unequal rates, averaged over all possible phylogenetic parameters, including the tree and root position. As the molecular clock model is a restriction of the more general unequal rates model, we use the Savage-Dickey ratio to estimate the Bayes factors. The Savage-Dickey ratio provides a convenient approach to calculating Bayes factors in favor of sharp hypotheses. Critical to calculating the Savage-Dickey ratio is a determination of the prior induced on the modeling restrictions. We demonstrate our method on a well-studied mtDNA sequence data set consisting of nine primates. We find strong support against a global molecular clock, but do find support for a local clock among the anthropoids. We provide mathematical derivations of the induced priors on branch length restrictions assuming equally likely trees. These derivations also have more general applicability to the examination of prior assumptions in Bayesian phylogenetics.
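
The Savage-Dickey ratio used above states that, for a sharp restriction delta = 0 nested in a larger model, BF01 = p(delta = 0 | data) / p(delta = 0): the posterior density at the restriction divided by the prior density there, provided the priors on the remaining parameters are compatible between the two models. A minimal Python check in a conjugate normal example, where both densities and both marginal likelihoods have closed forms so the ratio can be compared with the direct Bayes factor; a phylogenetic application would instead estimate the posterior density from MCMC samples. The model and numbers are illustrative assumptions:

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(y: float, sigma: float, tau: float) -> float:
    """BF01 for H0: delta = 0 vs H1: delta ~ N(0, tau^2), one observation y ~ N(delta, sigma^2).

    Savage-Dickey: posterior density of delta at 0 under H1 divided by prior density at 0.
    """
    post_var = 1.0 / (1.0 / sigma ** 2 + 1.0 / tau ** 2)
    post_mean = post_var * y / sigma ** 2
    return normal_pdf(0.0, post_mean, math.sqrt(post_var)) / normal_pdf(0.0, 0.0, tau)


def direct_bf01(y: float, sigma: float, tau: float) -> float:
    """Same Bayes factor as a ratio of marginal likelihoods, for comparison."""
    m0 = normal_pdf(y, 0.0, sigma)                             # delta fixed at 0
    m1 = normal_pdf(y, 0.0, math.sqrt(sigma ** 2 + tau ** 2))  # delta integrated out
    return m0 / m1


bf_sd = savage_dickey_bf01(y=1.5, sigma=1.0, tau=2.0)
bf_direct = direct_bf01(y=1.5, sigma=1.0, tau=2.0)
```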

18.
Comparison of the performance and accuracy of different inference methods, such as maximum likelihood (ML) and Bayesian inference, is difficult because the inference methods are implemented in different programs, often written by different authors. Both methods were implemented in the program MIGRATE, which estimates population genetic parameters, such as population sizes and migration rates, using coalescent theory. Both inference methods use the same Markov chain Monte Carlo algorithm and differ from each other in only two aspects: the parameter proposal distribution and the maximization of the likelihood function. On simulated datasets, the Bayesian method generally fares better than the ML approach in accuracy and coverage, although for some values the two approaches perform equally well. MOTIVATION: The Markov chain Monte Carlo-based ML framework can fail on sparse data and can deliver non-conservative support intervals. A Bayesian framework with an appropriate prior distribution is able to remedy some of these problems. RESULTS: The program MIGRATE was extended to allow not only maximum likelihood estimation of population genetic parameters but also estimation in a Bayesian framework. Comparisons between the Bayesian approach and the ML approach are facilitated because both modes estimate the same parameters under the same population model and assumptions.

19.
Phylogenetic relationships among tribes in the tachinid subfamily Exoristinae (Diptera, Tachinidae) are inferred from four genes, namely white, 18S, 28S and 16S rDNA. For phylogenetic inference, maximum parsimony, maximum likelihood and Bayesian Markov chain Monte Carlo analyses were performed. The resulting trees, which are very similar to one another, are nearly concordant with the traditional classification based on morphological characters. Our results suggest that the Tachinidae are monophyletic and sister to the Sarcophagidae. The tribal relationships within Exoristinae are partly supported with high reliability and are similar to those inferred by Stireman. Based on the resulting trees, we investigated phylogenetic relationships and possible morphological synapomorphies. In addition, we evaluated the transformation of female reproductive habits in the Exoristinae, finding support for the hypothesis that ovolarviparity evolved independently from oviparity in several clades, and obtaining different results for the evolutionary history of micro-ovolarviparity depending on character optimization.

20.
One of the most time-consuming aspects of Bayesian posterior probability analysis of phylogenetic trees is the use of Metropolis-coupled Markov chain Monte Carlo (MC3) methods to determine relative posteriors and identify maximum a posteriori (MAP) trees. Here, analytical and numerical methods are presented to determine tree likelihoods that are integrated over edge-length (and other parameter) distributions. Given topological (tree) priors (flat or otherwise), this allows identification of the maximum posterior probability assignment (MAP-A) of character states to non-leaf tree vertices via dynamic programming. Using this form of posterior probability as an optimality criterion, tree space can be searched using standard trajectory techniques, and heuristically optimal MAP-A trees can be identified with considerable time savings over MC3. Example cases are presented using aligned and unaligned molecular sequences as well as combined molecular and anatomical data.
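
The costly step this approach aims to avoid is running several heated chains and proposing state swaps between them. In Metropolis-coupled MCMC, chain k targets p(x)^beta_k with beta = 1 for the cold chain, and a swap of states between chains i and j is accepted with probability min(1, r), where log r = (beta_i - beta_j)(log p(x_j) - log p(x_i)). A minimal Python sketch of that acceptance step, with hypothetical names and heat values:

```python
import math
import random


def swap_log_ratio(log_p_i: float, log_p_j: float, beta_i: float, beta_j: float) -> float:
    """Log acceptance ratio for exchanging states between heated chains i and j.

    Chain k targets p(x)**beta_k; log_p_i and log_p_j are the untempered log
    posteriors of the states currently held by chains i and j.
    """
    return (beta_i - beta_j) * (log_p_j - log_p_i)


def accept_swap(log_p_i: float, log_p_j: float, beta_i: float, beta_j: float,
                rng: random.Random) -> bool:
    """Metropolis acceptance of a proposed swap: accept with probability min(1, r)."""
    log_r = swap_log_ratio(log_p_i, log_p_j, beta_i, beta_j)
    return log_r >= 0.0 or rng.random() < math.exp(log_r)


rng = random.Random(0)
# Hypothetical: the cold chain (beta = 1) holds a worse state than a heated chain (beta = 0.5).
uphill = accept_swap(-120.0, -100.0, 1.0, 0.5, rng)  # log r = 0.5 * 20 = 10, always accepted
```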


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417