Similar literature
 20 similar articles found (search time: 15 ms)
1.
Polytomies and Bayesian phylogenetic inference   (total citations: 16; self-citations: 0; citations by others: 16)
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.

2.
The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean $\mu_0$ should approach zero faster than $1/\sqrt{n}$ but more slowly than $1/n$, where $n$ is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at $\mu_0 = 0.1\,n^{-2/3}$. In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs to conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.
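
The rate condition in this abstract can be checked numerically. The following is a minimal sketch, not code from the paper: the function name `prior_mean` and the chosen sequence lengths are assumptions for illustration. It evaluates the prior mean 0.1 n^(-2/3) and shows that it shrinks relative to 1/sqrt(n) while growing relative to 1/n.

```python
import math

def prior_mean(n, scale=0.1, power=-2.0 / 3.0):
    """Prior mean of the exponential prior on internal branch lengths:
    mu0 = scale * n**power, with n the sequence length."""
    return scale * n ** power

# Hypothetical sequence lengths, chosen only to show the trend.
lengths = [100, 1_000, 10_000, 100_000]

# mu0 * sqrt(n) should fall towards zero (mu0 shrinks faster than 1/sqrt(n)),
# while mu0 * n should keep growing (mu0 shrinks more slowly than 1/n).
ratio_to_inv_sqrt = [prior_mean(n) * math.sqrt(n) for n in lengths]
ratio_to_inv_n = [prior_mean(n) * n for n in lengths]

for n, a, b in zip(lengths, ratio_to_inv_sqrt, ratio_to_inv_n):
    print(f"n={n:>6}  mu0={prior_mean(n):.6f}  mu0*sqrt(n)={a:.4f}  mu0*n={b:.2f}")
```

Any exponent strictly between -1 and -1/2 would satisfy the same condition; -2/3 is simply the value the abstract reports.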

3.
Understanding mammalian evolution using Bayesian phylogenetic inference   (total citations: 1; self-citations: 0; citations by others: 1)
1. Phylogenetic trees are critical in addressing evolutionary hypotheses; however, the reconstruction of a phylogeny is no easy task. This process has recently been made less arduous by using a Bayesian statistical approach. This method offers the advantage that one can determine the probability of some hypothesis (i.e. a phylogeny), conditional on the observed data (i.e. nucleotide sequences). 2. By reconstructing phylogenies using Bayes’ theorem in combination with Markov chain Monte Carlo, phylogeneticists are able to test hypotheses more quickly compared with using standard methods such as neighbour-joining, maximum likelihood or parsimony. Critics of the Bayesian approach suggest that it is not a panacea, and argue that the prior probability is too subjective and the resulting posterior probability is too liberal compared with maximum likelihood. 3. These issues are currently debated in the arena of mammalian evolution. Recently, proponents and opponents of the Bayesian approach have constructed the mammalian phylogeny using different methods under different conditions and with a variety of parameters. These analyses showed the robustness (or lack thereof) of the Bayesian approach. In the end, consensus suggests that Bayesian methods are robust and give essentially the same answer as maximum likelihood methods but in less time. 4. Approaches based on fossils and molecules typically agree on ordinal-level relationships among mammals but not on higher-level relationships, as Bayesian analyses recognize the African radiation, Afrotheria, and the two Laurasian radiations, Laurasiatheria and Euarchontoglires, whereas fossils did not predict Afrotheria.
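
For readers who want the "probability of a phylogeny conditional on the observed data" from point 1 written out, Bayes' theorem for a tree topology can be stated as below. The notation is an assumption for illustration and is not taken from the abstract: $\tau_i$ is a candidate topology, $X$ is the sequence alignment, $B(s)$ is the number of possible topologies for $s$ taxa, and branch lengths and substitution parameters are taken to be already integrated out of each likelihood term.

```latex
P(\tau_i \mid X) \;=\;
  \frac{P(X \mid \tau_i)\, P(\tau_i)}
       {\sum_{j=1}^{B(s)} P(X \mid \tau_j)\, P(\tau_j)}
```

The sum in the denominator runs over every topology, which is why point 2 pairs Bayes' theorem with Markov chain Monte Carlo: the sampler only needs ratios of the numerator, so the denominator never has to be computed.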

4.
MRBAYES: Bayesian inference of phylogenetic trees   (total citations: 108; self-citations: 0; citations by others: 108)
SUMMARY: The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo. AVAILABILITY: MRBAYES, including the source code, documentation, sample data files, and an executable, is available at http://brahms.biology.rochester.edu/software.html.

5.
Wheeler WC and Pickett KM (2008. Topology-Bayes versus clade-Bayes in phylogenetic analysis. Mol Biol Evol. 25:447–453.) discuss two ways of summarizing the posterior probability distribution of a Bayesian phylogenetic analysis, which they refer to as "topology-Bayes" and "clade-Bayes." They claim that the clade-Bayes approach leads to problems such as "exaggerated clade support, inconsistently biased priors, and the impossibility of topology hypothesis testing," which are not problems for the topology-Bayes approach. However, their argument for topology-Bayes over clade-Bayes is based on errors in the interpretation of summary statistics associated with Bayesian phylogenetic analysis. Although there is a well-documented difference between the maximum posterior probability topology and the majority-rule consensus topology (the established terms for topology-Bayes and clade-Bayes summaries, respectively), both have a place in phylogenetic analysis. Choice of summarization strategy should be driven by choice of parameters that need to be estimated versus those to be marginalized given the evolutionary questions being asked or hypotheses being tested.
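
The difference between the two summaries named in this abstract can be seen on a toy posterior sample. The sketch below is illustrative only: the five taxa, the three topologies and their counts are made up, and the helper names `tree` and `summarize` are assumptions rather than functions from any phylogenetics package. Each sampled tree is stored as the set of its non-trivial clades.

```python
from collections import Counter

def tree(*clades):
    """A rooted topology, stored as the set of its non-trivial clades."""
    return frozenset(frozenset(c) for c in clades)

# Hypothetical posterior sample of 100 trees on taxa A-E (made-up counts).
t1 = tree("AB", "ABC", "DE")   # (((A,B),C),(D,E))
t2 = tree("AC", "ABC", "DE")   # (((A,C),B),(D,E))
t3 = tree("BC", "ABC", "DE")   # (((B,C),A),(D,E))
sample = [t1] * 40 + [t2] * 30 + [t3] * 30

def summarize(sample):
    """Return the MAP topology, its posterior probability, the set of
    majority-rule clades, and the estimated support of every clade."""
    n = len(sample)
    # Topology summary: the single most frequently sampled tree.
    map_topology, map_count = Counter(sample).most_common(1)[0]
    # Clade summary: how often each clade appears, whatever tree it is in.
    clade_counts = Counter(c for t in sample for c in t)
    support = {c: k / n for c, k in clade_counts.items()}
    consensus = {c for c, p in support.items() if p > 0.5}
    return map_topology, map_count / n, consensus, support

map_topology, map_prob, consensus, support = summarize(sample)
consensus_labels = sorted("".join(sorted(c)) for c in consensus)
print("MAP topology probability:", map_prob)
print("Majority-rule clades:", consensus_labels)
```

In this made-up sample the MAP topology has posterior probability 0.4 and contains the clade A+B, while the majority-rule consensus keeps only A+B+C and D+E and leaves the relationships among A, B and C unresolved. The two summaries answer different questions, which is the point the abstract makes.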

6.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC [(MC)³], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)³. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets.
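
To make the idea of Metropolis coupling concrete, here is a serial toy sketch. It is not the parallel algorithm of the paper and it does not sample trees: the target is a one-dimensional bimodal density standing in for a multi-peaked posterior, and the heating scheme beta_i = 1 / (1 + heat * i), the parameter values and the function names are all assumptions chosen for illustration.

```python
import math
import random

def log_target(x):
    """Unnormalized log density of a toy bimodal target (modes at -4 and +4)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def accept(log_ratio, rng):
    """Metropolis acceptance test for a log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))

def mc3(n_iter=20000, n_chains=4, heat=1.0, step=1.0, seed=1):
    rng = random.Random(seed)
    # Chain 0 is the cold chain (beta = 1); higher indices are heated.
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]
    states = [-4.0] * n_chains          # all chains start in the left mode
    cold_samples = []
    for _ in range(n_iter):
        # Within-chain Metropolis update against the heated target pi(x)**beta.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(beta * (log_target(proposal) - log_target(states[i])), rng):
                states[i] = proposal
        # Propose swapping the states of one random pair of adjacent chains.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(log_swap, rng):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is recorded
    return cold_samples

samples = mc3()
right_fraction = sum(x > 0 for x in samples) / len(samples)
print("fraction of cold-chain samples in the right-hand mode:", right_fraction)
```

The heated chains see a flattened landscape and cross between modes easily; accepted swaps pass those states down to the cold chain. Each chain's within-chain update is independent of the others between swaps, which is the part a parallel implementation can distribute across processors.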

7.
MrBayes 3: Bayesian phylogenetic inference under mixed models   (total citations: 150; self-citations: 0; citations by others: 150)
MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. This allows the user to analyze heterogeneous data sets consisting of different data types (e.g. morphological, nucleotide, and protein) and to explore a wide variety of structured models mixing partition-unique and shared parameters. The program employs MPI to parallelize Metropolis coupling on Macintosh or UNIX clusters.

8.
Bayesian phylogenetic inference via Markov chain Monte Carlo methods   (total citations: 27; self-citations: 0; citations by others: 27)
Mau B, Newton MA, Larget B. Biometrics, 1999, 55(1):1-12
We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.

9.

BACKGROUND: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such it contains a high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.
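
The loop structure described in this abstract can be sketched in a few lines. This is a plain-Python illustration of one PLF step for a single parent node with two children, not the FPGA design and not code from any of the named programs. The Jukes-Cantor model, the branch length 0.1, the two-site tip sequences and all function names are assumptions for illustration.

```python
import math

STATES = "ACGT"

def jc69(t):
    """Jukes-Cantor transition probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def tip_vectors(seq):
    """Conditional likelihood vectors for an observed tip sequence."""
    return [[1.0 if s == base else 0.0 for s in STATES] for base in seq]

def plf(left, right, p_left, p_right):
    """Conditional likelihoods at a parent node from its two children.

    Every site runs the same arithmetic with no branching and no
    dependence on any other site, so the outer loop can be parallelized.
    """
    parent = []
    for l_site, r_site in zip(left, right):
        parent.append([
            sum(p_left[s][x] * l_site[x] for x in range(4))
            * sum(p_right[s][y] * r_site[y] for y in range(4))
            for s in range(4)
        ])
    return parent

# Hypothetical two-site alignment of two taxa joined at a root.
left = tip_vectors("AC")
right = tip_vectors("AG")
parent = plf(left, right, jc69(0.1), jc69(0.1))

# Site likelihoods at the root, using equal base frequencies of 0.25.
site_likelihoods = [sum(0.25 * v for v in site) for site in parent]
print(site_likelihoods)
```

On a full tree the same function is applied node by node from the tips to the root, and the per-site products dominate the run time, which is why this kernel is the natural target for hardware acceleration.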

10.
Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G+C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.

11.
The main limiting factor in Bayesian MCMC analysis of phylogeny is typically the efficiency with which topology proposals sample tree space. Here we evaluate the performance of seven different proposal mechanisms, including most of those used in current Bayesian phylogenetics software. We sampled 12 empirical nucleotide data sets, ranging in size from 27 to 71 taxa and from 378 to 2,520 sites, under difficult conditions: short runs, no Metropolis-coupling, and an oversimplified substitution model producing difficult tree spaces (Jukes-Cantor with equal site rates). Convergence was assessed by comparison to reference samples obtained from multiple Metropolis-coupled runs. We find that proposals producing topology changes as a side effect of branch length changes (LOCAL and Continuous Change) consistently perform worse than those involving stochastic branch rearrangements (nearest neighbor interchange, subtree pruning and regrafting, tree bisection and reconnection, or subtree swapping). Among the latter, moves that use an extension mechanism to mix local with more distant rearrangements show better overall performance than those involving only local or only random rearrangements. Moves with only local rearrangements tend to mix well but have long burn-in periods, whereas moves with random rearrangements often show the reverse pattern. Combinations of moves tend to perform better than single moves. The time to convergence can be shortened considerably by starting with a good tree, but this comes at the cost of compromising convergence diagnostics based on overdispersed starting points. Our results have important implications for developers of Bayesian MCMC implementations and for the large group of users of Bayesian phylogenetics software.
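
Of the rearrangements listed in this abstract, nearest neighbor interchange (NNI) is the simplest to write down. The sketch below is a minimal illustration under stated assumptions: the tree is reduced to the four subtrees around a single internal edge, written as nested tuples, and the function names are hypothetical. The proposals evaluated in the paper also choose the edge at random and handle branch lengths, which this sketch leaves out.

```python
import random

def nni_neighbors(a, b, c, d):
    """Return the two NNI rearrangements of the internal edge ((a, b), (c, d)).

    a, b, c and d are the four subtrees attached to the ends of the edge;
    an NNI swaps one subtree from each side of the edge.
    """
    return [((a, c), (b, d)), ((a, d), (b, c))]

def propose_nni(tree, rng):
    """Pick one of the two NNI neighbors of a quartet-shaped tree uniformly."""
    (a, b), (c, d) = tree
    return rng.choice(nni_neighbors(a, b, c, d))

current = (("A", "B"), ("C", "D"))
neighbors = nni_neighbors("A", "B", "C", "D")
proposal = propose_nni(current, random.Random(0))
print("neighbors:", neighbors)
print("proposal:", proposal)
```

Because the subtrees can themselves be nested tuples, the same function describes an NNI around any internal edge of a larger tree once the four neighboring subtrees have been identified.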

12.
S-Adenosylhomocysteine hydrolase (SahH) is involved in the degradation of a compound that inhibits methylation reactions. Using a Bayesian approach and other methods, we reconstructed a phylogenetic tree of amino acid sequences of this protein originating from all three major domains of living organisms. The SahH sequences formed two major branches: one composed mainly of Archaea and the other of eukaryotes and the majority of bacteria, clearly contradicting the three-domain topology shown by the small subunit rRNA gene. This topology suggests the occurrence of lateral transfer of this gene between the domains. Poor resolution of eukaryotes and bacteria precluded a definitive conclusion as to which of the two domains this gene appeared in first; however, the congruence of the secondary branches with SS rRNA and/or concatenated ribosomal protein dataset phylogenies suggested an "early" acquisition by some bacterial and eukaryotic phyla. Similarly, the branching pattern of Archaea reflected the phylogenies shown by SS rRNA and ribosomal proteins. SahH is widespread in Eucarya, although, owing to reductive evolution, it is missing in the intracellular parasite Encephalitozoon cuniculi. On the other hand, the lack of affinity to the sequences from the alpha-Proteobacteria and cyanobacteria excludes the possibility of its acquisition in the course of mitochondrial or chloroplast endosymbioses. Unlike Archaea, most bacteria carry MTA/SAH nucleosidase, an enzyme also involved in the metabolism of methylthioadenosine. However, the double function of MTA/SAH nucleosidase may be a barrier to the efficient degradation of S-adenosylhomocysteine, especially when the intensity of methylation processes is high. This would explain the presence of S-adenosylhomocysteine hydrolase in the bacteria that have more complex metabolism. On the other hand, the majority of obligate pathogenic bacteria, owing to their simpler metabolism, rely entirely on MTA/SAH nucleosidase.
This could explain the observed phenetic pattern in which bacteria with larger (>6 Mb; Mb = million base pairs) genomes carry SAH hydrolase, whereas bacteria that have undergone reductive evolution usually carry MTA/SAH nucleosidase. This suggests that the presence or acquisition of S-adenosylhomocysteine hydrolase in bacteria may predispose towards higher metabolic and, in consequence, higher genomic complexity. Good examples are the phototrophic bacteria, all of which carry this gene; however, the SahH phylogeny shows a lack of congruence with SSU rRNA and photosynthetic genes, implying that the acquisition was independent and presumably preceded the acquisition of photosynthetic genes. The majority of cyanobacteria acquired this gene from Archaea; however, in some species the sahH gene was replaced by a copy from the beta- or gamma-Proteobacteria.

13.
For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.

14.
SUMMARY: MAC5 implements MCMC sampling of the posterior distribution of tree topologies from DNA sequences containing gaps by using a five state model of evolution (the four nucleotides and the gap character).

15.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences.
In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no support for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than $3.4 \times 10^{-7}$.

16.
An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.

17.
We developed a software tool (SlidingBayes) for recombination analysis based on Bayesian phylogenetic inference. SlidingBayes provides a powerful approach for detecting potential recombination, especially between highly divergent sequences and complex HIV-1 recombinants for which simpler methods like neighbor joining (NJ) may be less powerful. SlidingBayes guides Markov Chain Monte Carlo (MCMC) sampling performed by MrBayes in a sliding window across the alignment (Bayesian scanning). The tool can be used for nucleotide and amino acid sequences and combines all the modeling possibilities of MrBayes with the ability to plot the posterior probability support for clustering of various combinations of taxa.
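
The "sliding window across the alignment" in this abstract amounts to cutting the alignment into overlapping column ranges and analyzing each one separately. The sketch below shows only that windowing step. It is not SlidingBayes code: the function names, the toy sequences and the window and step sizes are assumptions for illustration, and the per-window MrBayes run is left out.

```python
def sliding_windows(alignment_length, window, step):
    """Yield (start, end) half-open column ranges covering the alignment."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(alignment_length - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + window, alignment_length)

def window_alignment(alignment, start, end):
    """Slice every sequence of a name -> sequence mapping to one window."""
    return {name: seq[start:end] for name, seq in alignment.items()}

# Hypothetical 1000-column alignment scanned with 400-column windows.
windows = list(sliding_windows(1000, window=400, step=200))
print(windows)

toy = {"seq1": "ACGTACGTAC", "seq2": "ACGTTCGTAC"}
first = window_alignment(toy, 0, 4)
print(first)
```

Each sub-alignment would then be handed to the phylogenetic sampler, and the posterior support for a chosen grouping of taxa plotted against window position; a change in support along the alignment is the signal of possible recombination.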

18.
Bayesian inference for small-sample capture-recapture data   (total citations: 1; self-citations: 0; citations by others: 1)
We consider data on the survival of a population of Cephalorhynchus hectori, Hector's dolphins, in a marine area of New Zealand. To estimate survival probabilities of animal populations, a multiple capture-recapture sampling scheme can be used. In this paper, we propose a practical methodology to derive approximations to posterior distributions based on Laplace methods. We show how to calculate Bayes estimates and credible intervals in this setting.

19.
We defend and expand on our earlier proposal for an inclusive philosophical framework for phylogenetics, based on an interpretation of Popperian corroboration that is decoupled from the popular falsificationist interpretation of Popperian philosophy. Any phylogenetic inference method can provide Popperian "evidence" or "test statements" based on the method's goodness-of-fit values for different tree hypotheses. Corroboration, or the severity of that test, requires that the evidence is improbable without the hypothesis, given only background knowledge that includes elements of chance. This framework contrasts with attempted Popperian justifications for cladistic parsimony, in which evidence is the data, background knowledge is restricted to descent with modification, and "corroboration," as a by-product of nonfalsification, is to be measured by cladistic parsimony. Recognition that cladistic "corroboration" reflects only goodness-of-fit, not corroboration/severity, makes it clear that standard cladistic prohibitions, such as restrictions on the evolutionary models to be included in "background knowledge," have no philosophical status. The capacity to assess Popperian corroboration neither justifies nor excludes any phylogenetic method, but it does provide a framework in phylogenetics for learning from errors, that is, cases where apparent good evidence is probable even without the hypothesis. We explore these issues in the context of corroboration assessments applied to likelihood methods and to a new form of parsimony. These different forms of evidence and corroboration assessment point also to a new way to combine evidence: not at the level of overall fit, but at the level of overall corroboration/severity. We conclude that progress in an inclusive phylogenetics will be well served by the rejection of cladistic philosophy.

20.
Sample size for a phylogenetic inference.   (total citations: 1; self-citations: 0; citations by others: 1)
The objective of this work is to describe sample-size calculations for the inference of a nonzero central branch length in an unrooted four-species phylogeny. Attention is restricted to independent binary characters, such as might be obtained from an alignment of the purine-pyrimidine sequences of a nucleic acid molecule. A statistical test based on a multinomial model for character-state configurations is described. The importance of including invariable sites in models for sequence change is demonstrated, and their effect on sample size is quantified. The methods are applied to a four-species alignment of small-subunit rRNA sequences derived from two archaebacteria, a eubacterium and a eukaryote. We conclude that the information in these sequences is not sufficient to resolve the branching order of this tree. Estimates of the number of aligned nucleotide positions required to provide a reasonably powerful test are given.
