Similar articles
 Found 20 similar articles (search time: 32 ms)
1.
Polytomies and Bayesian phylogenetic inference   (cited by: 16 total; 0 self-citations; 16 by others)
Bayesian phylogenetic analyses are now very popular in systematics and molecular evolution because they allow the use of much more realistic models than currently possible with maximum likelihood methods. There are, however, a growing number of examples in which large Bayesian posterior clade probabilities are associated with very short branch lengths and low values for non-Bayesian measures of support such as nonparametric bootstrapping. For the four-taxon case when the true tree is the star phylogeny, Bayesian analyses become increasingly unpredictable in their preference for one of the three possible resolved tree topologies as data set size increases. This leads to the prediction that hard (or near-hard) polytomies in nature will cause unpredictable behavior in Bayesian analyses, with arbitrary resolutions of the polytomy receiving very high posterior probabilities in some cases. We present a simple solution to this problem involving a reversible-jump Markov chain Monte Carlo (MCMC) algorithm that allows exploration of all of tree space, including unresolved tree topologies with one or more polytomies. The reversible-jump MCMC approach allows prior distributions to place some weight on less-resolved tree topologies, which eliminates misleadingly high posteriors associated with arbitrary resolutions of hard polytomies. Fortunately, assigning some prior probability to polytomous tree topologies does not appear to come with a significant cost in terms of the ability to assess the level of support for edges that do exist in the true tree. Methods are discussed for applying arbitrary prior distributions to tree topologies of varying resolution, and an empirical example showing evidence of polytomies is analyzed and discussed.

2.
The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior as the data size increases, so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean mu0 should approach zero faster than 1/sqrt(n) but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at mu0 = 0.1 n^(-2/3). In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs to conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.
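
The rate condition on the prior mean can be checked numerically. The sketch below is only an illustration: the function name `prior_mean` and the chosen sequence lengths are assumptions, while the constants 0.1 and -2/3 are taken from the abstract. If mu0 shrinks faster than 1/sqrt(n), then mu0 * sqrt(n) should fall as n grows; if it shrinks more slowly than 1/n, then mu0 * n should rise.

```python
import math


def prior_mean(n, scale=0.1, power=-2.0 / 3.0):
    """Data-size-dependent prior mean mu0 = scale * n**power (n = sequence length)."""
    if n <= 0:
        raise ValueError("sequence length n must be positive")
    return scale * n ** power


if __name__ == "__main__":
    # Illustrative sequence lengths only; they are not taken from the paper.
    for n in (100, 1_000, 10_000, 100_000):
        mu0 = prior_mean(n)
        # mu0 * sqrt(n) decreases with n, mu0 * n increases with n.
        print(n, mu0, mu0 * math.sqrt(n), mu0 * n)
```

Any exponent strictly between -1 and -1/2 would satisfy the same two checks; -2/3 is simply the value the abstract reports.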

3.
MRBAYES: Bayesian inference of phylogenetic trees   (cited by: 108 total; 0 self-citations; 108 by others)
SUMMARY: The program MRBAYES performs Bayesian inference of phylogeny using a variant of Markov chain Monte Carlo. AVAILABILITY: MRBAYES, including the source code, documentation, sample data files, and an executable, is available at http://brahms.biology.rochester.edu/software.html.

4.
MOTIVATION: Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis-coupled MCMC [(MC)^3], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time. RESULTS: This paper presents a parallel algorithm for (MC)^3. The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets.
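
To make the idea of Metropolis coupling concrete, here is a minimal serial sketch on a one-dimensional toy target with two peaks. It is not the parallel algorithm of the paper and it does not sample trees: the target density, the heating scheme beta_i = 1 / (1 + delta_t * i), and all function and parameter names are assumptions chosen for illustration. Heated chains sample a flattened version of the target and occasionally swap states with their neighbours, which lets the cold chain (beta = 1) move between peaks.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of a toy target with peaks near -4 and +4."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def mc3(n_steps=20000, n_chains=4, delta_t=0.5, step=1.0, seed=1):
    """Return the states visited by the cold chain of a Metropolis-coupled sampler."""
    rng = random.Random(seed)
    # Chain 0 is the cold chain; higher-index chains are progressively hotter.
    betas = [1.0 / (1.0 + delta_t * i) for i in range(n_chains)]
    states = [-4.0] * n_chains
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain Metropolis update targeting pi(x) ** beta.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            log_ratio = beta * (log_target(proposal) - log_target(states[i]))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[i] = proposal
        # Propose swapping the states of one random pair of adjacent chains.
        i = rng.randrange(n_chains - 1)
        j = i + 1
        log_swap = (betas[i] - betas[j]) * (
            log_target(states[j]) - log_target(states[i])
        )
        if rng.random() < math.exp(min(0.0, log_swap)):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples


if __name__ == "__main__":
    samples = mc3()
    print("fraction of cold-chain samples above zero:",
          sum(s > 0 for s in samples) / len(samples))
```

In this sketch the within-chain updates are independent of one another between swap attempts, which is the property a parallel implementation can exploit by assigning chains to separate processes or threads.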

5.
MrBayes 3: Bayesian phylogenetic inference under mixed models   (cited by: 150 total; 0 self-citations; 150 by others)
MrBayes 3 performs Bayesian phylogenetic analysis combining information from different data partitions or subsets evolving under different stochastic evolutionary models. This allows the user to analyze heterogeneous data sets consisting of different data types (e.g., morphological, nucleotide, and protein) and to explore a wide variety of structured models mixing partition-unique and shared parameters. The program employs MPI to parallelize Metropolis coupling on Macintosh or UNIX clusters.

6.

Background  

Maximum likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations. As such, it has high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resultant co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors.
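
The loop structure described above can be sketched in software. The example below is an assumption-laden illustration, not the FPGA design: it uses the standard pruning-style update for one internal node (the parent's conditional likelihood for each state is the product of the two children's contributions), a Jukes-Cantor transition matrix chosen for simplicity, and hypothetical function names. Each site is processed independently, which is why the loop can be replaced by a single vectorized expression.

```python
import numpy as np


def jc69_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t (subs. per site)."""
    e = np.exp(-4.0 * t / 3.0)
    p = np.full((4, 4), 0.25 - 0.25 * e)
    np.fill_diagonal(p, 0.25 + 0.75 * e)
    return p


def plf_loop(left, right, p_left, p_right):
    """Parent conditional likelihoods, one independent loop iteration per site."""
    parent = np.empty(left.shape, dtype=float)
    for i in range(left.shape[0]):
        # No branches and no dependence on other sites inside the loop body.
        parent[i] = (p_left @ left[i]) * (p_right @ right[i])
    return parent


def plf_vectorized(left, right, p_left, p_right):
    """Same computation for all sites at once."""
    return (left @ p_left.T) * (right @ p_right.T)


if __name__ == "__main__":
    # Two tips observed as one-hot vectors over (A, C, G, T) at three sites.
    left = np.eye(4)[[0, 1, 2]]
    right = np.eye(4)[[0, 1, 3]]
    p_l, p_r = jc69_matrix(0.1), jc69_matrix(0.2)
    print(np.allclose(plf_loop(left, right, p_l, p_r),
                      plf_vectorized(left, right, p_l, p_r)))
```

The arrays hold one row per site and one column per nucleotide state; the branch lengths 0.1 and 0.2 are arbitrary.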

7.
The main limiting factor in Bayesian MCMC analysis of phylogeny is typically the efficiency with which topology proposals sample tree space. Here we evaluate the performance of seven different proposal mechanisms, including most of those used in current Bayesian phylogenetics software. We sampled 12 empirical nucleotide data sets (ranging in size from 27 to 71 taxa and from 378 to 2,520 sites) under difficult conditions: short runs, no Metropolis coupling, and an oversimplified substitution model producing difficult tree spaces (Jukes-Cantor with equal site rates). Convergence was assessed by comparison to reference samples obtained from multiple Metropolis-coupled runs. We find that proposals producing topology changes as a side effect of branch length changes (LOCAL and Continuous Change) consistently perform worse than those involving stochastic branch rearrangements (nearest neighbor interchange, subtree pruning and regrafting, tree bisection and reconnection, or subtree swapping). Among the latter, moves that use an extension mechanism to mix local with more distant rearrangements show better overall performance than those involving only local or only random rearrangements. Moves with only local rearrangements tend to mix well but have long burn-in periods, whereas moves with random rearrangements often show the reverse pattern. Combinations of moves tend to perform better than single moves. The time to convergence can be shortened considerably by starting with a good tree, but this comes at the cost of compromising convergence diagnostics based on overdispersed starting points. Our results have important implications for developers of Bayesian MCMC implementations and for the large group of users of Bayesian phylogenetics software.

8.
Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G+C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.

9.
S-Adenosylhomocysteine hydrolase (SahH) is involved in the degradation of the compound which inhibits methylation reactions. Using a Bayesian approach and other methods, we reconstructed a phylogenetic tree of amino acid sequences of this protein originating from all three major domains of living organisms. The SahH sequences formed two major branches: one composed mainly of Archaea and the other of eukaryotes and the majority of bacteria, clearly contradicting the three-domain topology shown by the small subunit (SSU) rRNA gene. This topology suggests the occurrence of lateral transfer of this gene between the domains. Poor resolution of eukaryotes and bacteria precluded a definitive conclusion as to which of the two domains this gene appeared in first; however, the congruence of the secondary branches with SSU rRNA and/or concatenated ribosomal protein dataset phylogenies suggested an "early" acquisition by some bacterial and eukaryotic phyla. Similarly, the branching pattern of Archaea reflected the phylogenies shown by SSU rRNA and ribosomal proteins. SahH is widespread in Eucarya, although, owing to reductive evolution, it is missing in the intracellular parasite Encephalitozoon cuniculi. On the other hand, the lack of affinity to the sequences from the alpha-Proteobacteria and cyanobacteria excludes the possibility of its acquisition in the course of mitochondrial or chloroplast endosymbioses. Unlike Archaea, most bacteria carry MTA/SAH nucleosidase, an enzyme also involved in the metabolism of methylthioadenosine. However, the double function of MTA/SAH nucleosidase may be a barrier to efficient degradation of S-adenosylhomocysteine, especially when the intensity of methylation processes is high. This would explain the presence of S-adenosylhomocysteine hydrolase in the bacteria that have more complex metabolism. On the other hand, the majority of obligate pathogenic bacteria, owing to their simpler metabolism, rely entirely on MTA/SAH nucleosidase. This could explain the observed phenetic pattern in which bacteria with larger (>6 Mb, i.e., million base pairs) genomes carry SAH hydrolase, whereas bacteria that have undergone reductive evolution usually carry MTA/SAH nucleosidase. This suggests that the presence or acquisition of S-adenosylhomocysteine hydrolase in bacteria may predispose towards higher metabolic, and in consequence, higher genomic complexity. Good examples are the phototrophic bacteria, all of which carry this gene; however, the SahH phylogeny shows a lack of congruence with SSU rRNA and photosynthetic genes, implying that the acquisition was independent and presumably preceded the acquisition of photosynthetic genes. The majority of cyanobacteria acquired this gene from Archaea; however, in some species the sahH gene was replaced by a copy from the beta- or gamma-Proteobacteria.

10.
For a model of molecular evolution to be useful for phylogenetic inference, the topology of evolutionary trees must be identifiable. That is, from a joint distribution the model predicts, it must be possible to recover the tree parameter. We establish tree identifiability for a number of phylogenetic models, including a covarion model and a variety of mixture models with a limited number of classes. The proof is based on the introduction of a more general model, allowing more states at internal nodes of the tree than at leaves, and the study of the algebraic variety formed by the joint distributions to which it gives rise. Tree identifiability is first established for this general model through the use of certain phylogenetic invariants.

11.
SUMMARY: MAC5 implements MCMC sampling of the posterior distribution of tree topologies from DNA sequences containing gaps by using a five state model of evolution (the four nucleotides and the gap character).
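
As a rough illustration of what treating the gap as a fifth character state can look like, the sketch below builds transition probabilities for an equal-rates model over the alphabet "ACGT-" and encodes a gapped sequence as one-hot vectors. The equal-rates assumption, the alphabet ordering, and the function names are all hypothetical choices made here; the substitution model actually used by MAC5 is not described in the summary and may differ.

```python
import numpy as np

STATES = "ACGT-"  # four nucleotides plus the gap character as a fifth state


def equal_rates_matrix(t, k=len(STATES)):
    """Transition probabilities of a k-state equal-rates model, branch length t."""
    e = np.exp(-k * t / (k - 1.0))
    p = np.full((k, k), (1.0 - e) / k)
    np.fill_diagonal(p, (1.0 + (k - 1.0) * e) / k)
    return p


def encode(seq):
    """One-hot tip vectors; a gap is an ordinary observed state, not missing data."""
    idx = [STATES.index(c) for c in seq.upper()]
    return np.eye(len(STATES))[idx]


if __name__ == "__main__":
    print(equal_rates_matrix(0.1).round(4))
    print(encode("AC-T"))
```

With k = 4 the same formula reduces to the familiar Jukes-Cantor matrix, so the five-state version is a direct extension rather than a separate construction.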

12.
The objective of this study was to obtain a quantitative assessment of the monophyly of morning glory taxa, specifically the genus Ipomoea and the tribe Argyreieae. Previous systematic studies of morning glories intimated the paraphyly of Ipomoea by suggesting that the genera within the tribe Argyreieae are derived from within Ipomoea; however, no quantitative estimates of statistical support were developed to address these questions. We applied a Bayesian analysis to provide quantitative estimates of monophyly in an investigation of morning glory relationships using DNA sequence data. We also explored various approaches for examining convergence of the Markov chain Monte Carlo (MCMC) simulation of the Bayesian analysis by running 18 separate analyses varying in length. We found convergence of the important components of the phylogenetic model (the tree with the maximum posterior probability, branch lengths, the parameter values from the DNA substitution model, and the posterior probabilities for clade support) for these data after one million generations of the MCMC simulations. In the process, we identified a run where the parameter values obtained were often outside the range of values obtained from the other runs, suggesting an aberrant result. In addition, we compared the Bayesian method of phylogenetic analysis to maximum likelihood and maximum parsimony. The results from the Bayesian analysis and the maximum likelihood analysis were similar for topology, branch lengths, and parameters of the DNA substitution model. Topologies also were similar in the comparison between the Bayesian analysis and maximum parsimony, although the posterior probabilities and the bootstrap proportions exhibited some striking differences. In a Bayesian analysis of three data sets (ITS sequences, waxy sequences, and ITS + waxy sequences) no support for the monophyly of the genus Ipomoea, or for the tribe Argyreieae, was observed, with the estimate of the probability of the monophyly of these taxa being less than 3.4 x 10^(-7).
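
One common way to turn an MCMC sample of trees into a posterior probability of monophyly is to count the fraction of sampled trees in which the group forms a clade. The sketch below assumes rooted trees written as nested tuples and uses made-up taxon labels; it is not the procedure of the study, whose exact calculation is not given in the abstract.

```python
def clades(tree):
    """Return (leaf set, list of internal-node leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def monophyly_probability(sampled_trees, group):
    """Fraction of sampled trees in which `group` is exactly the leaf set of a clade."""
    group = frozenset(group)
    hits = sum(group in clades(tree)[1] for tree in sampled_trees)
    return hits / len(sampled_trees)


# Hypothetical posterior sample of four rooted trees over made-up taxa.
example_trees = [
    ((("I1", "I2"), "A1"), "O"),
    ((("I1", "A1"), "I2"), "O"),
    ((("I1", "I2"), "A1"), "O"),
    (("I1", ("I2", "A1")), "O"),
]

if __name__ == "__main__":
    print(monophyly_probability(example_trees, {"I1", "I2"}))
    print(monophyly_probability(example_trees, {"I1", "I2", "A1"}))
```

A plain frequency of this kind cannot resolve probabilities much smaller than one over the number of sampled trees, so very small values such as the one quoted above would need a very large sample or a different estimator.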

13.
An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.

14.
We defend and expand on our earlier proposal for an inclusive philosophical framework for phylogenetics, based on an interpretation of Popperian corroboration that is decoupled from the popular falsificationist interpretation of Popperian philosophy. Any phylogenetic inference method can provide Popperian "evidence" or "test statements" based on the method's goodness-of-fit values for different tree hypotheses. Corroboration, or the severity of that test, requires that the evidence is improbable without the hypothesis, given only background knowledge that includes elements of chance. This framework contrasts with attempted Popperian justifications for cladistic parsimony, in which evidence is the data, background knowledge is restricted to descent with modification, and "corroboration," as a by-product of nonfalsification, is to be measured by cladistic parsimony. Recognition that cladistic "corroboration" reflects only goodness-of-fit, not corroboration/severity, makes it clear that standard cladistic prohibitions, such as restrictions on the evolutionary models to be included in "background knowledge," have no philosophical status. The capacity to assess Popperian corroboration neither justifies nor excludes any phylogenetic method, but it does provide a framework in phylogenetics for learning from errors, that is, cases where apparent good evidence is probable even without the hypothesis. We explore these issues in the context of corroboration assessments applied to likelihood methods and to a new form of parsimony. These different forms of evidence and corroboration assessment point also to a new way to combine evidence: not at the level of overall fit, but at the level of overall corroboration/severity. We conclude that progress in an inclusive phylogenetics will be well served by the rejection of cladistic philosophy.

15.
Sample size for a phylogenetic inference.   (cited by: 1 total; 0 self-citations; 1 by others)
The objective of this work is to describe sample-size calculations for the inference of a nonzero central branch length in an unrooted four-species phylogeny. Attention is restricted to independent binary characters, such as might be obtained from an alignment of the purine-pyrimidine sequences of a nucleic acid molecule. A statistical test based on a multinomial model for character-state configurations is described. The importance of including invariable sites in models for sequence change is demonstrated, and their effect on sample size is quantified. The methods are applied to a four-species alignment of small-subunit rRNA sequences derived from two archaebacteria, a eubacterium and a eukaryote. We conclude that the information in these sequences is not sufficient to resolve the branching order of this tree. Estimates of the number of aligned nucleotide positions required to provide a reasonably powerful test are given.

16.
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

17.
Bayesian multimodel inference for geostatistical regression models   (cited by: 2 total; 0 self-citations; 2 by others)
Johnson DS, Hoeting JA. PLoS ONE. 2011;6(11):e25677
The problem of simultaneous covariate selection and parameter inference for spatial regression models is considered. Previous research has shown that failure to take spatial correlation into account can influence the outcome of standard model selection methods. A Markov chain Monte Carlo (MCMC) method is investigated for the calculation of parameter estimates and posterior model probabilities for spatial regression models. The method can accommodate normal and non-normal response data and a large number of covariates. Thus the method is very flexible and can be used to fit spatial linear models, spatial linear mixed models, and spatial generalized linear mixed models (GLMMs). The Bayesian MCMC method also allows a priori unequal weighting of covariates, which is not possible with many model selection methods such as Akaike's information criterion (AIC). The proposed method is demonstrated on two data sets. The first is the whiptail lizard data set which has been previously analyzed by other researchers investigating model selection methods. Our results confirmed the previous analysis suggesting that sandy soil and ant abundance were strongly associated with lizard abundance. The second data set concerned pollution tolerant fish abundance in relation to several environmental factors. Results indicate that abundance is positively related to Strahler stream order and a habitat quality index. Abundance is negatively related to percent watershed disturbance.

18.
19.
Bayesian inference for a bivariate binomial distribution   (cited by: 1 total; 0 self-citations; 1 by others)

20.
Systems neuroscience traditionally conceptualizes a population of spiking neurons as merely encoding the value of a stimulus. Yet, psychophysics has revealed that people take into account stimulus uncertainty when performing sensory or motor computations and do so in a nearly Bayes-optimal way. This suggests that neural populations do not encode just a single value but an entire probability distribution over the stimulus. Several such probabilistic codes have been proposed, including one that utilizes the structure of neural variability to enable simple neural implementations of probabilistic computations such as optimal cue integration. This approach provides a quantitative link between Bayes-optimal behaviors and specific neural operations. It allows for novel ways to evaluate probabilistic codes and for predictions for physiological population recordings.
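
For readers unfamiliar with optimal cue integration, the behavioral benchmark can be written in a few lines. The sketch below gives the standard result for two independent Gaussian cues under a flat prior: the combined estimate is a precision-weighted average, and its variance is smaller than that of either cue. It illustrates the Bayes-optimal computation only, not the neural implementation discussed in the abstract; the function name and the example numbers are assumptions.

```python
def combine_cues(mu1, sigma1, mu2, sigma2):
    """Posterior mean and variance for two independent Gaussian cues, flat prior."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    w1 = 1.0 / sigma1 ** 2  # precision (reliability) of cue 1
    w2 = 1.0 / sigma2 ** 2  # precision (reliability) of cue 2
    mean = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)
    return mean, variance


if __name__ == "__main__":
    # Hypothetical visual and haptic estimates of the same quantity.
    print(combine_cues(10.0, 1.0, 12.0, 2.0))
```

The more reliable cue pulls the combined estimate toward itself, and the combined variance is always below the smaller of the two input variances.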


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  ICP registration no. 京ICP备09084417号