首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We explore model-based techniques of phylogenetic tree inference exercising Markov invariants. Markov invariants are group invariant polynomials and are distinct from what is known in the literature as phylogenetic invariants, although we establish a commonality in some special cases. We show that the simplest Markov invariant forms the foundation of the Log–Det distance measure. We take as our primary tool group representation theory, and show that it provides a general framework for analyzing Markov processes on trees. From this algebraic perspective, the inherent symmetries of these processes become apparent, and focusing on plethysms, we are able to define Markov invariants and give existence proofs. We give an explicit technique for constructing the invariants, valid for any number of character states and taxa. For phylogenetic trees with three and four leaves, we demonstrate that the corresponding Markov invariants can be fruitfully exploited in applied phylogenetic studies.  相似文献   

2.
Bayesian inference provides an appealing general framework for phylogenetic analysis, able to incorporate a wide variety of modeling assumptions and to provide a coherent treatment of uncertainty. Existing computational approaches to bayesian inference based on Markov chain Monte Carlo (MCMC) have not, however, kept pace with the scale of the data analysis problems in phylogenetics, and this has hindered the adoption of bayesian methods. In this paper, we present an alternative to MCMC based on Sequential Monte Carlo (SMC). We develop an extension of classical SMC based on partially ordered sets and show how to apply this framework--which we refer to as PosetSMC--to phylogenetic analysis. We provide a theoretical treatment of PosetSMC and also present experimental evaluation of PosetSMC on both synthetic and real data. The empirical results demonstrate that PosetSMC is a very promising alternative to MCMC, providing up to two orders of magnitude faster convergence. We discuss other factors favorable to the adoption of PosetSMC in phylogenetics, including its ability to estimate marginal likelihoods, its ready implementability on parallel and distributed computing platforms, and the possibility of combining with MCMC in hybrid MCMC-SMC schemes. Software for PosetSMC is available at http://www.stat.ubc.ca/ bouchard/PosetSMC.  相似文献   

3.
A key element to a successful Markov chain Monte Carlo (MCMC) inference is the programming and run performance of the Markov chain. However, the explicit use of quality assessments of the MCMC simulations-convergence diagnostics-in phylogenetics is still uncommon. Here, we present a simple tool that uses the output from MCMC simulations and visualizes a number of properties of primary interest in a Bayesian phylogenetic analysis, such as convergence rates of posterior split probabilities and branch lengths. Graphical exploration of the output from phylogenetic MCMC simulations gives intuitive and often crucial information on the success and reliability of the analysis. The tool presented here complements convergence diagnostics already available in other software packages primarily designed for other applications of MCMC. Importantly, the common practice of using trace-plots of a single parameter or summary statistic, such as the likelihood score of sampled trees, can be misleading for assessing the success of a phylogenetic MCMC simulation. AVAILABILITY: The program is available as source under the GNU General Public License and as a web application at http://ceb.scs.fsu.edu/awty.  相似文献   

4.
Phylogenetic reconstruction from DNA or amino acid sequences relies heavily on suitable distance measures. A number of new distance measures (asynchronous, LogDet, and paralinear distances) which possess the desired property of tree additivity under fairly general models of sequence evolution have been proposed recently, but they are not well understood from a mechanistic point of view. We review them here in a unifying framework, which is the substitution process in continuous time. The emerging interpretation will also clarify the relationship among these distance measures. We also tackle situations with site-to-site variation of substitution rates which is well known to cause non-additive distances and inconsistent branch lengths. For homogeneous, stationary, time-reversible models, this may be repaired provided that the distribution of rates is known. In contrast, we will show that, for non-stationary models, different tree topologies may produce identical joint distributions of letters in pairs of sequences, given the same distribution of rates. This precludes the existence of any tree-additive pairwise distance measure.  相似文献   

5.
The multispecies coalescent (MSC) is a statistical framework that models how gene genealogies grow within the branches of a species tree. The field of computational phylogenetics has witnessed an explosion in the development of methods for species tree inference under MSC, owing mainly to the accumulating evidence of incomplete lineage sorting in phylogenomic analyses. However, the evolutionary history of a set of genomes, or species, could be reticulate due to the occurrence of evolutionary processes such as hybridization or horizontal gene transfer. We report on a novel method for Bayesian inference of genome and species phylogenies under the multispecies network coalescent (MSNC). This framework models gene evolution within the branches of a phylogenetic network, thus incorporating reticulate evolutionary processes, such as hybridization, in addition to incomplete lineage sorting. As phylogenetic networks with different numbers of reticulation events correspond to points of different dimensions in the space of models, we devise a reversible-jump Markov chain Monte Carlo (RJMCMC) technique for sampling the posterior distribution of phylogenetic networks under MSNC. We implemented the methods in the publicly available, open-source software package PhyloNet and studied their performance on simulated and biological data. The work extends the reach of Bayesian inference to phylogenetic networks and enables new evolutionary analyses that account for reticulation.  相似文献   

6.
7.
Recent advances in the field of plant community phylogenetics and invasion phylogenetics are mostly based on plot-level data, which do not take into consideration the spatial arrangement of individual plants within the plot. Here we use within-plot plant coordinates to investigate the link between the physical distance separating plants, and their phylogenetic relatedness. We look at two vegetation types (forest and grassland, similar in species richness and in the proportion of alien invasive plants) in subtropical coastal KwaZulu-Natal, South Africa. The relationship between phylogenetic distance and physical distance is weak in grassland (characterised by higher plant densities and low phylogenetic diversity), and varies substantially in forest vegetation (variable plant density, higher phylogenetic diversity). There is no significant relationship between the proportion of alien plants in the plots and the strength of the physical-phylogenetic distance relationship, suggesting that alien plants are well integrated in the local spatial-phylogenetic landscape.  相似文献   

8.
We present a model of amino acid sequence evolution based on a hidden Markov model that extends to transmembrane proteins previous methods that incorporate protein structural information into phylogenetics. Our model aims to give a better understanding of processes of molecular evolution and to extract structural information from multiple alignments of transmembrane sequences and use such information to improve phylogenetic analyses. This should be of value in phylogenetic studies of transmembrane proteins: for example, mitochondrial proteins have acquired a special importance in phylogenetics and are mostly transmembrane proteins. The improvement in fit to example data sets of our new model relative to less complex models of amino acid sequence evolution is statistically tested. To further illustrate the potential utility of our method, phylogeny estimation is performed on primate CCR5 receptor sequences, sequences of l and m subunits of the light reaction center in purple bacteria, guinea pig sequences with respect to lagomorph and rodent sequences of calcitonin receptor and K-substance receptor, and cetacean sequences of cytochrome b.  相似文献   

9.
The models of nucleotide substitution used by most maximum likelihood-based methods assume that the evolutionary process is stationary, reversible, and homogeneous. We present an extension of the Barry and Hartigan model, which can be used to estimate parameters by maximum likelihood (ML) when the data contain invariant sites and there are violations of the assumptions of stationarity, reversibility, and homogeneity. Unlike most ML methods for estimating invariant sites, we estimate the nucleotide composition of invariant sites separately from that of variable sites. We analyze a bacterial data set where problems due to lack of stationarity and homogeneity have been previously well noted and use the parametric bootstrap to show that the data are consistent with our general Markov model. We also show that estimates of invariant sites obtained using our method are fairly accurate when applied to data simulated under the general Markov model.  相似文献   

10.
Phylogenetic networks have now joined phylogenetic trees in the center of phylogenetics research. Like phylogenetic trees, such networks canonically induce collections of phylogenetic trees, clusters, and triplets, respectively. Thus it is not surprising that many network approaches aim to reconstruct a phylogenetic network from such collections. Related to the well-studied perfect phylogeny problem, the following question is of fundamental importance in this context: When does one of the above collections encode (i.e. uniquely describe) the network that induces it? For the large class of level-1 (phylogenetic) networks we characterize those level-1 networks for which an encoding in terms of one (or equivalently all) of the above collections exists. In addition, we show that three known distance measures for comparing phylogenetic networks are in fact metrics on the resulting subclass and give the diameter for two of them. Finally, we investigate the related concept of indistinguishability and also show that many properties enjoyed by level-1 networks are not satisfied by networks of higher level.  相似文献   

11.
Bayesian Markov chain Monte Carlo sampling has become increasingly popular in phylogenetics as a method for both estimating the maximum likelihood topology and for assessing nodal confidence. Despite the growing use of posterior probabilities, the relationship between the Bayesian measure of confidence and the most commonly used confidence measure in phylogenetics, the nonparametric bootstrap proportion, is poorly understood. We used computer simulation to investigate the behavior of three phylogenetic confidence methods: Bayesian posterior probabilities calculated via Markov chain Monte Carlo sampling (BMCMC-PP), maximum likelihood bootstrap proportion (ML-BP), and maximum parsimony bootstrap proportion (MP-BP). We simulated the evolution of DNA sequence on 17-taxon topologies under 18 evolutionary scenarios and examined the performance of these methods in assigning confidence to correct monophyletic and incorrect monophyletic groups, and we examined the effects of increasing character number on support value. BMCMC-PP and ML-BP were often strongly correlated with one another but could provide substantially different estimates of support on short internodes. In contrast, BMCMC-PP correlated poorly with MP-BP across most of the simulation conditions that we examined. For a given threshold value, more correct monophyletic groups were supported by BMCMC-PP than by either ML-BP or MP-BP. When threshold values were chosen that fixed the rate of accepting incorrect monophyletic relationship as true at 5%, all three methods recovered most of the correct relationships on the simulated topologies, although BMCMC-PP and ML-BP performed better than MP-BP. BMCMC-PP was usually a less biased predictor of phylogenetic accuracy than either bootstrapping method. BMCMC-PP provided high support values for correct topological bipartitions with fewer characters than was needed for nonparametric bootstrap.  相似文献   

12.
Identifying patterns and drivers of infectious disease dynamics across multiple scales is a fundamental challenge for modern science. There is growing awareness that it is necessary to incorporate multi‐host and/or multi‐parasite interactions to understand and predict current and future disease threats better, and new tools are needed to help address this task. Eco‐phylogenetics (phylogenetic community ecology) provides one avenue for exploring multi‐host multi‐parasite systems, yet the incorporation of eco‐phylogenetic concepts and methods into studies of host pathogen dynamics has lagged behind. Eco‐phylogenetics is a transformative approach that uses evolutionary history to infer present‐day dynamics. Here, we present an eco‐phylogenetic framework to reveal insights into parasite communities and infectious disease dynamics across spatial and temporal scales. We illustrate how eco‐phylogenetic methods can help untangle the mechanisms of host–parasite dynamics from individual (e.g. co‐infection) to landscape scales (e.g. parasite/host community structure). An improved ecological understanding of multi‐host and multi‐pathogen dynamics across scales will increase our ability to predict disease threats.  相似文献   

13.
The amino acid sequences of proteins provide rich information for inferring distant phylogenetic relationships and for predicting protein functions. Estimating the rate matrix of residue substitutions from amino acid sequences is also important because the rate matrix can be used to develop scoring matrices for sequence alignment. Here we use a continuous time Markov process to model the substitution rates of residues and develop a Bayesian Markov chain Monte Carlo method for rate estimation. We validate our method using simulated artificial protein sequences. Because different local regions such as binding surfaces and the protein interior core experience different selection pressures due to functional or stability constraints, we use our method to estimate the substitution rates of local regions. Our results show that the substitution rates are very different for residues in the buried core and residues on the solvent-exposed surfaces. In addition, the rest of the proteins on the binding surfaces also have very different substitution rates from residues. Based on these findings, we further develop a method for protein function prediction by surface matching using scoring matrices derived from estimated substitution rates for residues located on the binding surfaces. We show with examples that our method is effective in identifying functionally related proteins that have overall low sequence identity, a task known to be very challenging.  相似文献   

14.
15.
Ionizing radiation damage to the genome of a non-cycling mammalian cell is analyzed using continuous time Markov chains. Immediate damage induced by the radiation is modeled as a batch Poisson arrival process of DNA double strand breaks (DSBs). Different kinds of radiation, for example gamma rays or alpha particles, have different batch probabilities. Enzymatic modulation of the immediate damage is modeled as a Markov process similar to the processes described by the master equation of stochastic chemical kinetics. An illustrative example is the restitution/complete exchange model, which postulates that radiation induced DSBs can subsequently either undergo enzymatically mediated repair (restitution) or can participate pairwise in chromosome exchanges, some of which make irremediable lesions such as dicentric chromosome aberrations. One may have rapid irradiation followed by enzymatic DSB processing or have prolonged irradiation with both DSB arrival and enzymatic DSB processing continuing throughout the irradiation period. A complete solution of the Markov chain is known for the case that the exchange rate constant is negligible so that no irremediable chromosome lesions are produced and DSBs are the only damage to the genome. Using PDEs for generating functions, a perturbation calculation is made assuming the exchange rate constant is small compared to the repair rate constant. Some non-perturbative results applicable to very prolonged irradiation are also obtained using matrix methods: Perron-Frobenius theory, variational methods and numerical approximations of eigenvalues. Applications to experimental results on expected values, variances and statistical distributions of DNA lesions are briefly outlined.Continuous time Markov chain models are the most systematic of those current radiation damage models which treat DSB-DSB interactions within the cell nucleus as homogeneous (e.g. ignore diffusion limitations). They contain most other homogeneous models as special cases, limiting cases or approximations. However, applying the continuous time Markov chain models to studying spatial dependence of DSB interactions, which is generally believed to be very important in some situations, presents difficulties.  相似文献   

16.
We propose a Bayesian method for testing molecular clock hypotheses for use with aligned sequence data from multiple taxa. Our method utilizes a nonreversible nucleotide substitution model to avoid the necessity of specifying either a known tree relating the taxa or an outgroup for rooting the tree. We employ reversible jump Markov chain Monte Carlo to sample from the posterior distribution of the phylogenetic model parameters and conduct hypothesis testing using Bayes factors, the ratio of the posterior to prior odds of competing models. Here, the Bayes factors reflect the relative support of the sequence data for equal rates of evolutionary change between taxa versus unequal rates, averaged over all possible phylogenetic parameters, including the tree and root position. As the molecular clock model is a restriction of the more general unequal rates model, we use the Savage-Dickey ratio to estimate the Bayes factors. The Savage-Dickey ratio provides a convenient approach to calculating Bayes factors in favor of sharp hypotheses. Critical to calculating the Savage-Dickey ratio is a determination of the prior induced on the modeling restrictions. We demonstrate our method on a well-studied mtDNA sequence data set consisting of nine primates. We find strong support against a global molecular clock, but do find support for a local clock among the anthropoids. We provide mathematical derivations of the induced priors on branch length restrictions assuming equally likely trees. These derivations also have more general applicability to the examination of prior assumptions in Bayesian phylogenetics.  相似文献   

17.
Bayesian phylogenetics with BEAUti and the BEAST 1.7   总被引:7,自引:0,他引:7  
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.  相似文献   

18.
Brownian motions on coalescent structures have a biological relevance, either as an approximation of the stepwise mutation model for microsatellites, or as a model of spatial evolution considering the locations of individuals at successive generations. We discuss estimation procedures for the dispersal parameter of a Brownian motion defined on coalescent trees. First, we consider the mean square distance unbiased estimator and compute its variance. In a second approach, we introduce a phylogenetic estimator. Given the UPGMA topology, the likelihood of the parameter is computed thanks to a new dynamical programming method. By a proper correction, an unbiased estimator is derived from the pseudomaximum of the likelihood. The last approach consists of computing the likelihood by a Markov chain Monte Carlo sampling method. In the one-dimensional Brownian motion, this method seems less reliable than pseudomaximum-likelihood.  相似文献   

19.
We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号