Similar articles
20 similar articles found
1.
Wu CH, Drummond AJ. Genetics 2011, 188(1):151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally, we apply our method to a red colobus monkey data set as an example.

2.
BEAST 2: A Software Platform for Bayesian Evolutionary Analysis   (Total citations: 1; self-citations: 0; citations by others: 1)
We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example BEAST and BEAUti use exactly the same XML file format.
This is a PLOS Computational Biology Software Article.

3.
Single-cell sequencing provides a new way to explore the evolutionary history of cells. Compared to traditional bulk sequencing, where a population of heterogeneous cells is pooled to form a single observation, single-cell sequencing isolates and amplifies genetic material from individual cells, thereby preserving the information about the origin of the sequences. However, single-cell data are more error-prone than bulk sequencing data due to the limited genomic material available per cell. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. Our simulations show that modeling errors increases the accuracy of relative divergence times and substitution parameters. We reconstruct the phylogenetic history of a colorectal cancer patient and a healthy patient from single-cell DNA sequencing data. We find that the estimated times of terminal splitting events are shifted forward in time compared to models which ignore errors. We observed that not accounting for errors can overestimate the phylogenetic diversity in single-cell DNA sequencing data. We estimate that 30–50% of the apparent diversity can be attributed to error. Our work enables a full Bayesian approach capable of accounting for errors in the data within the integrative Bayesian software framework BEAST2.

4.
Bayesian phylogenetics with BEAUti and the BEAST 1.7   (Total citations: 7; self-citations: 0; citations by others: 7)
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.

5.
Recent developments in marginal likelihood estimation for model selection in the field of Bayesian phylogenetics and molecular evolution have emphasized the poor performance of the harmonic mean estimator (HME). Although these studies have shown the merits of new approaches applied to standard normally distributed examples and small real-world data sets, not much is currently known concerning the performance and computational issues of these methods when fitting complex evolutionary and population genetic models to empirical real-world data sets. Further, these approaches have not yet seen widespread application in the field due to the lack of implementations of these computationally demanding techniques in commonly used phylogenetic packages. We here investigate the performance of some of these new marginal likelihood estimators, specifically, path sampling (PS) and stepping-stone (SS) sampling for comparing models of demographic change and relaxed molecular clocks, using synthetic data and real-world examples for which unexpected inferences were made using the HME. Given the drastically increased computational demands of PS and SS sampling, we also investigate a posterior simulation-based analogue of Akaike's information criterion (AIC) through Markov chain Monte Carlo (MCMC), referred to as AICM, a model comparison approach that shares with the HME the appealing feature of having a low computational overhead over the original MCMC analysis. We confirm that the HME systematically overestimates the marginal likelihood and fails to yield reliable model classification and show that the AICM performs better and may be a useful initial evaluation of model choice but that it is also, to a lesser degree, unreliable. We show that PS and SS sampling substantially outperform these estimators and adjust the conclusions made concerning previous analyses for the three real-world data sets that we reanalyzed.
The methods used in this article are now available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
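As a rough illustration of why the HME and the AICM add little cost to an existing MCMC run, the sketch below computes both from a list of posterior log-likelihood samples. It assumes the standard definitions (HME as the harmonic mean of the sampled likelihoods, AICM as twice the sample variance minus twice the sample mean of the log-likelihoods); the function names and the toy input are hypothetical and are not taken from BEAST.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def hme_log_marginal(log_likelihoods):
    """Harmonic mean estimate of the log marginal likelihood.

    The harmonic mean of likelihoods L_i is 1 / mean(1 / L_i), so on the
    log scale it is -log_mean_exp(-log L_i).
    """
    return -log_mean_exp([-ll for ll in log_likelihoods])

def aicm(log_likelihoods):
    """AICM = 2 * variance - 2 * mean of the posterior log-likelihoods.

    Lower values indicate a better-supported model.
    """
    n = len(log_likelihoods)
    mean = sum(log_likelihoods) / n
    var = sum((ll - mean) ** 2 for ll in log_likelihoods) / (n - 1)
    return 2.0 * var - 2.0 * mean

# Toy (hypothetical) posterior log-likelihood samples from one MCMC run.
samples = [-1002.0, -1000.0, -998.0]
print(hme_log_marginal(samples))
print(aicm(samples))
```

Both quantities reuse samples that the MCMC run already produced, which is the low overhead the abstract mentions. Path sampling and stepping-stone sampling instead require additional runs along a path between prior and posterior, which is not shown here.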

6.
In this study, we explore the long‐standing issue of how many loci are needed to infer accurate phylogenetic relationships, and whether loci with particular attributes (e.g., parsimony informativeness, variability, gene tree resolution) outperform others. To do so, we use an empirical data set consisting of the seven species of chickadees (Aves: Paridae), an analytically tractable, recently diverged group, and well‐studied ecologically but lacking a nuclear phylogeny. We estimate relationships using 40 nuclear loci and mitochondrial DNA using four coalescent‐based species tree inference methods (BEST, *BEAST, STEM, STELLS). Collectively, our analyses contrast with previous studies and support a sister relationship between the Black‐capped and Carolina Chickadee, two superficially similar species that hybridize along a long zone of contact. Gene flow is a potential source of conflict between nuclear and mitochondrial gene trees, yet we find a significant, albeit low, signal of gene flow. Our results suggest that relatively few loci with high information content may be sufficient for estimating an accurate species tree, but that substantially more loci are necessary for accurate parameter estimation. We provide an empirical reference point for researchers designing sampling protocols with the purpose of inferring phylogenies and population parameters of closely related taxa.

7.
Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

8.
Tragopogon comprises approximately 150 described species distributed throughout Eurasia from Ireland and the UK to India and China with a few species in North Africa. Most of the species diversity is found in Eastern Europe to Western Asia. Previous phylogenetic analyses identified several major clades, generally corresponding to recognized taxonomic sections, although relationships both among these clades and among species within clades remain largely unresolved. These patterns are consistent with rapid diversification following the origin of Tragopogon, and this study addresses the timing and rate of diversification in Tragopogon. Using BEAST to simultaneously estimate a phylogeny and divergence times, we estimate the age of a major split and subsequent rapid divergence within Tragopogon to be ~2.6 Ma (and 1.7–5.4 Ma using various clock estimates). Based on the age estimates obtained with BEAST (HPD 1.7–5.4 Ma) for the origin of crown group Tragopogon and 200 estimated species (to accommodate a large number of cryptic species), the diversification rate of Tragopogon is approximately 0.84–2.71 species/Myr for the crown group, assuming low levels of extinction. This estimate is comparable in rate to a rapid Eurasian radiation in Dianthus (0.66–3.89 species/Myr), which occurs in the same or similar habitats. Using available data, we show that subclades of various plant taxa that occur in the same semi‐arid habitats of Eurasia also represent rapid radiations occurring during roughly the same window of time (1.7–5.4 Ma), suggesting similar causal events. However, not all species‐rich plant genera from the same habitats diverged at the same time, or at the same tempo. Radiations of several other clades in this same habitat (e.g. Campanula, Knautia, Scabiosa) occurred at earlier dates (45–4.28 Ma). 
Existing phylogenetic data and diversification estimates therefore indicate that, although some elements of these semi‐arid communities radiated during the Plio‐Pleistocene period, other clades sharing the same habitat appear to have diversified earlier.
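The diversification rates quoted in the abstract above can be approximately reproduced with a simple crown-group estimator. The sketch below assumes a pure-birth process (zero extinction), under which the net rate is r = (ln N - ln 2) / t for N extant species and crown age t. The abstract does not state which estimator was used, so this is an illustrative check rather than the authors' calculation, and the function name is hypothetical.

```python
import math

def crown_diversification_rate(n_species, crown_age_myr):
    """Net diversification rate (species/Myr) for a crown group.

    Assumes zero extinction: the clade starts from 2 lineages at the
    crown node and grows exponentially to n_species.
    """
    return (math.log(n_species) - math.log(2)) / crown_age_myr

# 200 estimated species, crown age bounds of 1.7 and 5.4 Ma from the abstract.
fast = crown_diversification_rate(200, 1.7)
slow = crown_diversification_rate(200, 5.4)
print(round(slow, 2), round(fast, 2))
```

Under these assumptions the range comes out at roughly 0.85 to 2.71 species/Myr, close to the reported 0.84–2.71. The small difference at the lower bound may reflect rounding or a nonzero extinction assumption in the original analysis.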

9.
The use of predictive models in Neotropical basins is relatively new, and applying these models in large basins is hindered by the lack of ecological, geographical, and social-environmental knowledge. Despite these difficulties, we used data from the das Velhas River basin to apply the BEAST (Benthic Assessment of SedimenT) methodology to evaluate and classify the level of environmental degradation. Our two main objectives were to modify and implement the BEAST methodology for use in biomonitoring programs of Brazilian basins, and to test the hypothesis that a gradient of environmental degradation determines a gradient in the structure and composition of benthic macroinvertebrate assemblages. We evaluated 37 sites: 8 in the main river, 15 in the main tributaries with different impact levels, and 14 in tributaries with minimally disturbed conditions (MDC). The BEAST model allowed us to classify 16 test sites: two as natural, four as altered, three as highly altered, and seven as degraded. Our results indicated degradation of the das Velhas River basin near its urban areas. The BEAST model indicated that the pollution gradient found among the sites generated a gradient of the macroinvertebrate assemblages, corroborating the hypothesis. Handling editor: S. M. Thomaz

10.

Background  

Past studies in the legume family (Fabaceae) have uncovered several evolutionary trends including differential mutation and diversification rates across varying taxonomic levels. The legume tribe Psoraleeae is shown herein to exemplify these trends at the generic and species levels. This group includes a sizable diversification within North America dated at approximately 6.3 million years ago with skewed species distribution to the most recently derived genus, Pediomelum, suggesting a diversification rate shift. We estimate divergence dates of North American (NAm) Psoraleeae using Bayesian MCMC sampling in BEAST based on eight DNA regions (ITS, waxy, matK, trnD-trnT, trnL-trnF, trnK, trnS-trnG, and rpoB-trnC). We also test the hypothesis of a diversification rate shift within NAm Psoraleeae using topological and temporal methods. We investigate the impact of climate change on diversification in this group by (1) testing the hypothesis that a shift from mesic to xeric habitats acted as a key innovation and (2) investigating diversification rate shifts along geologic time, discussing the impact of Quaternary climate oscillations on diversification.

11.
The phylogenetic relationships among the wall lizards of the Podarcis hispanicus complex that inhabit the south-east (SE) of the Iberian Peninsula and other lineages of the complex remain unclear. In this study, four mitochondrial and two nuclear markers were used to study genetic relationships within this complex. The phylogenetic analyses based on mtDNA gene trees constructed with ML and BI, and a species tree using *BEAST support three divergent clades in this region: the Valencia, Galera and Albacete/Murcia lineages. These three lineages were also corroborated in species delimitation analyses based on mtDNA using bPTP, mPTP, GMYC, ABGD and BAPS. A Bayesian species delimitation method (BPP) based on both nuclear data and a combined data set (mtDNA + nuclear) showed high posterior probabilities for these three SE lineages (≥0.94) and another Bayesian analysis (STACEY) based on the combined data set recovered the same three groups in this region. Divergence time dating of the species tree provided an estimated divergence of the Galera lineage from the other SE group (Podarcis vaucheri, (Albacete/Murcia, Valencia)) at 12.48 Ma. During this period, the Betic–Rifian arc was isolated, which could have caused the isolation of the Galera form distributed to the south of the Betic Corridor. Although lizards from the Albacete/Murcia and Galera lineage are morphologically similar, they clearly represent distinct genetic lineages. The noteworthy separation of the Galera lineage enables us to conclude that this lineage must be considered as a new full species.

12.
We examined the phylogenetic history of Linaria with special emphasis on the Mediterranean sect. Supinae (44 species). We revealed extensive highly supported incongruence among two nuclear (ITS, AGT1) and two plastid regions (rpl32-trnL(UAG), trnS-trnG). Coalescent simulations, a hybrid detection test and species tree inference in *BEAST revealed that incomplete lineage sorting and hybridization may both be responsible for the incongruent pattern observed. Additionally, we present a multilabelled *BEAST species tree as an alternative approach that allows the possibility of observing multiple placements in the species tree for the same taxa. That permitted the incorporation of processes such as hybridization within the tree while not violating the assumptions of the *BEAST model. This methodology is presented as a functional tool to disclose the evolutionary history of species complexes that have experienced both hybridization and incomplete lineage sorting. The drastic climatic events that have occurred in the Mediterranean since the late Miocene, including the Quaternary-type climatic oscillations, may have made both processes highly recurrent in the Mediterranean flora.

13.
The structured coalescent allows inferring migration patterns between viral subpopulations from genetic sequence data. However, these analyses typically assume that no genetic recombination process impacted the sequence evolution of pathogens. For segmented viruses, such as influenza, that can undergo reassortment this assumption is broken. Reassortment reshuffles the segments of different parent lineages upon a coinfection event, which means that the shared history of viruses has to be represented by a network instead of a tree. Therefore, full genome analyses of such viruses are complex or even impossible. Although this problem has been addressed for unstructured populations, it is still impossible to account for population structure, such as induced by different host populations, while also accounting for reassortment. We address this by extending the structured coalescent to account for reassortment and present a framework for investigating possible ties between reassortment and migration (host jump) events. This method can accurately estimate subpopulation-dependent effective population sizes, reassortment, and migration rates from simulated data. Additionally, we apply the new model to avian influenza A/H5N1 sequences, sampled from two avian host types, Anseriformes and Galliformes. We contrast our results with a structured coalescent without reassortment inference, which assumes independently evolving segments. This reveals that taking into account segment reassortment and using sequencing data from several viral segments for joint phylodynamic inference leads to different estimates for effective population sizes, migration, and clock rates. This new model is implemented as the Structured Coalescent with Reassortment package for BEAST 2.5 and is available at https://github.com/jugne/SCORE.

14.
Restriction site-associated DNA sequencing (RAD-seq) and related methods have become relatively common approaches to resolve species-level phylogeny. It is not clear, however, whether RAD-seq data matrices are well suited to relaxed clock inference of divergence times, given the size of the matrices and the abundance of missing data. We investigated the sensitivity of Bayesian relaxed clock estimates of divergence times to alternative analytical decisions on an empirical RAD-seq phylogenetic matrix. We explored the relative contribution of secondary calibration strategies, amount of missing data, and the data partition analyzed to overall variance in divergence times inferred using BEAST MCMC analyses of Carex section Schoenoxiphium (Cyperaceae), a recent radiation for which we have nearly complete species sampling of RAD-seq data. The crown node for Schoenoxiphium was estimated to be 15.22 (9.56–21.18) Ma using a single calibration point and low missing data, 11.93 (8.07–16.03) Ma using multiple calibration points and low missing data, and 8.34 (5.41–11.22) Ma using multiple calibrations but high missing data. We found that using matrices with more than half of the individuals with missing data inferred younger mean ages for all nodes. Moreover, we have found that our molecular clock estimates are sensitive to the positions of the calibration(s) in our phylogenetic tree (using matrices with low missing data), especially when only a single calibration was applied to estimate divergence times. These results argue for sensitivity analyses and caution in interpreting divergence time estimates from RAD-seq data.

15.
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. 
In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.
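The coverage comparison in the abstract above rests on checking, for each simulated data set, whether the true parameter value falls inside the highest posterior density (HPD) interval. A minimal sketch of that bookkeeping follows. It assumes the HPD interval is taken as the shortest window containing the requested fraction of posterior samples; the function names and inputs are hypothetical and do not reproduce the authors' pipeline.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing at least `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    # Number of samples the interval must contain (guard against float noise).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

def coverage(true_value, posterior_sample_sets, mass=0.95):
    """Fraction of analyses whose HPD interval contains the true value."""
    hits = 0
    for samples in posterior_sample_sets:
        lo, hi = hpd_interval(samples, mass)
        if lo <= true_value <= hi:
            hits += 1
    return hits / len(posterior_sample_sets)

# Two toy posteriors (hypothetical): only the first covers the true value 1.5.
runs = [[0, 1, 2, 3, 100], [10, 11, 12, 13, 14]]
print(coverage(1.5, runs, mass=0.8))
```

The error rate quoted in the abstract corresponds to one minus this coverage. With a nominal 95% interval, a well-calibrated method would be expected to miss the true value in about 5% of replicates.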

16.
More loci/partitions should improve Bayesian estimation of divergence times on phylogenies but it has recently been shown that this can lead to surprisingly poor estimation due to the way it affects the prior on mean substitution rate. Here we consider the likely impact of partition number on divergence time analyses carried out using the program BEAST. Mitochondrial genome data from toad‐headed lizards (genus Phrynocephalus) from the Qinghai–Tibetan Plateau were used to examine this effect. Under increased partitioning of the sequences, BEAST posterior divergence times became unreasonably narrow and downwardly biased due to misspecification of the mean substitution rate prior. This effect was detectable when relatively few partitions were used (i.e. between four and eight), but became very acute for 27–86 partitions. Fortunately, a correction that adjusts the standard deviation of the mean of locus rates led to results that were equivalent to those obtained using the latest version of the program MCMCtree, which implements a new gamma‐Dirichlet prior to overcome this problem. A review of the literature shows that a substantial number of BEAST dating studies are likely to have been affected by this misspecification of the rate prior.
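One way to see how partition number can affect the prior on the mean substitution rate is a back-of-the-envelope calculation. The sketch below assumes that each partition's rate receives an independent prior with the same standard deviation, in which case the implied prior on the mean of the rates narrows as 1/sqrt(n). This is an illustration of the general mechanism under that stated assumption, not a description of BEAST's actual prior parameterization or of the correction used in the study; the function name and the per-locus value are hypothetical.

```python
import math

def implied_sd_of_mean_rate(sd_per_locus, n_partitions):
    """SD of the mean of n independent, identically distributed locus rates."""
    return sd_per_locus / math.sqrt(n_partitions)

# Partition counts mentioned in the abstract; sd_per_locus = 1.0 is arbitrary.
for n in (1, 4, 8, 27, 86):
    print(n, round(implied_sd_of_mean_rate(1.0, n), 3))
```

Under this assumption the implied prior on the mean rate is already half as wide at four partitions and roughly nine times narrower at 86, which is consistent in direction with the unreasonably narrow posterior divergence times the abstract reports.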

17.
Understanding the dynamics of white-nose syndrome spread in time and space is an important component of disease epidemiology and control. We reported earlier that a novel partitivirus, Pseudogymnoascus destructans partitivirus-pa, had infected the North American isolates of Pseudogymnoascus destructans, the fungal pathogen that causes white-nose syndrome in bats. We showed that the diversity of the viral coat protein sequences is correlated to their geographical origin. Here we hypothesize that the geographical adaptation of the virus could be used as a proxy to characterize the spread of white-nose syndrome. We used over 100 virus isolates from diverse locations in North America and applied the phylogeographic analysis tool BEAST to characterize the spread of the disease. The strict clock phylogeographic analysis under the coalescent model in BEAST showed a patchy spread pattern of white-nose syndrome driven from a few source locations including Connecticut, New York, West Virginia, and Kentucky. The source states had significant support in the maximum clade credibility tree and Bayesian stochastic search variable selection analysis. Although the geographic origin of the virus is not definite, it is likely the virus infected the fungus prior to the spread of white-nose syndrome in North America. We also inferred from the BEAST analysis that the recent long-distance spread of the fungus to Washington had its root in Kentucky, likely from the Mammoth Cave area and most probably mediated by a human. The time to the most recent common ancestor of the virus is estimated to lie between the late 1990s and the early 2000s. We found a mean substitution rate of 2 × 10⁻³ substitutions per site per year for the virus, which is higher than expected given the persistent lifestyle of the virus and the stamping-machine mode of replication.
Our approach of using the virus as a proxy to understand the spread of white-nose syndrome could be an important tool for the study and management of other infectious diseases.

18.
Tick-borne flaviviruses (TBF) are widely dispersed across Africa, Europe, Asia, Oceania, and North America, and some present a significant threat to human health. Seminal studies on tick-borne encephalitis viruses (TBEV), based on partial envelope gene sequences, predicted a westward clinal pattern of evolution and dispersal across northern Eurasia, terminating in the British Isles. We tested this hypothesis using all available full-length open reading frame (ORF) TBF sequences. Phylogenetic analysis was consistent with current reports. However, linear and nonlinear regression analysis of genetic versus geographic distance combined with BEAST analysis identified two separate clines, suggesting that TBEV spread both east and west from a central point. In addition, BEAST analysis suggested that TBF emerged and dispersed more than 16,000 years ago, significantly earlier than previously predicted. Thus, climatic and ecological changes may have played a greater role in TBF dispersal than humans.

19.
Here, we introduce the idea of probabilities of line origins for alleles in general pedigrees as found in crosses between outbred lines. We also present software for calculating these probabilities. The proposed algorithm is based on the linear regression method of Haley, Knott and Elsen (1994) combined with the Markov chain Monte Carlo (MCMC) method for estimating quantitative trait locus coefficients used as regressors. We compared the relative precision of our method and the original method as proposed by Haley et al. (1994). The scenarios studied varied in the allelic distribution of marker alleles in parental lines and in the frequency of missing marker genotypes. We found that the MCMC method achieves a higher accuracy in all scenarios considered. The benefits of using MCMC approximation are substantial if the frequency of missing marker data is high or the number of marker alleles is low and the allelic frequency distribution is similar in both parental lines.

20.
Accurate species delimitation is critical for biodiversity studies. However, species complexes characterized by introgression, high levels of population structure and subtle phenotypic differentiation can be challenging to delimit. Here, we report on a molecular systematic investigation of the woodland salamanders Plethodon wehrlei and Plethodon punctatus, which traditionally have been placed in the Plethodon wehrlei species group. To quantify patterns of genetic variation, we collected genetic samples from throughout the range of both species, including 22 individuals from nine populations of P. punctatus, and 60 individuals from 26 populations of P. wehrlei. From these samples, we sequenced three mtDNA loci (5596 base pairs) and five nuclear loci (3377 base pairs). We inferred time‐calibrated gene trees and species trees using BEAST 2.4.6, and we delimited putative species using a Bayesian implementation of the general mixed Yule‐coalescent model (bGMYC) and STRUCTURE. Finally, we validated putative species using the multispecies coalescent as implemented in Bayesian Phylogenetics and Phylogeography (BPP). We found substantial phylogeographic diversity in P. wehrlei, including multiple geographically cohesive clades and an inferred mitochondrial common ancestor at 11.5 myr (95% HPD: 9.6–13.6 myr) that separated populations formerly assigned to P. dixi from all other populations. We also found that P. punctatus is deeply nested within P. wehrlei, rendering the latter paraphyletic. After discussing the challenges faced by modern species delimitation methods, we recommend retaining P. punctatus because it is ecologically and phenotypically distinct. We further recommend that P. dixi be recognized as a valid species.
