Similar documents
20 similar documents found
1.
Simultaneous molecular dating of population and species divergences is essential in many biological investigations, including phylogeography, phylodynamics and species delimitation studies. In these investigations, multiple sequence alignments consist of both intra‐ and interspecies samples (mixed samples). As a result, the phylogenetic trees contain interspecies, interpopulation and within‐population divergences. Bayesian relaxed clock methods are often employed in these analyses, but they assume the same tree prior for both inter‐ and intraspecies branching processes and require specification of a clock model for branch rates (independent vs. autocorrelated rates models). We evaluated the impact of a single tree prior on Bayesian divergence time estimates by analysing computer‐simulated data sets. We also examined the effect of assuming that evolutionary rates vary independently among branches when the branch rates are in fact autocorrelated. The Bayesian approach with coalescent tree priors generally produced excellent molecular dates, and its highest posterior density intervals had high coverage probabilities. We also evaluated the performance of a non‐Bayesian method, RelTime, which does not require the specification of a tree prior or a clock model. RelTime's performance was similar to that of the Bayesian approach, suggesting that it is also suitable for analysing data sets containing both population and species variation when computational efficiency is needed.

2.
Analyses of a comprehensive morphological character matrix of mammals using ‘relaxed’ clock models (which simultaneously estimate topology, divergence dates and evolutionary rates), either alone or in combination with an 8.5 kb nuclear sequence dataset, retrieve implausibly ancient, Late Jurassic–Early Cretaceous estimates for the initial diversification of Placentalia (crown-group Eutheria). These dates are much older than all recent molecular and palaeontological estimates. They are recovered using two very different clock models, and regardless of whether the tree topology is freely estimated or constrained using scaffolds to match the current consensus placental phylogeny. This raises the possibility that divergence dates have been overestimated in previous analyses that have applied such clock models to morphological and total evidence datasets. Enforcing additional age constraints on selected internal divergences results in only a slight reduction of the age of Placentalia. Constraining Placentalia to less than 93.8 Ma, congruent with recent molecular estimates, does not require major changes in morphological or molecular evolutionary rates. Even constraining Placentalia to less than 66 Ma to match the ‘explosive’ palaeontological model results in only a 10- to 20-fold increase in maximum evolutionary rate for morphology, and fivefold for molecules. The large discrepancies between clock- and fossil-based estimates for divergence dates might therefore be attributable to relatively small changes in evolutionary rates through time, although other explanations (such as overly simplistic models of morphological evolution) need to be investigated. Conversely, dates inferred using relaxed clock models (especially with discrete morphological data and MrBayes) should be treated cautiously, as relatively minor deviations in rate patterns can generate large effects on estimated divergence dates.

3.
Evolutionary timescales can be estimated from genetic data using phylogenetic methods based on the molecular clock. To account for molecular rate variation among lineages, a number of relaxed‐clock models have been developed. Some of these models assume that rates vary among lineages in an autocorrelated manner, so that closely related species share similar rates. In contrast, uncorrelated relaxed clocks allow all of the branch‐specific rates to be drawn from a single distribution, without assuming any correlation between rates along neighbouring branches. There is uncertainty about which of these two classes of relaxed‐clock models is more appropriate for biological data. We present an R package, NELSI, that allows the evolution of DNA sequences to be simulated according to a range of clock models. Using data generated by this package, we assessed the ability of two Bayesian phylogenetic methods to distinguish among different relaxed‐clock models and to quantify rate variation among lineages. The results of our analyses show that rate autocorrelation is typically difficult to detect, even when there is complete taxon sampling. This provides a potential explanation for past failures to detect rate autocorrelation in a range of data sets.
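The contrast between the two classes of relaxed clock can be illustrated with a small Python sketch. It is not NELSI and does not reproduce the authors' simulations. It assumes a single lineage (a path of consecutive branches) instead of a full tree, a lognormal distribution for the uncorrelated clock, and a Brownian-motion random walk on the log-rate for the autocorrelated clock. All function names are hypothetical.

```python
import math
import random


def uncorrelated_rates(n_branches, mean_rate, sigma, rng):
    """Draw every branch rate independently from one lognormal distribution."""
    mu = math.log(mean_rate) - sigma ** 2 / 2  # keeps E[rate] == mean_rate
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]


def autocorrelated_rates(n_branches, root_rate, nu, branch_length, rng):
    """Walk down one lineage: each child's log-rate is centred on its parent's."""
    rates = []
    log_rate = math.log(root_rate)
    for _ in range(n_branches):
        # Brownian step on the log scale; the variance grows with branch duration.
        log_rate = rng.gauss(log_rate, math.sqrt(nu * branch_length))
        rates.append(math.exp(log_rate))
    return rates


def lag1_correlation(xs):
    """Pearson correlation between each value and the next one along the path."""
    a, b = xs[:-1], xs[1:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
auto_logs = [math.log(r) for r in autocorrelated_rates(2000, 1.0, 0.01, 1.0, rng)]
unc_logs = [math.log(r) for r in uncorrelated_rates(2000, 1.0, 0.5, rng)]
print(round(lag1_correlation(auto_logs), 3), round(lag1_correlation(unc_logs), 3))
```

In this sketch the true rates are known, so the parent-child correlation is easy to see. The abstract's point is that the correlation is hard to detect once the rates must be inferred from sequence data.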

4.
We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
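The DPP's treatment of rate classes as random variables can be sketched as a prior draw using the Chinese restaurant process representation of the Dirichlet process. Here each class receives one gamma-distributed rate with mean 1. The gamma base distribution, the concentration value `alpha` and the function names are illustrative assumptions, not the authors' implementation. The sketch covers only the prior, not MCMC inference.

```python
import random


def crp_rate_classes(n_branches, alpha, rng):
    """Assign branches to rate classes under a Chinese restaurant process."""
    assignments = []
    class_sizes = []
    for _ in range(n_branches):
        # An existing class k is chosen with weight size_k; a new class with weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)  # open a new rate class
        else:
            class_sizes[k] += 1
        assignments.append(k)
    return assignments, class_sizes


def draw_branch_rates(assignments, n_classes, shape, rng):
    """Give each class one gamma-distributed rate (mean 1) shared by all its branches."""
    class_rates = [rng.gammavariate(shape, 1.0 / shape) for _ in range(n_classes)]
    return [class_rates[k] for k in assignments]


rng = random.Random(42)
labels, sizes = crp_rate_classes(20, alpha=1.5, rng=rng)
rates = draw_branch_rates(labels, len(sizes), shape=2.0, rng=rng)
print(len(sizes), "rate classes for 20 branches")
```

Both the number of classes and the membership of each class vary from draw to draw, which mirrors the abstract's description of these quantities as random variables.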

5.
Relaxed phylogenetics and dating with confidence
In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we present analyses of 102 bacterial, 106 yeast, 61 plant, 99 metazoan, and 500 primate alignments. From these we conclude that our method is phylogenetically more accurate and precise than the traditional unrooted model while adding the ability to infer a timescale to evolution.
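One simple way to express "clocklikeness" is the coefficient of variation (CV) of branch rates: a CV of 0 corresponds to a strict clock, and larger values indicate more rate variation among branches. The Python sketch below computes an optionally duration-weighted CV. It assumes this summary is close to the authors' measure, but their exact definition may differ. The function name is hypothetical.

```python
import math


def rate_coefficient_of_variation(rates, durations=None):
    """Coefficient of variation of branch rates; 0 means perfectly clocklike."""
    if durations is None:
        durations = [1.0] * len(rates)  # unweighted: every branch counts equally
    total = sum(durations)
    mean = sum(r * t for r, t in zip(rates, durations)) / total
    var = sum(t * (r - mean) ** 2 for r, t in zip(rates, durations)) / total
    return math.sqrt(var) / mean


print(rate_coefficient_of_variation([1.0, 1.0, 1.0]))  # strict clock
print(rate_coefficient_of_variation([1.0, 3.0]))       # relaxed
```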

6.
The molecular clock, i.e., constancy of the rate of evolution over time, is commonly assumed in estimating divergence dates. However, this assumption is often violated and has drastic effects on date estimation. Recently, a number of attempts have been made to relax the clock assumption. One approach is to use maximum likelihood, which assigns rates to branches and allows the estimation of both rates and times. An alternative is the Bayes approach, which models the change of the rate over time. A number of models of rate change have been proposed. We have extended and evaluated models of rate evolution, i.e., the lognormal and its recent variant, along with the gamma, the exponential, and the Ornstein-Uhlenbeck processes. These models were first applied to a small hominoid data set, where an empirical Bayes approach was used to estimate the hyperparameters that measure the amount of rate variation. Estimation of divergence times was sensitive to these hyperparameters, especially when the assumed model is close to the clock assumption. The rate and date estimates varied little from model to model, although the posterior Bayes factor indicated the Ornstein-Uhlenbeck process outperformed the other models. To demonstrate the importance of allowing for rate change across lineages, this general approach was used to analyze a larger data set consisting of the 18S ribosomal RNA gene of 39 metazoan species. We obtained date estimates consistent with paleontological records, the deepest split within the group being about 560 million years ago. Estimates of the rates were in accordance with the Cambrian explosion hypothesis and suggested some more recent lineage-specific bursts of evolution.
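An Ornstein-Uhlenbeck (OU) process differs from an unconstrained random walk because it pulls the rate back toward a long-term mean, so rate variance stays bounded over time. The Python sketch below simulates an OU process on the log of the substitution rate along one lineage using Euler-Maruyama steps. Placing the process on the log scale, along with the parameter names and values, are assumptions for illustration. The paper's exact parameterization may differ.

```python
import math
import random


def ou_log_rate_path(x0, mu, theta, sigma, dt, n_steps, rng):
    """Euler-Maruyama simulation of dX = theta*(mu - X) dt + sigma dW for a log-rate X."""
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Mean reversion toward mu plus a Gaussian increment scaled by sqrt(dt).
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


rng = random.Random(0)
log_rates = ou_log_rate_path(x0=1.0, mu=0.0, theta=0.5, sigma=0.3, dt=0.01, n_steps=1000, rng=rng)
rates = [math.exp(x) for x in log_rates]  # back-transform to positive substitution rates
```

Setting `sigma=0` removes the noise and shows the mean reversion directly: the log-rate then decays deterministically toward `mu`.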

7.
High-throughput sequencing enables rapid genome sequencing during infectious disease outbreaks and provides an opportunity to quantify the evolutionary dynamics of pathogens in near real-time. One difficulty of undertaking evolutionary analyses over short timescales is the dependency of the inferred evolutionary parameters on the timespan of observation. Crucially, there are an increasing number of molecular clock analyses using external evolutionary rate priors to infer evolutionary parameters. However, it is not clear which rate prior is appropriate for a given time window of observation due to the time-dependent nature of evolutionary rate estimates. Here, we characterize the molecular evolutionary dynamics of SARS-CoV-2 and 2009 pandemic H1N1 (pH1N1) influenza during the first 12 months of their respective pandemics. We use Bayesian phylogenetic methods to estimate the dates of emergence, evolutionary rates, and growth rates of SARS-CoV-2 and pH1N1 over time and investigate how varying sampling window and data set sizes affect the accuracy of parameter estimation. We further use a generalized McDonald–Kreitman test to estimate the number of segregating nonneutral sites over time. We find that the inferred evolutionary parameters for both pandemics are time dependent, and that the inferred rates of SARS-CoV-2 and pH1N1 decline by ∼50% and ∼100%, respectively, over the course of 1 year. After at least 4 months since the start of sequence sampling, inferred growth rates and emergence dates remain relatively stable and can be inferred reliably using a logistic growth coalescent model. We show that the time dependency of the mean substitution rate is due to elevated substitution rates at terminal branches which are 2–4 times higher than those of internal branches for both viruses. 
The elevated rate at terminal branches is strongly correlated with an increasing number of segregating nonneutral sites, demonstrating the role of purifying selection in generating the time dependency of evolutionary parameters during pandemics.

8.
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.
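When an MCMC sampler moves between models, a Bayes factor can be computed from the posterior samples of the model indicator, because the Bayes factor equals the posterior odds divided by the prior odds. The Python sketch below shows only that arithmetic. The sample list and model labels are hypothetical and are not output from BEAST.

```python
from collections import Counter


def bayes_factor(model_samples, model_a, model_b, prior=None):
    """BF_ab = posterior odds / prior odds, from MCMC samples of a model indicator."""
    counts = Counter(model_samples)
    if counts[model_a] == 0 or counts[model_b] == 0:
        raise ValueError("a model was never visited; the BF estimate is unbounded")
    if prior is None:
        prior = {model_a: 0.5, model_b: 0.5}  # equal prior model probabilities
    posterior_odds = counts[model_a] / counts[model_b]
    prior_odds = prior[model_a] / prior[model_b]
    return posterior_odds / prior_odds


# Hypothetical chain that spent 75% of its samples in the lognormal model.
samples = ["lognormal"] * 750 + ["exponential"] * 250
print(bayes_factor(samples, "lognormal", "exponential"))
```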

9.
Accurate inference of the dates of common ancestry among species forms a central problem in understanding the evolutionary history of organisms. Molecular estimates of divergence time rely on the molecular evolutionary prediction that neutral mutations and substitutions occur at the same constant rate in genomes of related species. This underlies the notion of a molecular clock. Most implementations of this idea depend on paleontological calibration to infer dates of common ancestry, but taxa with poor fossil records must rely on external, potentially inappropriate, calibration with distantly related species. The classic biological models Caenorhabditis and Drosophila are examples of such problem taxa. Here, I illustrate internal calibration in these groups with direct estimates of the mutation rate from contemporary populations that are corrected for interfering effects of selection on the assumption of neutrality of substitutions. Divergence times are inferred among 6 species each of Caenorhabditis and Drosophila, based on thousands of orthologous groups of genes. I propose that the 2 closest known species of Caenorhabditis (Caenorhabditis briggsae and Caenorhabditis sp. 5) shared a common ancestor <24 MYA and that Caenorhabditis elegans diverged from its closest known relatives <30 MYA, assuming that these species pass through at least 6 generations per year; these estimates are much more recent than those reported previously with molecular clock calibrations from non-nematode phyla. Dates inferred for the common ancestor of Drosophila melanogaster and Drosophila simulans are roughly concordant with previous studies. These revised dates have important implications for rates of genome evolution and the origin of self-fertilization in Caenorhabditis.
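Calibrating with a per-generation mutation rate comes down to a short calculation. Neutral divergence between two species accumulates along both lineages, so d = 2 × (μ × g) × T, where μ is the per-site mutation rate per generation and g is the number of generations per year. Rearranging gives T = d / (2μg). The Python sketch below applies this formula to made-up numbers. It assumes that d has already been corrected for multiple hits and selection, and it ignores ancestral polymorphism.

```python
def divergence_time_years(neutral_divergence, mu_per_generation, generations_per_year):
    """Time since a split, assuming substitutions accumulate on both lineages."""
    # Per-lineage substitution rate per site per year.
    rate_per_year = mu_per_generation * generations_per_year
    # Divergence between two species = 2 * rate * T, so T = d / (2 * rate).
    return neutral_divergence / (2.0 * rate_per_year)


# Hypothetical values: d = 0.12 substitutions/site, mu = 2.5e-9 per generation, 6 generations/year.
print(divergence_time_years(0.12, 2.5e-9, 6))  # about 4 million years
```

Because T is inversely proportional to g, assuming at least a given number of generations per year produces an upper bound on the divergence time. This is why the abstract reports its estimates as "<24 MYA" and "<30 MYA".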

10.
Dating evolutionary origins of taxa is essential for understanding rates and timing of evolutionary events, often inciting intense debate when molecular estimates differ from first fossil appearances. For numerous reasons, ostracods present a challenging case study of rates of evolution and congruence of fossil and molecular divergence time estimates. On the one hand, ostracods have one of the densest fossil records of any metazoan group. However, taxonomy of fossil ostracods is controversial, owing at least in part to homoplasy of carapaces, the most commonly fossilized part. In addition, rates of evolution are variable in ostracods. Here, we report evidence of extreme variation in the rate of molecular evolution in different ostracod groups. This rate is significantly elevated in Halocyprid ostracods, a widespread planktonic group, consistent with previous observations that planktonic groups show elevated rates of molecular evolution. At the same time, the rate of molecular evolution is slow in the lineage leading to Manawa staceyi, a relict species that we estimate diverged approximately 500 million years ago from its closest known living relative. We also report multiple cases of significant incongruence between fossil and molecular estimates of divergence times in Ostracoda. Although relaxed clock methods improve the congruence of fossil and molecular divergence estimates over strict clock models, incongruence is present regardless of method. We hypothesize that this observed incongruence is driven largely by problems with taxonomy of fossil Ostracoda. Our results illustrate the difficulty in consistently estimating lineage divergence times, even in the presence of a voluminous fossil record.

11.

Background  

Although current molecular clock methods offer greater flexibility in modelling evolutionary events, calibration of the clock with dates from the fossil record is still problematic for many groups. Here we implement several new approaches in molecular dating to estimate the evolutionary ages of Lacertidae, an Old World family of lizards with a poor fossil record and uncertain phylogeny. Four different models of rate variation are tested in a new program for Bayesian phylogenetic analysis called TreeTime, based on a combination of mitochondrial and nuclear gene sequences. We incorporate paleontological uncertainty into divergence estimates by expressing multiple calibration dates as a range of probabilistic distributions. We also test the reliability of our proposed calibrations by exploring effects of individual priors on posterior estimates.

12.
The molecular clock presents a means of estimating evolutionary rates and timescales using genetic data. These estimates can lead to important insights into evolutionary processes and mechanisms, as well as providing a framework for further biological analyses. To deal with rate variation among genes and among lineages, a diverse range of molecular‐clock methods have been developed. These methods have been implemented in various software packages and differ in their statistical properties, ability to handle different models of rate variation, capacity to incorporate various forms of calibrating information and tractability for analysing large data sets. Choosing a suitable molecular‐clock model can be a challenging exercise, but a number of model‐selection techniques are available. In this review, we describe the different forms of evolutionary rate heterogeneity and explain how they can be accommodated in molecular‐clock analyses. We provide an outline of the various clock methods and models that are available, including the strict clock, local clocks, discrete clocks and relaxed clocks. Techniques for calibration and clock‐model selection are also described, along with methods for handling multilocus data sets. We conclude our review with some comments about the future of molecular clocks.

13.
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation.
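The variance-to-mean ratio described above (the index of dispersion, R) is easy to compute from substitution counts. The Python sketch below assumes the counts come from lineages that have each evolved for the same amount of time, as on a star phylogeny. Real analyses on a phylogeny must correct for unequal branch lengths. The function name is hypothetical.

```python
import statistics


def dispersion_index(substitution_counts):
    """Variance-to-mean ratio R of substitution counts; R = 1 under a Poisson process."""
    mean = statistics.mean(substitution_counts)
    if mean == 0:
        raise ValueError("no substitutions observed")
    # Sample variance (n - 1 denominator); R > 1 indicates overdispersion.
    return statistics.variance(substitution_counts) / mean


print(dispersion_index([1, 3]))  # consistent with Poisson
print(dispersion_index([0, 4]))  # overdispersed
```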

14.
Accurate and precise estimation of divergence times during the Neoproterozoic is necessary to understand the speciation dynamics of early eukaryotes. However, such deep divergences are difficult to date, as the molecular clock is seriously violated. Recent improvements in Bayesian molecular dating techniques allow the relaxation of the molecular clock hypothesis as well as incorporation of multiple and flexible fossil calibrations. Divergence times can then be estimated even when the evolutionary rate varies among lineages and even when the fossil calibrations involve substantial uncertainties. In this paper, we used a Bayesian method to estimate divergence times in Foraminifera, a group of unicellular eukaryotes, known for their excellent fossil record but also for the high evolutionary rates of their genomes. Using multigene data, we reconstructed the phylogeny of Foraminifera and dated their origin and the major radiation events. Our estimates suggest that Foraminifera emerged during the Cryogenian (650-920 Ma, Neoproterozoic), with a mean time around 770 Ma, about 220 Myr before the first appearance of reliable foraminiferal fossils in sediments (545 Ma). Most dates are in agreement with the fossil record, but in general our results suggest earlier origins of foraminiferal orders. We found that the posterior time estimates were robust to specifications of the prior. Our results highlight interspecific variation in evolutionary rates in Foraminifera. Its effect was partially overcome by using partitioned Bayesian analysis to accommodate rate heterogeneity among data partitions and by using the relaxed molecular clock to account for changing evolutionary rates. However, more coding genes appear necessary to obtain more precise estimates of divergence times and to resolve the conflicts between fossil and molecular date estimates.

15.
Rate heterogeneity among lineages is a common feature of molecular evolution, and it has long impeded our ability to accurately estimate the age of evolutionary divergence events. The development of relaxed molecular clocks, which model variable substitution rates among lineages, was intended to rectify this problem. Major subtypes of pandemic HIV-1 group M are thought to exemplify closely related lineages with different substitution rates. Here, we report that inferring the time of most recent common ancestor of all these subtypes in a single phylogeny under a single (relaxed) molecular clock produces significantly different dates for many of the subtypes than does analysis of each subtype on its own. We explore various methods to ameliorate this problem. We conclude that current molecular dating methods are inadequate for dealing with this type of substitution rate variation in HIV-1. Through simulation, we show that heterotachy causes root ages to be overestimated.

16.
Accurate estimates of mitochondrial substitution rates are central to molecular studies of human evolution, but meaningful comparisons of published studies are problematic because of the wide range of methodologies and data sets employed. These differences are nowhere more pronounced than among rates estimated from phylogenies, genealogies, and pedigrees. By using a data set comprising mitochondrial genomes from 177 humans, we estimate substitution rates for various data partitions by using Bayesian phylogenetic analysis with a relaxed molecular clock. We compare the effect of multiple internal calibrations with the customary human-chimpanzee split. The analyses reveal wide variation among estimated substitution rates and divergence times made with different partitions and calibrations, with evidence of substitutional saturation, natural selection, and significant rate heterogeneity among lineages and among sites. Collectively, the results support dates for migration out of Africa and the common mitochondrial ancestor of humans that are considerably more recent than most previous estimates. Our results also demonstrate that human mitochondrial genomes exhibit a number of molecular evolutionary complexities that necessitate the use of sophisticated analytical models for genetic analyses.

17.
The age of the angiosperms: a molecular timescale without a clock
The age of the angiosperms has long been of interest to botanists and evolutionary biologists. Many early efforts to date the age of the angiosperms and evolutionary divergences within the angiosperm clade using a molecular clock have yielded age estimates that are grossly inconsistent with the fossil record. We investigated the age of angiosperms using Bayesian relaxed clock (BRC) and penalized likelihood (PL) approaches. Both of these methods allow the incorporation of multiple fossil constraints into the optimization procedure. The BRC method allows a range of values for among-lineage rate of substitution, from a nearly clocklike behavior to a condition in which each branch is allowed an optimal substitution rate, and also accounts for variation in molecular evolution across multiple genes. A topology derived from an analysis of genes from all three plant genomes for 71 taxa was used as a backbone. The effects on age estimates of different genes, single-gene versus concatenated datasets, and the inclusion and assumptions of fossils as age constraints were examined. In addition, the influence of prior distributions on estimates of divergence times was also explored. These results indicate that widely divergent age estimates can result from the different methods (198-139 million years ago), different sources of data (275-122 million years ago), and the inclusion of temporal constraints to topologies. Most dates, however, are between 180 and 140 million years ago, suggesting a Middle Jurassic-Early Cretaceous origin of flowering plants, predating the oldest unequivocal fossil angiosperms by about 45-5 million years. Nonetheless, these dates are consistent with other recent studies that have used methods that relax the assumption of a strict molecular clock and also agree with the hypothesis that the angiosperms may be somewhat older than the fossil record indicates.

18.
A model-based approach for detecting coevolving positions in a molecule
We present a new method for detecting coevolving sites in molecules. The method relies on a set of aligned sequences (nucleic acid or protein) and uses Markov models of evolution to map the substitutions that occurred at each site onto the branches of the underlying phylogenetic tree. This mapping takes into account the uncertainty over ancestral states and among-site rate variation. We then build, for each site, a "substitution vector" containing the posterior estimates of the number of substitutions in each branch. The amount of coevolution for a pair of sites is then measured as the Pearson correlation coefficient between the two corresponding substitution vectors and compared to the expectation under the null hypothesis of independence. We applied the method to a 79-species bacterial ribosomal RNA data set, for which extensive structural characterization has been done over the last 30 years. More than 95% of the intramolecular predicted pairs of sites correspond to known interacting site pairs.
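The scoring step described above can be sketched in Python. The sketch computes the Pearson correlation between two per-branch substitution vectors and then compares it with a null distribution. Here the null comes from a simple permutation of branches, which is a swapped-in stand-in and not the model-based expectation under independence that the authors use. Permuting branches also ignores tree structure. The vectors and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a site with no variation in substitutions carries no signal
    return cov / (sx * sy)


def coevolution_score(vec_i, vec_j, n_perm=1000, seed=0):
    """Correlation of two substitution vectors plus a one-sided permutation p-value."""
    observed = pearson(vec_i, vec_j)
    rng = random.Random(seed)
    shuffled = list(vec_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any branch-wise association between the sites
        if pearson(vec_i, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical posterior substitution counts on six branches for two sites.
site_a = [0.1, 0.9, 0.0, 1.2, 0.05, 0.8]
site_b = [0.2, 1.0, 0.1, 1.1, 0.0, 0.7]
r, p = coevolution_score(site_a, site_b)
```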

19.
Molecular sequences not only allow the reconstruction of phylogenetic relationships among species, but also provide information on the approximate divergence times. Whereas the fossil record dates the origin of most multicellular animal phyla to the Cambrian explosion less than 540 million years ago (mya), molecular clock calculations usually suggest much older dates. Here we used a large multiple sequence alignment derived from Expressed Sequence Tags and genomes comprising 129 genes (37,476 amino acid positions) and 117 taxa, including 101 arthropods. We obtained consistent divergence time estimates applying relaxed Bayesian clock models with different priors and multiple calibration points. While the influence of substitution rates, missing data, and model priors was negligible, the clock model had a significant effect. A log-normal autocorrelated model was selected on the basis of cross-validation. We calculated that arthropods emerged ~600 mya. Onychophorans (velvet worms) and euarthropods split ~590 mya, Pancrustacea and Myriochelata ~560 mya, Myriapoda and Chelicerata ~555 mya, and 'Crustacea' and Hexapoda ~510 mya. Endopterygote insects appeared ~390 mya. These dates are considerably younger than most previous molecular clock estimates and in better agreement with the fossil record. Nevertheless, a Precambrian origin of arthropods and other metazoan phyla is still supported. Our results also demonstrate the applicability of large datasets of random nuclear sequences for approximating the timing of multicellular animal evolution.

20.
The choice of an "optimal" mathematical model for computing evolutionary distances from real sequences is not currently supported by easy-to-use software applicable to large data sets, and an investigator frequently selects one of the simplest models available. Here we study properties of the observed proportion of differences (p-distance) between sequences as an estimator of evolutionary distance for tree-making. We show that p-distances allow for consistent tree-making with any of the popular methods working with evolutionary distances if evolution of sequences obeys a "molecular clock" (more precisely, if it follows a stationary time-reversible Markov model of nucleotide substitution). Next, we show that p-distances seem to be efficient in recovering the correct tree topology under a "molecular clock," but produce "statistically supported" wrong trees when substitution rates vary among evolutionary lineages. Finally, we outline a practical approach for selecting an "optimal" model of nucleotide substitution in a real data analysis, and obtain a crude estimate of a "prior" distribution of the expected tree branch lengths under the Jukes-Cantor model. We conclude that the use of a model that is obviously oversimplified is inadvisable unless it is justified by a preliminary analysis of the real sequences.
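The two distances named in this abstract can be compared directly in Python. The p-distance is the raw proportion of differing sites. The Jukes-Cantor (JC69) distance corrects it for multiple substitutions at the same site: d = -3/4 ln(1 - 4p/3). The sketch below skips gapped sites, which is an assumption about data handling. The function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """JC69 distance d = -3/4 ln(1 - 4p/3); undefined once p reaches saturation (0.75)."""
    if p >= 0.75:
        raise ValueError("p-distance at or beyond saturation; JC69 distance is infinite")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGT", "ACGA")
print(p, round(jukes_cantor(p), 4))
```

The corrected distance is always at least as large as the p-distance, and the gap between the two widens as sequences diverge. This is one reason p-distances become unreliable when substitution rates vary among lineages.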


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Filing No. 09084417