Similar literature
 20 similar articles found (search time: 31 ms)
1.
Divergence time and substitution rate are seriously confounded in phylogenetic analysis, making it difficult to estimate divergence times when the molecular clock (rate constancy among lineages) is violated. This problem can be alleviated to some extent by analyzing multiple gene loci simultaneously and by using multiple calibration points. While different genes may have different patterns of evolutionary rate change, they share the same divergence times. Indeed, the fact that each gene may violate the molecular clock differently leads to the advantage of simultaneous analysis of multiple loci. Multiple calibration points provide the means for characterizing the local evolutionary rates on the phylogeny. In this paper, we extend previous likelihood models of local molecular clock for estimating species divergence times to accommodate multiple calibration points and multiple genes. Heterogeneity among different genes in evolutionary rate and in substitution process is accounted for by the models. We apply the likelihood models to analyze two mitochondrial protein-coding genes, cytochrome oxidase II and cytochrome b, to estimate divergence times of Malagasy mouse lemurs and related outgroups. The likelihood method is compared with the Bayes method of Thorne et al. (1998, Mol. Biol. Evol. 15:1647-1657), which uses a probabilistic model to describe the change in evolutionary rate over time and uses the Markov chain Monte Carlo procedure to derive the posterior distribution of rates and times. Our likelihood implementation has the drawbacks of failing to accommodate uncertainties in fossil calibrations and of requiring the researcher to classify branches on the tree into different rate groups. Both problems are avoided in the Bayes method. Despite the differences in the two methods, however, data partitions and model assumptions had the greatest impact on date estimation. 
The three codon positions have very different substitution rates and evolutionary dynamics, and assumptions in the substitution model affect date estimation in both likelihood and Bayes analyses. The results demonstrate that the separate analysis is unreliable, with dates variable among codon positions and between methods, and that the combined analysis is much more reliable. When the three codon positions were analyzed simultaneously under the most realistic models using all available calibration information, the two methods produced similar results. The divergence of the mouse lemurs is dated to be around 7-10 million years ago, indicating a surprisingly early species radiation for such a morphologically uniform group of primates.

2.
Although still controversial, estimation of divergence times using molecular data has emerged as a powerful tool to examine the tempo and mode of evolutionary change. Two primary obstacles in improving the accuracy of molecular dating are heterogeneity in DNA substitution rates and accuracy of the fossil record as calibration points. Recent methodological advances have provided powerful methods that estimate relative divergence times in the face of heterogeneity of nucleotide substitution rates among lineages. However, relatively little attention has focused on the accuracy of fossil calibration points that allow one to translate relative divergence times into absolute time. We present a new cross-validation method that identifies inconsistent fossils when multiple fossil calibrations are available for a clade and apply our method to a molecular phylogeny of living turtles with fossil calibration times for 17 of the 22 internal nodes in the tree. Our cross-validation procedure identified seven inconsistent fossils. Using the consistent fossils as calibration points, we found that despite their overall antiquity as a lineage, the most species-rich clades of turtles diversified well within the Cenozoic. Many of the truly ancient lineages of turtles are currently represented by a few, often endangered species that deserve high priority as conservation targets.

3.
Molecular dating of phylogenetic trees is a growing discipline using sequence data to co-estimate the timing of evolutionary events and rates of molecular evolution. All molecular-dating methods require converting genetic divergence between sequences into absolute time. Historically, this could only be achieved by associating externally derived dates obtained from fossil or biogeographical evidence to internal nodes of the tree. In some cases, notably for fast-evolving genomes such as viruses and some bacteria, the time span over which samples were collected may cover a significant proportion of the time since they last shared a common ancestor. This situation allows phylogenetic trees to be calibrated by associating sampling dates directly to the sequences representing the tips (terminal nodes) of the tree. The increasing availability of genomic data from ancient DNA extends the applicability of such tip-based calibration to a variety of taxa including humans, extinct megafauna and various microorganisms which typically have a scarce fossil record. The development of statistical models accounting for heterogeneity in different aspects of the evolutionary process while accommodating very large data sets (e.g. whole genomes) has allowed using tip-dating methods to reach inferences on divergence times, substitution rates, past demography or the age of specific mutations on a variety of spatiotemporal scales. In this review, we summarize the current state of the art of tip dating, discuss some recent applications, highlight common pitfalls and provide a 'how to' guide to thoroughly perform such analyses.
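
The idea of calibrating a tree from sampling dates can be illustrated with a root-to-tip regression. This is a much simpler heuristic than the model-based tip-dating methods the review covers, and it is offered only as a sketch. It assumes each tip has a known sampling date and a root-to-tip genetic distance already measured on a rooted tree; the function name and the toy numbers are hypothetical.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope, in substitutions per site per
    time unit, and the date at which the fitted distance reaches zero.
    root_date is None when the slope is zero (no temporal signal).
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    # The fitted line passes through (mean_t, mean_d); solve for distance = 0.
    root_date = mean_t - mean_d / rate if rate != 0 else None
    return rate, root_date


# Toy data generated with rate 0.002 per year and a root dated to 1990.
rate, root_date = root_to_tip_regression([2000, 2005, 2010], [0.02, 0.03, 0.04])
```

A clearly positive slope is often read as evidence of temporal signal, but the regression treats tips as independent points and so should not be taken as a substitute for a full phylogenetic analysis.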

4.
Few estimates of relative substitution rates, and the underlying mutation rates, exist between mitochondrial and nuclear genes in insects. Previous estimates for insects indicate a 2-9 times faster substitution rate in mitochondrial genes relative to nuclear genes. Here we use novel methods for estimating relative rates of substitution, which incorporate multiple substitutions, and apply these methods to a group of insects (lice, Order: Phthiraptera). First, we use a modification of copath analysis (branch length regression) to construct independent comparisons of rates, consisting of each branch in a phylogenetic tree. The branch length comparisons use maximum likelihood models to correct for multiple substitutions. In addition, we estimate codon-specific rates under maximum likelihood for the different genes and compare these values. Estimates of the relative synonymous substitution rates between a mitochondrial (COI) and nuclear (EF-1alpha) gene in lice indicate a relative rate of several hundred to 1. This rapid relative mitochondrial rate (>100 times) is at least an order of magnitude faster than previous estimates for any group of organisms. Comparisons using the same methods for another group of insects (aphids) reveal that this extreme relative rate estimate is not simply attributable to the methods we used, because estimates from aphids are substantially lower. Taxon sampling affects the relative rate estimate, with comparisons involving more closely related taxa resulting in a higher estimate. Relative rate estimates also increase with model complexity, indicating that methods accounting for more multiple substitutions estimate higher relative rates.

5.
C F Arias, S López, R T Espejo 《Journal of Virology》1986,57(3):1207-1209
The nucleotide sequences for several complementary DNA clones of the rotavirus genome were determined. When the sequences obtained from different clones for the same regions (16,000 bases) were compared, differences in eight base positions were observed. These discrepancies, approximately 1 in 2,000 bases, may be due to differences in individual RNA genomes resulting from multiple passages; infidelity of DNA synthesis in the cloning procedure; or both factors. Whatever the cause, this frequency of base substitution found in sequences of complementary DNA obtained from the same isolate should be considered when comparing DNA sequences obtained from independent isolates. On the other hand, the frequency of base changes observed suggests that the rotavirus genome is very conserved since the virus used for cDNA synthesis has been continuously passaged for 6 years without plaque purification.

6.
Molecular phylogenies are increasingly being used to investigate the patterns and mechanisms of macroevolution. In particular, node heights in a phylogeny can be used to detect changes in rates of diversification over time. Such analyses rest on the assumption that node heights in a phylogeny represent the timing of diversification events, which in turn rests on the assumption that evolutionary time can be accurately predicted from DNA sequence divergence. But there are many influences on the rate of molecular evolution, which might also influence node heights in molecular phylogenies, and thus affect estimates of diversification rate. In particular, a growing number of studies have revealed an association between the net diversification rate estimated from phylogenies and the rate of molecular evolution. Such an association might, by influencing the relative position of node heights, systematically bias estimates of diversification time. We simulated the evolution of DNA sequences under several scenarios where rates of diversification and molecular evolution vary through time, including models where diversification and molecular evolutionary rates are linked. We show that commonly used methods, including metric-based, likelihood and Bayesian approaches, can have a low power to identify changes in diversification rate when molecular substitution rates vary. Furthermore, the association between the rates of speciation and molecular evolution rate can cause the signature of a slowdown or speedup in speciation rates to be lost or misidentified. These results suggest that the multiple sources of variation in molecular evolutionary rates need to be considered when inferring macroevolutionary processes from phylogenies.

7.
McGuire G  Prentice MJ  Wright F 《Biometrics》1999,55(4):1064-1070
The genetic distance between two DNA sequences may be measured by the average number of nucleotide substitutions per position that has occurred since the two sequences diverged from a common ancestor. Estimates of this quantity can be derived from Markov models for the substitution process, while the variances are estimated using the delta method and confidence intervals calculated assuming normality. However, when the sampling distribution of the estimator deviates from normality, such intervals will not be accurate. For simple one-parameter models of nucleotide substitution, we propose a transformation of normal confidence intervals, which yields an almost exact approximation to the true confidence intervals of the distance estimators. To calculate confidence intervals for more complicated models, we propose the saddlepoint approximation. A simulation study shows that the saddlepoint-derived confidence intervals are a real improvement over existing methods.
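
To make the standard approach concrete, the sketch below computes the Jukes-Cantor (one-parameter) distance, its delta-method variance, and the resulting normal confidence interval. The second function shows one simple transformed interval, obtained by building a normal interval for the observed proportion of differing sites and mapping its endpoints through the distance formula. That transformation is an assumption chosen for illustration and is not claimed to be the one the authors propose; the function names are hypothetical.

```python
import math


def jc69_distance_ci(n_diff, n_sites, z=1.96):
    """Jukes-Cantor distance with a delta-method normal interval."""
    p = n_diff / n_sites
    if not 0 <= p < 0.75:
        raise ValueError("proportion of differences must lie in [0, 0.75)")
    d = -0.75 * math.log(1 - 4 * p / 3)
    # Delta method: Var(d) = (dd/dp)^2 * Var(p), with dd/dp = 1 / (1 - 4p/3).
    var_d = p * (1 - p) / (n_sites * (1 - 4 * p / 3) ** 2)
    half = z * math.sqrt(var_d)
    return d, (d - half, d + half)


def jc69_transformed_interval(n_diff, n_sites, z=1.96):
    """Normal interval for p, mapped through the monotone distance formula."""
    p = n_diff / n_sites
    half = z * math.sqrt(p * (1 - p) / n_sites)

    def to_distance(q):
        q = max(q, 0.0)  # a proportion cannot be negative
        if q >= 0.75:
            return math.inf  # the distance is undefined at saturation
        return -0.75 * math.log(1 - 4 * q / 3)

    return to_distance(p - half), to_distance(p + half)


d, symmetric_ci = jc69_distance_ci(10, 100)
asymmetric_ci = jc69_transformed_interval(10, 100)
```

The delta-method interval is symmetric around the estimate, whereas the transformed interval is wider above the estimate than below it, which reflects the skew that the abstract identifies as the weakness of the normal assumption.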

8.
Precise dating of viral subtype divergence enables researchers to correlate divergence with geographic and demographic occurrences. When historical data are absent (that is, the overwhelming majority), viral sequence sampling on a time scale commensurate with the rate of substitution permits the inference of the times of subtype divergence. Currently, researchers use two strategies to approach this task, both requiring strong conditions on the molecular clock assumption of substitution rate. As the underlying structure of the substitution rate process at the time of subtype divergence is not understood and likely highly variable, we present a simple method that estimates rates of substitution, and from there, times of divergence, without use of an assumed molecular clock. We accomplish this by blending estimates of the substitution rate for triplets of dated sequences where each sequence draws from a distinct viral subtype, providing a zeroth-order approximation for the rate between subtypes. As an example, we calculate the time of divergence for three genes among influenza subtypes A-H3N2 and B using subtype C as an outgroup. We show a time of divergence approximately 100 years ago, substantially more recent than previous estimates which range from 250 to 3800 years ago.
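
The arithmetic behind a single triplet can be sketched as follows. This is not the authors' blended estimator: it assumes a constant rate within one triplet of two dated ingroup sequences and one outgroup, which is stronger than what the abstract assumes overall. All names and the toy values are hypothetical.

```python
def triplet_rate_and_split(t_a, t_b, d_ao, d_bo, d_ab):
    """Rate and ingroup split time from one dated triplet.

    t_a, t_b: sampling dates of ingroup sequences a and b.
    d_ao, d_bo: genetic distances from a and b to the outgroup o.
    d_ab: genetic distance between a and b.

    With a constant rate r, the paths a-o and b-o differ only by the
    extra time over which the later sample evolved, so
        d_ao - d_bo = r * (t_a - t_b).
    """
    if t_a == t_b:
        raise ValueError("sampling dates must differ")
    rate = (d_ao - d_bo) / (t_a - t_b)
    if rate <= 0:
        raise ValueError("non-positive rate; triplet carries no usable signal")
    # d_ab = r * (t_a - T) + r * (t_b - T), solved for the split date T.
    split_date = (t_a + t_b - d_ab / rate) / 2
    return rate, split_date


# Toy triplet simulated with rate 0.001 per year and a split in 1900.
rate, split_date = triplet_rate_and_split(1970, 2000, 0.36, 0.39, 0.17)
```

Because the rate comes from a small difference between two large distances, single-triplet estimates are noisy, which is presumably why the abstract describes combining many triplets.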

9.
Neutral evolution is the simplest model of molecular evolution and thus it is most amenable to a comprehensive theoretical investigation. In this paper, we characterize the statistical properties of neutral evolution of proteins under the requirement that the native state remains thermodynamically stable, and compare them with those of Kimura's model of neutral evolution. Our study is based on the Structurally Constrained Neutral (SCN) model which we recently proposed. We show that, in the SCN model, the substitution rate decreases as longer time intervals are considered. Fluctuations from one branch of the evolutionary tree to another are strong, leading to non-Poissonian statistics for the substitution process. Such strong fluctuations are in part due to the fact that neutral substitution rates for individual residues are strongly correlated for most residue pairs. Interestingly, structurally conserved residues, characterized by a much below average substitution rate, are also much less correlated to other residues and evolve in a much more regular way. Our results can improve methods aimed at distinguishing between neutral and adaptive substitutions as well as methods for computing the expected number of substitutions that have occurred since the divergence of two protein sequences. In particular, we compute the minimal sequence similarity below which no information about the evolutionary divergence of the compared sequences can be obtained.

10.
Simple Methods for Testing the Molecular Evolutionary Clock Hypothesis (cited 44 times: 3 self-citations, 41 by others)
F. Tajima 《Genetics》1993,135(2):599-607
Simple statistical methods for testing the molecular evolutionary clock hypothesis are developed which can be applied to both nucleotide and amino acid sequences. These methods are based on the chi-square test and are applicable even when the pattern of substitution rates is unknown and/or the substitution rate varies among different sites. Furthermore, some of the methods can be applied even when the outgroup is unknown. Using computer simulations, these methods were compared with the likelihood ratio test and the relative rate test. The results indicate that the powers of the present methods are similar to those of the likelihood ratio test and the relative rate test, in spite of the fact that the latter two tests assume that the pattern of substitution rates follows a certain model and that the substitution rate is the same among different sites, while such assumptions are not necessary to apply the present methods. Therefore, the present methods might be useful.
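
The simplest chi-square test of this kind, as it is commonly described, compares how many sites are unique to each of two ingroup sequences when a third sequence serves as the outgroup. The sketch below assumes aligned, gap-free sequences of equal length and a known outgroup; it does not cover the outgroup-free variants mentioned in the abstract. The function name is hypothetical.

```python
import math


def tajima_relative_rate_test(seq1, seq2, outgroup):
    """Chi-square relative rate test on three aligned sequences.

    m1 counts sites where seq1 differs while seq2 matches the outgroup;
    m2 counts sites where seq2 differs while seq1 matches the outgroup.
    Under equal rates E[m1] = E[m2], and (m1 - m2)^2 / (m1 + m2) is
    approximately chi-square distributed with 1 degree of freedom.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if a != b:
            if b == c:
                m1 += 1
            elif a == c:
                m2 += 1
            # Sites where all three differ are uninformative and skipped.
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Upper tail of a chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p_value


m1, m2, chi2, p_value = tajima_relative_rate_test(
    "AAAAAAAAAA", "AAAAAACCCC", "AAAAAAAAAA"
)
```

No substitution model enters the calculation, which matches the abstract's point that the test does not depend on the pattern of substitution or on rate variation among sites.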

11.
Estimating the Variability of Substitution Rates (cited 6 times: 3 self-citations, 3 by others)
M. Bulmer 《Genetics》1989,123(3):615-619
Suppose that amino acid or nucleotide data are available for a homologous gene in several species which diverged from a common ancestor at about the same time and that substitution rates between all pairs of species are calculated, correcting as necessary for multiple substitutions and for back and parallel substitutions. The variances and covariances of these corrected substitution rates are evaluated, and are used to construct a new test for uniformity (constancy of the molecular clock) and to find the best estimates of substitution rates in individual lineages with their standard errors. A substantial bias may arise if the effect of correcting the pairwise substitution rates is ignored.
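
The step from pairwise values to individual lineages can be illustrated with ordinary least squares on a star phylogeny, where each pairwise distance is modelled as the sum of two lineage lengths. This sketch ignores the variances and covariances that the paper evaluates, so it is a simplified stand-in rather than the author's estimator; the function name is hypothetical.

```python
def star_tree_lineage_lengths(dist):
    """Least-squares lineage lengths under d[i][j] = r[i] + r[j].

    dist is a symmetric n x n matrix of corrected pairwise distances
    (n >= 3). Dividing each returned length by the common divergence
    time gives a per-lineage substitution rate.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three species")
    row_sums = [sum(dist[i][j] for j in range(n) if j != i) for i in range(n)]
    # Each lineage appears in (n - 1) pairs, so the sum over all pairs
    # equals (n - 1) times the sum of the lineage lengths.
    total_pairs = sum(row_sums) / 2
    sum_lengths = total_pairs / (n - 1)
    # Row sum for species i equals (n - 2) * r[i] + sum_lengths.
    return [(rs - sum_lengths) / (n - 2) for rs in row_sums]


true_lengths = [0.1, 0.2, 0.3, 0.4]
dist = [[0.0 if i == j else true_lengths[i] + true_lengths[j]
         for j in range(4)] for i in range(4)]
estimates = star_tree_lineage_lengths(dist)
```

Unequal estimates across lineages hint at departures from a constant clock, but judging whether they are significant requires the variances and covariances that this sketch leaves out.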

12.
MOTIVATION: Heterochronous gene sequence data is important for characterizing the evolutionary processes of fast-evolving organisms such as RNA viruses. A limited set of algorithms exists for estimating the rate of nucleotide substitution and inferring phylogenetic trees from such data. The authors here present a new method, Tree and Rate Estimation by Local Evaluation (TREBLE), that robustly calculates the rate of nucleotide substitution and phylogeny with several orders of magnitude improvement in computational time. METHODS: As the basis of its rate estimation, TREBLE makes novel use of a geometric interpretation of the molecular clock assumption to deduce a local estimate of the rate of nucleotide substitution for triplets of dated sequences. Averaging the triplet estimates via a variance weighting yields a global estimate of the rate. From this value, an iterative refinement procedure relying on statistical properties of the triplets then generates a final estimate of the global rate of nucleotide substitution. The estimated global rate is then utilized to find the tree from the pairwise distance matrix via a UPGMA-like algorithm. RESULTS: Simulation studies show that TREBLE estimates the rate of nucleotide substitution with point estimates comparable with the best of available methods. Confidence intervals are comparable with those of BEAST. TREBLE's phylogenetic reconstruction is significantly improved over the other distance matrix method but not as accurate as the Bayesian algorithm. Compared with three other algorithms, TREBLE reduces computational time by a minimum factor of 3000. Relative to the algorithm with the most accurate estimates for the rate of nucleotide substitution (i.e. BEAST), TREBLE is over 10,000 times more computationally efficient. AVAILABILITY: jdobrien.bol.ucla.edu/TREBLE.html

13.
We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.

14.
We report the presence of four nuclear paralogs of a 380-bp segment of cytochrome b in callitrichine primates (marmosets and tamarins). The mitochondrial cytochrome b sequence and each nuclear paralog were obtained from several species, allowing multiple comparisons of rates and patterns of substitution both between mitochondrial and nuclear sequences and among nuclear sequences. The mitochondrial DNA had high overall rates of molecular evolution and a strong bias toward substitutions at third codon positions. Rates of molecular evolution among the nuclear sequences were low and constant, and there were small differences in substitution patterns among the nuclear clades which were probably attributable to the small number of sites involved. A novel method of phylogenetic reconstruction based on the large difference in rates of evolution at different codon positions among mitochondrial and nuclear clades was used to determine whether different nuclear paralogs represent independent transposition events or duplications following a single insertion. This method is generally applicable in cases where differences in pattern of molecular evolution are known, and it showed that at least three of the four nuclear clades represent independent transposition events. The insertion events giving rise to two of the nuclear clades predate the divergence of the callitrichines, whereas those leading to the other two nuclear clades may have occurred in the common ancestor of marmosets.

15.
This article reviews the most common methods used today for estimating divergence times and rates of molecular evolution. The methods are grouped into three main classes: (1) methods that use a molecular clock and one global rate of substitution, (2) methods that correct for rate heterogeneity, and (3) methods that try to incorporate rate heterogeneity. Additionally, links to the most important literature on molecular dating are given, including articles comparing the performance of different methods, papers that investigate problems related to taxon, gene and partition sampling, and literature discussing highly debated issues like calibration strategies and uncertainties, dating precision and the calculation of error estimates.

16.
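Ziheng Yang 《Acta Zoologica Sinica (动物学报)》2004,50(4):645-656
It is well known that estimates of species divergence times are sensitive to the molecular clock (rate constancy) assumption. On the other hand, in comparisons of distantly related species (for example, different orders of mammals), the molecular clock almost never holds. It is therefore essential to account for rate differences among evolutionary lineages when estimating divergence times. The maximum likelihood method accommodates such rate differences naturally, and it can analyze data from multiple gene loci simultaneously while using multiple fossil calibrations. Previously proposed likelihood methods require the researcher to assign the branches of the tree to rate groups; this paper proposes an approximate method that automates this step. The method combines features of the earlier likelihood method, the Bayes method, and approximate rate-smoothing methods. In addition, the algorithm is improved to handle combined analyses in which some genes lack data for some species. The new method is applied to estimate the divergence times of the Malagasy mouse lemurs, and the results are compared with those of earlier likelihood and Bayes analyses.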

17.
Inferring speciation times under an episodic molecular clock (cited 5 times: 0 self-citations, 5 by others)
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

18.
Molecular sequences not only allow the reconstruction of phylogenetic relationships among species, but also provide information on the approximate divergence times. Whereas the fossil record dates the origin of most multicellular animal phyla during the Cambrian explosion less than 540 million years ago (mya), molecular clock calculations usually suggest much older dates. Here we used a large multiple sequence alignment derived from Expressed Sequence Tags and genomes comprising 129 genes (37,476 amino acid positions) and 117 taxa, including 101 arthropods. We obtained consistent divergence time estimates applying relaxed Bayesian clock models with different priors and multiple calibration points. While the influence of substitution rates, missing data, and model priors was negligible, the clock model had a significant effect. A log-normal autocorrelated model was selected on the basis of cross-validation. We calculated that arthropods emerged ~600 mya. Onychophorans (velvet worms) and euarthropods split ~590 mya, Pancrustacea and Myriochelata ~560 mya, Myriapoda and Chelicerata ~555 mya, and 'Crustacea' and Hexapoda ~510 mya. Endopterygote insects appeared ~390 mya. These dates are considerably younger than most previous molecular clock estimates and in better agreement with the fossil record. Nevertheless, a Precambrian origin of arthropods and other metazoan phyla is still supported. Our results also demonstrate the applicability of large datasets of random nuclear sequences for approximating the timing of multicellular animal evolution.

19.
Heterochronous data sets comprise molecular sequences sampled at different points in time. If the temporal range of the sampled sequences is large relative to the rate of mutation, the sampling times can directly calibrate evolutionary rates to calendar time. Here, we extend this calibration process to provide a full probabilistic method that utilizes temporal information in heterochronous data sets to estimate sampling times (leaf-ages) for sequences for which this information is unavailable. Our method is similar to relaxing the constraints of the molecular clock on specific lineages within a phylogenetic tree. Using a combination of synthetic and empirical data sets, we demonstrate that the method estimates leaf-ages reliably and accurately. Potential applications of our approach include incorporating samples of uncertain or radiocarbon-infinite age into ancient DNA analyses, evaluating the temporal signal in a particular sequence or data set, and exploring the reliability of sequence ages that are somehow contentious.

20.
A series of new results useful to the study of DNA sequences using Markov models of substitution are presented with proofs. General time-reversible distances can be extended to accommodate any fixed distribution of rates across sites by replacing the logarithmic function of a matrix with the inverse of a moment generating function. Estimators are presented assuming a gamma distribution, the inverse Gaussian distribution, or a mixture of either of these with invariant sites. Also considered are the different ways invariant sites may be removed and how these differences may affect estimated distances. Through collaboration, we implemented these distances into PAUP* in 1994. The variance of these new distances is approximated via the delta method. It is also shown how to predict the divergence expected for a pair of sequences given a rate matrix and a distribution of rates across sites, allowing iterated ML estimates of distances under any reversible model. A simple test of whether a rate matrix is time reversible is also presented. These new methods are used to estimate the divergence time of humans and chimps from mtDNA sequence data. These analyses support suggestions that the human lineage has an enhanced transition rate relative to other hominoids. These studies also show that transversion distances differ substantially from the overall distances which are dominated by transitions. Transversions alone apparently suggest a very recent divergence time for humans versus chimps and/or a very old (>16 myr) divergence time for humans versus orangutans. This work illustrates graphically ways to interpret the reliability of distance-based transformations, using the corrected transition to transversion ratio returned for pairs of sequences which are successively more diverged.
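
The replacement of the logarithm by an inverse moment generating function is easiest to see in the scalar Jukes-Cantor case, which is used here only as an illustration; the paper itself treats general time-reversible matrices. With gamma-distributed rates of mean 1 and shape alpha, the moment generating function is M(t) = (1 - t/alpha)^(-alpha), and inverting it in place of the logarithm gives the familiar gamma-corrected distance. The function name is hypothetical.

```python
import math


def jc69_gamma_distance(p, alpha=None):
    """Jukes-Cantor distance, optionally with gamma rates across sites.

    p: observed proportion of differing sites, in [0, 0.75).
    alpha: gamma shape parameter; None means equal rates at all sites.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    y = 1 - 4 * p / 3
    if alpha is None:
        # Equal rates: the usual logarithmic correction.
        return -0.75 * math.log(y)
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    # Gamma rates: the log is replaced by the inverse of the gamma MGF.
    return 0.75 * alpha * (y ** (-1 / alpha) - 1)


equal_rates = jc69_gamma_distance(0.1)
strong_heterogeneity = jc69_gamma_distance(0.1, alpha=0.5)
```

Smaller values of alpha mean stronger rate heterogeneity and a larger corrected distance for the same observed difference, while very large values recover the equal-rates formula.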
