Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
A compound Poisson process for relaxing the molecular clock   Total citations: 18, self-citations: 0, citations by others: 18
Huelsenbeck JP, Larget B, Swofford D. Genetics 2000, 154(4): 1879-1892
The molecular clock hypothesis remains an important conceptual and analytical tool in evolutionary biology despite the repeated observation that the clock hypothesis does not perfectly explain observed DNA sequence variation. We introduce a parametric model that relaxes the molecular clock by allowing rates to vary across lineages according to a compound Poisson process. Events of substitution rate change are placed onto a phylogenetic tree according to a Poisson process. When an event of substitution rate change occurs, the current rate of substitution is modified by a gamma-distributed random variable. Parameters of the model can be estimated using Bayesian inference. We use Markov chain Monte Carlo integration to evaluate the posterior probability distribution because the posterior probability involves high dimensional integrals and summations. Specifically, we use the Metropolis-Hastings-Green algorithm with 11 different move types to evaluate the posterior distribution. We demonstrate the method by analyzing a complete mtDNA sequence data set from 23 mammals. The model presented here has several potential advantages over other models that have been proposed to relax the clock because it is parametric and does not assume that rates change only at speciation events. This model should prove useful for estimating divergence times when substitution rates vary across lineages.
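The rate-change mechanism described in this abstract can be illustrated with a minimal sketch. It assumes that "modified by a gamma-distributed random variable" means multiplication by a gamma variate with mean 1, and the function names and parameters (`simulate_branch_rates`, `event_rate`, `shape`) are hypothetical. It covers only the forward simulation along a single branch, not the Bayesian inference or the MCMC moves.

```python
import random


def simulate_branch_rates(branch_length, initial_rate, event_rate, shape, rng):
    """Return (segment_length, rate) pairs along one branch.

    Rate-change events arrive as a Poisson process with intensity
    `event_rate`. At each event the current substitution rate is
    multiplied by a gamma variate with mean 1 (an assumption made
    for this illustration).
    """
    segments = []
    position = 0.0
    rate = initial_rate
    while True:
        # Waiting time to the next rate-change event is exponential.
        wait = rng.expovariate(event_rate)
        if position + wait >= branch_length:
            segments.append((branch_length - position, rate))
            break
        segments.append((wait, rate))
        position += wait
        # gammavariate(alpha, beta) has mean alpha * beta, so this has mean 1.
        rate *= rng.gammavariate(shape, 1.0 / shape)
    return segments


def expected_substitutions(segments):
    """Expected substitutions per site: sum of rate times duration."""
    return sum(length * rate for length, rate in segments)


if __name__ == "__main__":
    rng = random.Random(42)
    segs = simulate_branch_rates(1.0, 0.5, 2.0, 4.0, rng)
    print(len(segs), expected_substitutions(segs))
```

Larger values of `shape` concentrate the multiplier near 1, so each event changes the rate only slightly; smaller values give more abrupt changes.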

2.
If substitutions in DNA sequences follow a Poisson process, the ratio of the variance in the number of substitutions to the mean number of substitutions (the index of dispersion) should equal 1. In this paper, the robustness of the commonly applied estimator of the index of dispersion in replacement sites and silent sites to various assumptions regarding DNA evolution is explored using simulation methods. The estimate of the index of dispersion may be strongly biased if the assumptions of the model of substitution are violated. However, the results of this study support the conclusions of studies by Gillespie and Ohta that the process of substitution in replacement sites is overdispersed. This result contradicts those of a recent study and shows that the high index of dispersion for replacement sites is not an artifact caused by the method of estimation.
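As a rough illustration of the quantity discussed here, the index of dispersion is the variance of substitution counts across lineages divided by their mean. The sketch below is a naive sample estimate only; the estimator examined in the paper may include corrections (for example for lineage effects) that are not shown, and the function name is hypothetical.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Naive index of dispersion: sample variance divided by sample mean.

    `counts` holds the number of substitutions observed on each lineage.
    A value near 1 is what a simple Poisson process predicts; values
    well above 1 indicate overdispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("index of dispersion is undefined when the mean is 0")
    return variance(counts) / m


if __name__ == "__main__":
    print(index_of_dispersion([2, 4, 4, 6]))
```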

3.
Distinguishing noise from signal presents a problem when DNA sequences are used for phylogeny reconstruction. Multiple substitutions at sites are a primary cause of noise and this is compounded by variation in substitution rates among sites. For protein-coding genes, one method used to determine if data are noisy is to assess levels of saturation of substitutions by codon position. However, this procedure may not be a fine enough filter for assessing noise. Variation in substitution rates may also be caused by constraints on change imposed by the function of the protein product. Using a structural model of the cytochrome b protein as a template, I divided cyt b sequence data for species within the avian family Falconidae (falcons and caracaras) into three functional domains. Saturation of substitutions of sequences within these regions was assessed graphically. This qualitative determination of saturation was then used to differentially weight phylogenetic analysis, resulting in a hypothesis congruent with existing cladistic analyses and traditional morphology. These results demonstrate that saturation of substitutions is correlated with functional regions of cytochrome b and that using this information improves phylogenetic inference.

4.
Natural selection and the molecular clock   Total citations: 13, self-citations: 1, citations by others: 12

5.
Many tests of the lineage dependence of substitution rates, computations of the error of evolutionary distances, and simulations of molecular evolution assume that the rate of evolution is constant in time within each lineage descended from a common ancestor. However, estimates of the index of dispersion of numbers of mammalian substitutions suggest that the rate has time-dependent variations consistent with a fractal-Gaussian-rate Poisson process, which assumes common descent without assuming rate constancy. While this model does not affect certain relative-rate tests, it substantially increases the uncertainty of branch lengths. Thus, fluctuations in the rate of substitution cannot be neglected in calculations that rely on evolutionary distances, such as the confidence intervals of divergence times and certain phylogenetic reconstructions. The fractal-Gaussian-rate Poisson process is compared and contrasted with previous models of molecular evolution, including other Poisson processes, the fractal renewal process, a Lévy-stable process, a fractional-difference process, and a log-Brownian process. The fractal models are more compatible with mammalian data than the nonfractal models considered, and they may also be better supported by Darwinian theory. Although the fractal-Gaussian-rate Poisson process has not been proven to have better agreement with data or theory than the other fractal models, its Gaussian nature simplifies the exploration of its impact on evolutionary distance errors and relative-rate tests. Received: 29 September 1999 / Accepted: 20 January 2000

6.
The relative efficiencies of the maximum parsimony (MP) and distance-matrix methods in obtaining the correct tree (topology) were studied by using computer simulation. The distance-matrix methods examined are the neighbor-joining, distance-Wagner, Tateno et al. modified Farris, Faith, and Li methods. In the computer simulation, six or eight DNA sequences were assumed to evolve following a given model tree, and the evolutionary changes of the sequences were followed. Both constant and varying rates of nucleotide substitution were considered. From the sequences thus obtained, phylogenetic trees were constructed using the six tree-making methods and compared with the model (true) tree. This process was repeated 300 times for each different set of parameters. The results obtained indicate that when the number of nucleotide substitutions per site is small and a relatively small number of nucleotides are used, the probability of obtaining the correct topology (P1) is generally lower in the MP method than in the distance-matrix methods. The P1 value for the MP method increases with increasing number of nucleotides but is still generally lower than the value for the NJ or DW method. Essentially the same conclusion was obtained whether or not the rate of nucleotide substitution was constant or whether or not a transition bias in nucleotide substitution existed. The relatively poor performance of the MP method for these cases is due to the fact that information from singular sites is not used in this method. The MP method also showed a relatively low P1 value when the model of varying rate of nucleotide substitution was used and the number of substitutions per site was large. However, the MP method often produced cases in which the correct tree was one of several equally parsimonious trees. When these cases were included in the class of "success," the MP method performed better than the other methods, provided that the number of nucleotide substitutions per site was small.  

7.
Three Markov models (Dayhoff, Proportional and Poisson models; Hasegawa et al., 1992a) for amino acid substitution during evolution were used for maximum likelihood analyses of proteins coded for in mitochondrial DNA in estimating a phylogenetic tree among human, bovine and murids (mouse and rat) with chicken as an outgroup. It turned out that the Dayhoff model is the most appropriate of the three for approximating the amino acid substitutions of proteins coded for in mitochondrial DNA. Despite the availability of complete mitochondrial genome sequences, we could not resolve the trichotomy among human, bovine and murids, probably because the time separating the two branching events among these three lines was short and because chicken is too distant from mammals to be used as an outgroup. It was suggested that the average substitution rate of amino acids coded for in mitochondrial DNA is lower along the bovine line than along the human or murid lines. Advantages of amino acid sequence analysis over nucleotide sequence analysis in phylogenetic study were discussed.

8.
A fundamental task in sequence analysis is to calculate the probability of a multiple alignment given a phylogenetic tree relating the sequences and an evolutionary model describing how sequences change over time. However, the most widely used phylogenetic models only account for residue substitution events. We describe a probabilistic model of a multiple sequence alignment that accounts for insertion and deletion events in addition to substitutions, given a phylogenetic tree, using a rate matrix augmented by the gap character. Starting from a continuous Markov process, we construct a non-reversible generative (birth-death) evolutionary model for insertions and deletions. The model assumes that insertion and deletion events occur one residue at a time. We apply this model to phylogenetic tree inference by extending the program dnaml in phylip. Using standard benchmarking methods on simulated data and a new "concordance test" benchmark on real ribosomal RNA alignments, we show that the extended program dnamlepsilon improves accuracy relative to the usual approach of ignoring gaps, while retaining the computational efficiency of the Felsenstein peeling algorithm.

9.
10.
Background: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods of simulating the DNA MSA directly from the transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. placing distinct substitution rates at different lineages). Results: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of desired length. GenNon-h is publicly available for download. Conclusion: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that are well suited for testing complex biological hypotheses and for exploring the limits of reconstruction algorithms and their robustness to such models.
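The core step of a discrete-time simulation of this kind, evolving a sequence across one edge directly from a substitution probability matrix, might look like the sketch below. This is not GenNon-h's code or interface; `evolve_sequence`, the nucleotide ordering and the example matrix are assumptions made for illustration, and tree traversal and Newick parsing are omitted.

```python
import random

NUCLEOTIDES = "ACGT"  # assumed row/column order of the matrix


def evolve_sequence(parent, matrix, rng):
    """Evolve `parent` across one edge of a tree.

    matrix[i][j] is the probability that nucleotide i in the parent
    becomes nucleotide j in the child. Each row must sum to 1; the
    matrix need not be symmetric or time-reversible.
    """
    child = []
    for base in parent:
        row = matrix[NUCLEOTIDES.index(base)]
        child.append(rng.choices(NUCLEOTIDES, weights=row, k=1)[0])
    return "".join(child)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical edge matrix: a different matrix per edge gives
    # nonhomogeneous data.
    edge_matrix = [
        [0.91, 0.03, 0.04, 0.02],
        [0.02, 0.90, 0.03, 0.05],
        [0.06, 0.02, 0.90, 0.02],
        [0.01, 0.07, 0.02, 0.90],
    ]
    print(evolve_sequence("ACGTACGTAC", edge_matrix, rng))
```

Applying this function edge by edge from the root to the leaves, with the root sequence drawn from a chosen root distribution, yields one simulated alignment.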

11.
On the Overdispersed Molecular Clock   Total citations: 16, self-citations: 8, citations by others: 8
Naoyuki Takahata. Genetics 1987, 116(1): 169-179
Rates of molecular evolution at some loci are more irregular than described by simple Poisson processes. Three situations under which molecular evolution would not follow simple Poisson processes are reevaluated from the viewpoint of the neutrality hypothesis: concomitant or multiple substitutions in a gene, fluctuating substitution rates in time caused by coupled effects of deleterious mutations and bottlenecks, and changes in the degree of selective constraints against a gene (neutral space) caused by successive substitutions. The common underlying assumption that these causes are lineage nonspecific excludes the case where mutation rates themselves change systematically among lineages or taxonomic groups, and severely limits the extent of variation in the number of substitutions among lineages. Even under this stringent condition, however, the third hypothesis, the fluctuating neutral space model, can generate fairly large variation. This is described by a time-dependent renewal process, which does not exhibit any episodic nature of molecular evolution. It is argued that the observed elevated variances in the number of nucleotide or amino acid substitutions do not immediately call for positive Darwinian selection in molecular evolution.

12.
Simple models of molecular evolution assume that sequences evolve by a Poisson process in which nucleotide or amino acid substitutions occur as rare independent events. In these models, the expected ratio of the variance to the mean of substitution counts equals 1, and substitution processes with a ratio greater than 1 are called overdispersed. Comparing the genomes of 10 closely related species of Drosophila, we extend earlier evidence for overdispersion in amino acid replacements as well as in four-fold synonymous substitutions. The observed deviation from the Poisson expectation can be described as a linear function of the rate at which substitutions occur on a phylogeny, which implies that deviations from the Poisson expectation arise from gene-specific temporal variation in substitution rates. Amino acid sequences show greater temporal variation in substitution rates than do four-fold synonymous sequences. Our findings provide a general phenomenological framework for understanding overdispersion in the molecular clock. Also, the presence of substantial variation in gene-specific substitution rates has broad implications for work in phylogeny reconstruction and evolutionary rate estimation.
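One standard way to see why temporal rate variation produces a deviation that grows linearly with the substitution rate is the law of total variance for a doubly stochastic Poisson count. The derivation below is a sketch under assumptions that are not taken from the abstract: the count N is Poisson given a random intensity Λ = μX, where μ is the mean number of substitutions and X is a rate multiplier with mean 1 and a variance σ² that does not depend on μ. It is not necessarily the authors' own derivation.

```latex
\begin{align}
N \mid \Lambda &\sim \mathrm{Poisson}(\Lambda), \qquad
  \Lambda = \mu X,\quad \mathbb{E}[X] = 1,\quad \mathrm{Var}(X) = \sigma^2 \\
\mathbb{E}[N] &= \mathbb{E}[\Lambda] = \mu \\
\mathrm{Var}(N) &= \mathbb{E}\big[\mathrm{Var}(N \mid \Lambda)\big]
  + \mathrm{Var}\big(\mathbb{E}[N \mid \Lambda]\big)
  = \mu + \mu^2 \sigma^2 \\
R &= \frac{\mathrm{Var}(N)}{\mathbb{E}[N]} = 1 + \mu \sigma^2
\end{align}
```

Under these assumptions the excess over the Poisson expectation, R − 1, is proportional to μ, with a slope equal to the variance of the rate multiplier.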

13.
Most molecular phylogenetic studies of vertebrates have been based on DNA sequences of mitochondrial-encoded genes. MtDNA evolves rapidly and is thus particularly useful for resolving relationships among recently evolved groups. However, it has the disadvantage that all of the mitochondrial genes are inherited as a single linkage group so that only one independent gene tree can be inferred regardless of the number of genes sequenced. Introns of nuclear genes are attractive candidates for independent sources of rapidly evolving DNA: they are pervasive, most of their nucleotides appear to be unconstrained by selection, and PCR primers can be designed for sequences in adjacent exons where nucleotide sequences are conserved. We sequenced intron 7 of the beta-fibrinogen gene (beta-fibint7) for a diversity of woodpeckers and compared the phylogenetic signal and nucleotide substitution properties of this DNA sequence with that of mitochondrial-encoded cytochrome b (cyt b) from a previous study. A few indels (insertions and deletions) were found in the beta-fibint7 sequences, but alignment was not difficult, and the indels were phylogenetically informative. The beta-fibint7 and cyt b gene trees were nearly identical to each other but differed in significant ways from the traditional woodpecker classification. Cyt b evolves 2.8 times as fast as beta-fibint7 (14.0 times as fast at third codon positions). Despite its relatively slow substitution rate, the phylogenetic signal in beta-fibint7 is comparable to that in cyt b for woodpeckers, because beta-fibint7 has less base composition bias and more uniform nucleotide substitution probabilities. As a consequence, compared with cyt b, beta-fibint7 nucleotide sites are expected to enter more distinct character states over the course of evolution and have fewer multiple substitutions and lower levels of homoplasy. Moreover, in contrast to cyt b, in which nearly two thirds of nucleotide sites rarely vary among closely related taxa, virtually all beta-fibint7 nucleotide sites appear free of selective constraints, which increases informative sites per unit sequenced. However, the estimated gamma distribution used to model rate variation among sites suggests constraints on some beta-fibint7 sites. This study suggests that introns will be useful for phylogenetic studies of recently evolved groups.

14.
We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability of placing a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
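The two-step allocation described in this abstract can be sketched as follows: draw a Poisson number of substitutions, then place each one on a branch with probability proportional to its relative length. The function names, the `expected_total` parameter and the branch labels are hypothetical, and the one-step mutation matrix itself is not modelled here.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method.

    Adequate for small means; used to keep the sketch free of
    third-party dependencies.
    """
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def allocate_substitutions(branch_lengths, expected_total, rng):
    """Place a Poisson number of substitutions on the branches of a tree.

    `branch_lengths` maps a branch label to its (scaled) length. Each
    substitution lands on a branch with probability proportional to
    that branch's share of the total tree length.
    """
    names = list(branch_lengths)
    total = sum(branch_lengths.values())
    weights = [branch_lengths[name] / total for name in names]
    counts = dict.fromkeys(names, 0)
    n = poisson_sample(expected_total, rng)
    for name in rng.choices(names, weights=weights, k=n):
        counts[name] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(11)
    tree = {"A": 0.1, "B": 0.3, "AB": 0.05, "C": 0.55}
    print(allocate_substitutions(tree, 3.0, rng))
```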

15.
Accuracy of estimated phylogenetic trees from molecular data   Total citations: 2, self-citations: 0, citations by others: 2
Summary: The accuracies and efficiencies of four different methods for constructing phylogenetic trees from molecular data were examined by using computer simulation. The methods examined are UPGMA, Fitch and Margoliash's (1967) (F/M) method, Farris' (1972) method, and the modified Farris method (Tateno, Nei, and Tajima, this paper). In the computer simulation, eight OTUs (32 OTUs in one case) were assumed to evolve according to a given model tree, and the evolutionary change of a sequence of 300 nucleotides was followed. The nucleotide substitution in this sequence was assumed to occur following the Poisson distribution, negative binomial distribution or a model of temporally varying rate. Estimates of nucleotide substitutions (genetic distances) were then computed for all pairs of the nucleotide sequences that were generated at the end of the evolution considered, and from these estimates a phylogenetic tree was reconstructed and compared with the true model tree. The results of this comparison indicate that when the coefficient of variation of branch length is large the Farris and modified Farris methods tend to be better than UPGMA and the F/M method for obtaining a good topology. For estimating the number of nucleotide substitutions for each branch of the tree, however, the modified Farris method shows a better performance than the Farris method. When the coefficient of variation of branch length is small, however, UPGMA shows the best performance among the four methods examined. Nevertheless, any tree-making method is likely to make errors in obtaining the correct topology with a high probability, unless all branch lengths of the true tree are sufficiently long. It is also shown that the agreement between patristic and observed genetic distances is not a good indicator of the goodness of the tree obtained.

16.
Specificity of mutations induced in transfected DNA by mammalian cells   Total citations: 29, self-citations: 1, citations by others: 28
DNA transfected into mammalian cells is subject to the high mutation frequency of approximately 1% per gene. We present data bearing on the derivation of the two main classes of mutations detected, base substitutions and deletions. The DNA sequence change is reported for nearly 100 independent base substitution mutations that occurred in shuttle vectors as a result of passage in simian cells. All of the mutations occur at G:C base pairs and involve either transition to A:T or transversion to T:A. To identify possible mutational intermediates, various topological forms of the vector DNA were introduced separately. Supercoiled and relaxed DNA are mutated at equal frequencies. However, linearized DNA leads to a greatly elevated frequency of deletions. Nicked and gapped templates stimulate both deletions and base substitutions. We discuss a model involving intracellular degradation of the transfected DNA which explains these observations.

17.
The relative efficiencies of the maximum-parsimony (MP), UPGMA, and neighbor-joining (NJ) methods in obtaining the correct tree (topology) for restriction-site and restriction-fragment data were studied by computer simulation. In this simulation, six DNA sequences of 16,000 nucleotides were assumed to evolve following a given model tree. The recognition sequences of 20 different six-base restriction enzymes were used to identify the restriction sites of the DNA sequences generated. The restriction-site data and restriction-fragment data thus obtained were used to reconstruct a phylogenetic tree, and the tree obtained was compared with the model tree. This process was repeated 300 times. The results obtained indicate that when the rate of nucleotide substitution is constant the probability of obtaining the correct tree (Pc) is generally higher in the NJ method than in the MP method. However, if we use the average topological deviation from the model tree (dT) as the criterion of comparison, the NJ and MP methods are nearly equally efficient. When the rate of nucleotide substitution varies with evolutionary lineage, the NJ method is better than the MP method, whether Pc or dT is used as the criterion of comparison. With 500 nucleotides and when the number of nucleotide substitutions per site was very small, restriction-site data were, contrary to our expectation, more useful than sequence data. Restriction-fragment data were less useful than restriction-site data, except when the sequence divergence was very small. UPGMA seems to be useful only when the rate of nucleotide substitution is constant and sequence divergence is high.

18.
There are three different methods of estimating the number of nucleotide substitutions between a pair of species from amino acid sequence data, i.e. the Poisson correction method, random evolutionary hit method, and counting the actual but minimum number of nucleotide substitutions. In this paper the relationships among the estimates obtained by these methods are studied empirically. The results obtained indicate that there is a high correlation among these estimates and in practice any of the three methods may be used for constructing evolutionary trees or relating nucleotide substitutions to evolutionary time. The effects of varying rates of nucleotide substitution among different sites on the Poisson correction and random evolutionary hit methods are also studied mathematically. It is shown that these two methods are quite insensitive to the variation of the rate of nucleotide substitution.
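Of the three methods named here, the Poisson correction is the simplest to write down: with p the proportion of aligned sites at which two amino acid sequences differ, the estimated number of substitutions per site is d = -ln(1 - p). The sketch below assumes gap-free, equal-length sequences and uses a hypothetical function name; the random evolutionary hit and minimum-count methods are not shown.

```python
import math


def poisson_correction(seq_a, seq_b):
    """Poisson-corrected distance between two aligned sequences.

    p is the proportion of differing sites; d = -ln(1 - p) corrects
    for multiple substitutions at the same site under a Poisson model
    with equal rates across sites.
    """
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    p = differences / len(seq_a)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)


if __name__ == "__main__":
    # One difference in four sites: p = 0.25.
    print(poisson_correction("ACDE", "ACDF"))
```

The corrected distance always exceeds the raw proportion p for p > 0, and the gap between them widens as sequences diverge.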

19.
The molecular clock of mitochondrial DNA has been extensively used to date various genetic events. However, its substitution rate among humans appears to be higher than rates inferred from human-chimpanzee comparisons, limiting the potential of interspecies clock calibrations for intraspecific dating. It is not well understood how and why the substitution rate accelerates. We have analyzed a phylogenetic tree of 3057 publicly available human mitochondrial DNA coding region sequences for changes in the ratios of mutations belonging to different functional classes. The proportion of non-synonymous and RNA gene substitutions has decreased over hundreds of thousands of years. The highest mutation ratios corresponding to fast acceleration in the apparent substitution rate of the coding sequence have occurred after the end of the Last Ice Age. We recalibrate the molecular clock of human mtDNA as 7990 years per synonymous mutation over the mitochondrial genome. However, the distribution of substitutions at synonymous sites in human data significantly departs from a model assuming a single rate parameter and implies at least 3 different subclasses of sites. A neutral model with 3 synonymous substitution rates can explain most, if not all, of the apparent molecular clock difference between the intra- and interspecies levels. Our findings imply the sluggishness of purifying selection in removing the slightly deleterious mutations from the human as well as the Neandertal and chimpanzee populations. However, for humans, the weakness of purifying selection has been further exacerbated by the population expansions associated with the out-of-Africa migration and the end of the Last Ice Age.

20.
A codon-based model of nucleotide substitution for protein-coding DNA sequences   Total citations: 34, self-citations: 23, citations by others: 11
A codon-based model for the evolution of protein-coding DNA sequences is presented for use in phylogenetic estimation. A Markov process is used to describe substitutions between codons. Transition/transversion rate bias and codon usage bias are allowed in the model, and selective restraints at the protein level are accommodated using physicochemical distances between the amino acids coded for by the codons. Analyses of two data sets suggest that the new codon-based model can provide a better fit to data than can nucleotide-based models and can produce more reliable estimates of certain biologically important measures such as the transition/transversion rate ratio and the synonymous/nonsynonymous substitution rate ratio.
