Similar articles
 20 similar articles found; search time 31 ms
1.
Divergence time and substitution rate are seriously confounded in phylogenetic analysis, making it difficult to estimate divergence times when the molecular clock (rate constancy among lineages) is violated. This problem can be alleviated to some extent by analyzing multiple gene loci simultaneously and by using multiple calibration points. While different genes may have different patterns of evolutionary rate change, they share the same divergence times. Indeed, the fact that each gene may violate the molecular clock differently leads to the advantage of simultaneous analysis of multiple loci. Multiple calibration points provide the means for characterizing the local evolutionary rates on the phylogeny. In this paper, we extend previous likelihood models of local molecular clock for estimating species divergence times to accommodate multiple calibration points and multiple genes. Heterogeneity among different genes in evolutionary rate and in substitution process is accounted for by the models. We apply the likelihood models to analyze two mitochondrial protein-coding genes, cytochrome oxidase II and cytochrome b, to estimate divergence times of Malagasy mouse lemurs and related outgroups. The likelihood method is compared with the Bayes method of Thorne et al. (1998, Mol. Biol. Evol. 15:1647-1657), which uses a probabilistic model to describe the change in evolutionary rate over time and uses the Markov chain Monte Carlo procedure to derive the posterior distribution of rates and times. Our likelihood implementation has the drawbacks of failing to accommodate uncertainties in fossil calibrations and of requiring the researcher to classify branches on the tree into different rate groups. Both problems are avoided in the Bayes method. Despite the differences in the two methods, however, data partitions and model assumptions had the greatest impact on date estimation. 
The three codon positions have very different substitution rates and evolutionary dynamics, and assumptions in the substitution model affect date estimation in both likelihood and Bayes analyses. The results demonstrate that the separate analysis is unreliable, with dates variable among codon positions and between methods, and that the combined analysis is much more reliable. When the three codon positions were analyzed simultaneously under the most realistic models using all available calibration information, the two methods produced similar results. The divergence of the mouse lemurs is dated to be around 7-10 million years ago, indicating a surprisingly early species radiation for such a morphologically uniform group of primates.

2.
Akashi H  Goel P  John A 《PloS one》2007,2(10):e1065
Reliable inference of ancestral sequences can be critical to identifying both patterns and causes of molecular evolution. Robustness of ancestral inference is often assumed among closely related species, but tests of this assumption have been limited. Here, we examine the performance of inference methods for data simulated under scenarios of codon bias evolution within the Drosophila melanogaster subgroup. Genome sequence data for multiple, closely related species within this subgroup make it an important system for studying molecular evolutionary genetics. The effects of asymmetric and lineage-specific substitution rates (i.e., varying levels of codon usage bias and departures from equilibrium) on the reliability of ancestral codon usage were investigated. Maximum parsimony inference, which has been widely employed in analyses of Drosophila codon bias evolution, was compared to an approach that attempts to account for uncertainty in ancestral inference by weighting ancestral reconstructions by their posterior probabilities. The latter approach employs maximum likelihood estimation of rate and base composition parameters. For equilibrium and most non-equilibrium scenarios that were investigated, the probabilistic method appears to generate reliable ancestral codon bias inferences for molecular evolutionary studies within the D. melanogaster subgroup. These reconstructions are more reliable than parsimony inference, especially when codon usage is strongly skewed. However, inference biases are considerable for both methods under particular departures from stationarity (i.e., when adaptive evolution is prevalent). Reliability of inference can be sensitive to branch lengths, asymmetry in substitution rates, and the locations and nature of lineage-specific processes within a gene tree. Inference reliability, even among closely related species, can be strongly affected by (potentially unknown) patterns of molecular evolution in lineages ancestral to those of interest.

3.
The strength and direction of selection on the identity of an amino acid residue in a protein is typically measured by the ratio of the rate of non-synonymous substitutions to the rate of synonymous substitutions. In attempting to predict positively selected sites from amino acid alignments, we made the unexpected observation that the site likelihood of an alignment column for a given tree tends to be negatively correlated with the posterior probability that the site is in the positive selection class under widely-used codon models. This is likely because positively selected sites tend to be more variable and display more “radical” amino acid changes; both of these features are expected to result in low site log-likelihoods. We explored the efficacy of using the site log-likelihood (SLL) score as a predictor for positive selection. Through simulation we show that an SLL-based test has a low false positive rate and power comparable to that of the codon models. In one case where the simulated data violated the assumption that synonymous substitution rates were constant across the sites, the codon models were not able to detect positive selection in the data while the SLL test did. We applied the new method to ten empirical datasets and found that it made predictions similar to those of the codon models in eight of them. For the tax gene dataset the SLL test seemed to produce more reasonable results. The SLL methods are a valuable complement to codon models, especially for some cases where the assumptions of codon models are likely violated.

4.
Ren F  Tanaka H  Yang Z 《Systematic biology》2005,54(5):808-818
Models of codon substitution have been commonly used to compare protein-coding DNA sequences and are particularly effective in detecting signals of natural selection acting on the protein. Their utility in reconstructing molecular phylogenies and in dating species divergences has not been explored. Codon models naturally accommodate synonymous and nonsynonymous substitutions, which occur at very different rates and may be informative for recent and ancient divergences, respectively. Thus codon models may be expected to make an efficient use of phylogenetic information in protein-coding DNA sequences. Here we applied codon models to 106 protein-coding genes from eight yeast species to reconstruct phylogenies using the maximum likelihood method, in comparison with nucleotide- and amino acid-based analyses. The results appeared to confirm that expectation. Nucleotide-based analyses, under simplistic substitution models, were efficient in recovering recent divergences whereas amino acid-based analyses performed better at recovering deep divergences. Codon models appeared to combine the advantages of amino acid and nucleotide data and had good performance at recovering both recent and deep divergences. Estimation of relative species divergence times using amino acid and codon models suggested that translation of gene sequences into proteins led to information loss ranging from 30% for deep nodes to 66% for recent nodes. Although computational burden makes codon models unfeasible for tree search in large data sets, we suggest that they may be useful for comparing candidate trees. Nucleotide models that accommodate the differences in evolutionary dynamics at the three codon positions also performed well, at much less computational cost. We discuss the relationship between a model's fit to data and its utility in phylogeny reconstruction and caution against use of overly complex substitution models.

5.
Rodrigue N  Lartillot N  Philippe H 《Genetics》2008,180(3):1579-1591
In 1994, Muse and Gaut (MG) and Goldman and Yang (GY) proposed evolutionary models that recognize the coding structure of the nucleotide sequences under study, by defining a Markovian substitution process with a state space consisting of the 61 sense codons (assuming the universal genetic code). Several variations and extensions to their models have since been proposed, but no general and flexible framework for contrasting the relative performance of alternative approaches has yet been applied. Here, we compute Bayes factors to evaluate the relative merit of several MG and GY styles of codon substitution models, including recent extensions acknowledging heterogeneous nonsynonymous rates across sites, as well as selective effects inducing uneven amino acid or codon preferences. Our results on three real data sets support a logical model construction following the MG formulation, allowing for a flexible account of global amino acid or codon preferences, while maintaining distinct parameters governing overall nucleotide propensities. Through posterior predictive checks, we highlight the importance of such a parameterization. Altogether, the framework presented here suggests a broad modeling project in the MG style, stressing the importance of combining and contrasting available model formulations and grounding developments in a sound probabilistic paradigm.
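As an illustration of the state space mentioned in this abstract, the following Python sketch enumerates the 61 sense codons of the universal (standard) genetic code and lists, for a given codon, the sense codons that differ from it at exactly one position. The names `CODON_TABLE`, `SENSE_CODONS`, `is_synonymous`, and `single_nucleotide_neighbours` are illustrative assumptions and do not come from the cited paper.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
# '*' marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# The state space of a codon substitution model excludes stop codons.
SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]


def is_synonymous(codon_a, codon_b):
    """Return True if two codons encode the same amino acid."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def single_nucleotide_neighbours(codon):
    """Sense codons that differ from `codon` at exactly one position."""
    neighbours = []
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                candidate = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[candidate] != "*":
                    neighbours.append(candidate)
    return neighbours


if __name__ == "__main__":
    print(len(SENSE_CODONS))
    print(is_synonymous("CTT", "CTC"), is_synonymous("ATG", "ATA"))
    print(single_nucleotide_neighbours("TGG"))
```

A model in this family would then define a 61 x 61 rate matrix indexed by `SENSE_CODONS`; how the entries are parameterized is where the MG and GY formulations differ, and that part is not sketched here.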

6.
Miyazawa S 《PloS one》2011,6(12):e28892
BACKGROUND: A mechanistic codon substitution model, in which each codon substitution rate is proportional to the product of a codon mutation rate and the average fixation probability depending on the type of amino acid replacement, has advantages over nucleotide, amino acid, and empirical codon substitution models in evolutionary analysis of protein-coding sequences. It can approximate a wide range of codon substitution processes. If no selection pressure on amino acids is taken into account, it will become equivalent to a nucleotide substitution model. If mutation rates are assumed not to depend on the codon type, then it will become essentially equivalent to an amino acid substitution model. Mutation at the nucleotide level and selection at the amino acid level can be separately evaluated. RESULTS: The present scheme for single nucleotide mutations is equivalent to the general time-reversible model, but multiple nucleotide changes in infinitesimal time are allowed. Selective constraints on the respective types of amino acid replacements are tailored to each gene in a linear function of a given estimate of selective constraints. Their good estimates are those calculated by maximizing the respective likelihoods of empirical amino acid or codon substitution frequency matrices. Akaike and Bayesian information criteria indicate that the present model performs far better than the other substitution models for all five phylogenetic trees of highly-divergent to highly-homologous sequences of chloroplast, mitochondrial, and nuclear genes. It is also shown that multiple nucleotide changes in infinitesimal time are significant in long branches, although they may be caused by compensatory substitutions or other mechanisms. The variation of selective constraint over sites fits the datasets significantly better than variable mutation rates, except for 10 slow-evolving nuclear genes of 10 mammals. 
A critical finding for phylogenetic analysis is that assuming variable mutation rates over sites leads to the overestimation of branch lengths.
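For readers unfamiliar with the two information criteria named in this abstract, a minimal Python sketch of their standard definitions follows. The log-likelihoods, parameter counts, and alignment length in the example are hypothetical values chosen for illustration, not results from the cited study, and using the number of alignment sites as the sample size in BIC is a common convention assumed here.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_sites):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better).

    Assumption: the sample size n is taken to be the number of alignment sites.
    """
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fits of two models to the same 500-site alignment.
    candidates = {
        "simple model": {"lnL": -4210.0, "k": 10},
        "richer model": {"lnL": -4150.0, "k": 30},
    }
    for name, fit in candidates.items():
        print(name, aic(fit["lnL"], fit["k"]), bic(fit["lnL"], fit["k"], 500))
```

Both criteria penalize extra parameters; BIC penalizes them more heavily once the alignment has more than about seven sites, since ln n then exceeds 2.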

7.
SUMMARY: BAli-Phy is a Bayesian posterior sampler that employs Markov chain Monte Carlo to explore the joint space of alignment and phylogeny given molecular sequence data. Simultaneous estimation eliminates bias toward inaccurate alignment guide-trees, employs more sophisticated substitution models during alignment and automatically utilizes information in shared insertion/deletions to help infer phylogenies. AVAILABILITY: Software is available for download at http://www.biomath.ucla.edu/msuchard/bali-phy.

8.
Although phylogenetic inference of protein-coding sequences continues to dominate the literature, few analyses incorporate evolutionary models that consider the genetic code. This problem is exacerbated by the exclusion of codon-based models from commonly employed model selection techniques, presumably due to the computational cost associated with codon models. We investigated an efficient alternative to standard nucleotide substitution models, in which codon position (CP) is incorporated into the model. We determined the most appropriate model for alignments of 177 RNA virus genes and 106 yeast genes, using 11 substitution models including one codon model and four CP models. The majority of analyzed gene alignments are best described by CP substitution models, rather than by standard nucleotide models, and without the computational cost of full codon models. These results have significant implications for phylogenetic inference of coding sequences as they make it clear that substitution models incorporating CPs not only are a computationally realistic alternative to standard models but may also frequently be statistically superior.

9.
The accurate analysis of massive amounts of data obtained through next-generation sequencing depends on the selection of appropriate evolutionary models. Many plastid phylogenomic studies analyze plastome data as a single partition, or divided by region, using a concatenated “supergene” approach. The effects of molecular evolutionary models and character partition strategies on plastome-based phylogenies have generally been evaluated at higher taxonomic levels in green plants. Using plastome data from 32 species of Amphilophium, a genus of Neotropical lianas, we explored potential sources of topological incongruence with different plastid genome datasets and approaches. Specifically, we evaluated the effects of compositional heterogeneity, codon usage bias, positive selection, and incomplete lineage sorting as sources of systematic error (i.e., the recovery of well-supported conflicting topologies). We compared different datasets (e.g., non-coding regions, exons, and codon-aligned and translated amino acids) using concatenated approaches under site-heterogeneous and site-homogeneous models, as well as multispecies coalescent (MSC) methods. We found incongruences in recovered phylogenetic relationships, which were mainly located in short internodes. The MSC and concatenated approaches recovered similar topologies. The analysis of GC content and codon usage bias indicated higher substitution rates and AT excess at the third codon positions, and we found evidence of positive selection in 3% of amino acid sites. There were no significant differences among species in site biochemical profiles. We argue that the selection of appropriate partition strategies and evolutionary models is important to increase accuracy in phylogenetic relationships, even when using plastome datasets, since the plastome is still the most widely used genome in plant phylogenetics.

10.
Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.

11.
Most evolutionary tree estimation methods for DNA sequences ignore or inefficiently use the phylogenetic information contained within shared patterns of gaps. This is largely due to the computational difficulties in implementing models for insertions and deletions. A simple way to incorporate this information is to treat a gap as a fifth character (with the four nucleotides being the other four) and to incorporate it within a Markov model of nucleotide substitution. This idea has been dismissed in the past, since it treats a multiple-site insertion or deletion as a sequence of independent events rather than a single event. While this is true, we have found that under many circumstances it is better to incorporate gap information inadequately than to ignore it, at least for topology estimation. We propose an extension to a class of nucleotide substitution models to incorporate the gap character and show that, for data sets (both real and simulated) with short and medium gaps, these models do lead to effective use of the information contained within insertions and deletions. We also implement an ad hoc method in which the likelihood at columns containing multiple-site gaps is downweighted in order to avoid giving them undue influence. The precision of the estimated tree, assessed using Markov chain Monte Carlo techniques to find the posterior distribution over tree space, improves under these five-state models compared with standard methods which effectively ignore gaps.
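To make the "gap as a fifth character" idea concrete, here is a minimal Python sketch of the simplest possible five-state Markov model: every state (A, C, G, T, and the gap) changes to every other state at the same rate. This equal-rates model is an assumption made for illustration, a direct five-state analogue of the Jukes-Cantor model; the abstract describes an extension of a broader class of nucleotide models, which is not reproduced here. The function name and the `STATES` ordering are hypothetical.

```python
import math

STATES = "ACGT-"  # four nucleotides plus the gap treated as a fifth character


def equal_rates_transition_matrix(t, n_states=5):
    """Transition probabilities P(t) for an equal-rates model with n_states states.

    `t` is the branch length in expected changes per site. With all
    off-diagonal rates equal to 1 / (n_states - 1), the closed form is
        P_ii(t) = 1/n + (n - 1)/n * exp(-n t / (n - 1))
        P_ij(t) = 1/n - 1/n * exp(-n t / (n - 1))   for i != j
    """
    n = n_states
    decay = math.exp(-n * t / (n - 1))
    same = 1.0 / n + (n - 1) / n * decay
    diff = 1.0 / n - decay / n
    return [[same if i == j else diff for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    p = equal_rates_transition_matrix(0.1)
    gap = STATES.index("-")
    a = STATES.index("A")
    # Probability that a site holding 'A' holds a gap after a branch of length 0.1.
    print(p[a][gap])
```

Setting `n_states=4` recovers the familiar Jukes-Cantor probabilities, which is a quick sanity check on the closed form.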

12.
13.
X Liu  H Liu  W Guo  K Yu 《Gene》2012,509(1):136-141
Codon models are now widely used to draw evolutionary inferences from alignments of homologous sequence data. By incorporating physicochemical properties of amino acids into codon models, we present two novel codon substitution models, based on the similarity scores of amino acids, that describe the evolution of protein-coding DNA sequences. A continuous-time Markov process is used to describe substitutions between codons. Transition/transversion rate bias and nonsynonymous codon usage bias are allowed in the models. In our implementation, the parameters are estimated by the maximum-likelihood (ML) method, as in previous studies. Furthermore, instantaneous mutations involving more than one nucleotide position of a codon are considered in the second model. The two suggested models are then applied to five real data sets. The analytic results indicate that the new codon models considering physicochemical properties of amino acids can provide a better fit to the data compared with existing codon models, and produce more reliable estimates of certain biologically important measures than existing methods.

14.
Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions, since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum-likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.

15.
Measuring evolutionary distances between DNA or protein sequences forms the basis of many applications in computational biology and evolutionary studies. Of particular interest are distances based on synonymous substitutions since these substitutions are considered to be under very little selection pressure and therefore assumed to accumulate in an almost clock-like manner. SynPAM, the method presented here, allows the estimation of distances between coding DNA sequences based on synonymous codon substitutions. The problem of estimating an accurate distance from the observed substitution pattern is solved by maximum likelihood with empirical codon substitution matrices employed for the underlying Markov model. Comparisons with established measures of synonymous distance indicate that SynPAM has less variance and yields useful results over a longer time range.

16.
In recent years, likelihood ratio tests (LRTs) based on DNA and protein sequence data have been proposed for testing various evolutionary hypotheses. Because conducting an LRT requires an evolutionary model of nucleotide or amino acid substitution, which is almost always unknown, it becomes important to investigate the robustness of LRTs to violations of assumptions of these evolutionary models. Computer simulation was used to examine performance of LRTs of the molecular clock, transition/transversion bias, and among-site rate variation under different substitution models. The results showed that when correct models are used, LRTs perform quite well even when the DNA sequences are as short as 300 nt. However, LRTs were found to be biased under incorrect models. The extent of bias varies considerably, depending on the hypotheses tested, the substitution models assumed, and the lengths of the sequences used, among other things. A preliminary simulation study also suggests that LRTs based on parametric bootstrapping may be more sensitive to substitution models than are standard LRTs. When an assumed substitution model is grossly wrong and a more realistic model is available, LRTs can often reject the wrong model; thus, the performance of LRTs may be improved by using a more appropriate model. On the other hand, many factors of molecular evolution have not been considered in any substitution models so far built, and the possibility of an influence of this negligence on LRTs is often overlooked. The dependence of LRTs on substitution models calls for caution in interpreting test results and highlights the importance of clarifying the substitution patterns of genes and proteins and building more realistic models.
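The mechanics of a standard likelihood ratio test, as discussed in this abstract, can be sketched in a few lines of Python. The sketch assumes two nested models and the usual chi-square approximation for the test statistic; the log-likelihood values in the example are hypothetical, and the helper names are not taken from the cited study. To stay within the standard library, the chi-square tail probability is implemented only for 1 and 2 degrees of freedom, where closed forms exist.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference between alternative and null models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf(x, df):
    """Upper tail probability P(X > x) of a chi-square variable.

    Only df = 1 and df = 2 are supported, using their closed forms.
    """
    if x <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


if __name__ == "__main__":
    # Hypothetical example: testing for transition/transversion bias by
    # comparing a model without the bias parameter (null) against one
    # with it (alternative); the models differ by one free parameter.
    lnl_null, lnl_alt = -2345.6, -2340.1
    stat = lrt_statistic(lnl_null, lnl_alt)
    print(stat, chi2_sf(stat, df=1))
```

As the abstract cautions, the resulting p-value is only as trustworthy as the substitution models under which the two likelihoods were computed.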

17.
Many questions in evolutionary biology are best addressed by comparing traits in different species. Often such studies involve mapping characters on phylogenetic trees. Mapping characters on trees allows the nature, number, and timing of the transformations to be identified. The parsimony method is the only method available for mapping morphological characters on phylogenies. Although the parsimony method often makes reasonable reconstructions of the history of a character, it has a number of limitations. These limitations include the inability to consider more than a single change along a branch on a tree and the uncoupling of evolutionary time from amount of character change. We extended a method described by Nielsen (2002, Syst. Biol. 51:729-739) to the mapping of morphological characters under continuous-time Markov models and demonstrate here the utility of the method for mapping characters on trees and for identifying character correlation.

18.
In computational evolutionary biology, verification and benchmarking are challenging tasks because the evolutionary history of studied biological entities is usually not known. Computer programs for simulating sequence evolution in silico have been shown to be viable test beds for verifying newly developed methods and for comparing different algorithms. However, current simulation packages tend to focus either on gene-level aspects of genome evolution such as character substitutions and insertions and deletions (indels) or on genome-level aspects such as genome rearrangement and speciation events. Here, we introduce Artificial Life Framework (ALF), which aims at simulating the entire range of evolutionary forces that act on genomes: nucleotide, codon, or amino acid substitution (under simple or mixture models), indels, GC-content amelioration, gene duplication, gene loss, gene fusion, gene fission, genome rearrangement, lateral gene transfer (LGT), or speciation. The other distinctive feature of ALF is its user-friendly yet powerful web interface. We illustrate the utility of ALF with two possible applications: 1) we reanalyze data from a study of selection after globin gene duplication and test the statistical significance of the original conclusions and 2) we demonstrate that LGT can dramatically decrease the accuracy of two well-established orthology inference methods. ALF is available as a stand-alone application or via a web interface at http://www.cbrg.ethz.ch/alf.

19.
This article generalizes previous models for codon substitution and rate variation in molecular phylogeny. Particular attention is paid to (1) reversibility, (2) acceptance and rejection of proposed codon changes, (3) varying rates of evolution among codon sites, and (4) the interaction of these sites in determining evolutionary rates. To accommodate spatial variation in rates, Markov random fields rather than Markov chains are introduced. Because these innovations complicate maximum likelihood estimation in phylogeny reconstruction, it is necessary to formulate new algorithms for the evaluation of the likelihood and its derivatives with respect to the underlying kinetic, acceptance, and spatial parameters. To derive the most from maximum likelihood analysis of sequence data, it is useful to compute posterior probabilities assigning residues to internal nodes and evolutionary rate classes to codon sites. It is also helpful to search through tree space in a way that respects accepted phylogenetic relationships. Our phylogeny program LINNAEUS implements algorithms realizing these goals. Readers may consult our companion article in this issue for several examples.

20.

Background

Coalescent simulations have proven very useful in many population genetics studies. In order to arrive at meaningful conclusions, it is important that these simulations resemble the process of molecular evolution as much as possible. To date, no single coalescent program is able to simulate codon sequences sampled from populations with recombination, migration and growth.

Results

We introduce a new coalescent program, called Recodon, which is able to simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models.

Conclusion

Recodon is a flexible tool for the simulation of coding DNA sequences under realistic evolutionary models. These simulations can be used to build parameter distributions for testing evolutionary hypotheses using experimental data. Recodon is written in C, can run in parallel, and is freely available from http://darwin.uvigo.es/.
