20 similar articles found.
J S Lopes, M Arenas, D Posada, M A Beaumont. Heredity, 2014, 112(3): 255-264
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.
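
As a rough illustration of the general idea behind approximate Bayesian computation, the sketch below runs plain rejection ABC on a toy problem. It is not the regression-based method described in the abstract above: the simulator (independent sites that each change with probability `rate`), the flat prior, the tolerance, and every name in the code are assumptions chosen only for illustration.

```python
import random
import statistics

def simulate_counts(rate, n_sites, rng):
    """Toy simulator (assumed): each site changes independently with probability `rate`."""
    return sum(1 for _ in range(n_sites) if rng.random() < rate)

def abc_rejection(observed_count, n_sites, n_draws=5000, tolerance=2, seed=1):
    """Keep prior draws whose simulated summary statistic lands close to the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.5)                     # flat prior (an assumption)
        simulated = simulate_counts(rate, n_sites, rng)  # simulate data under this draw
        if abs(simulated - observed_count) <= tolerance:
            accepted.append(rate)                        # approximate posterior sample
    return accepted

# Hypothetical observation: 20 changed sites out of 200.
posterior = abc_rejection(observed_count=20, n_sites=200)
posterior_mean = statistics.mean(posterior)
print(len(posterior), round(posterior_mean, 3))
```

The accepted draws approximate the posterior of `rate`; tightening `tolerance` trades acceptance rate for accuracy, which is one reason more elaborate schemes (such as regression adjustment and summary-statistic selection) are used in practice.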

Mitochondrial genomes encode fundamental subunits of the basic energy producing machinery of eukaryotic cells that are under strong functional constraint. Paradoxically, these genes evolve rapidly in general, and there is substantial variation in evolutionary rates among genes within genomes. In order to investigate spatial variation in selection intensity, we conducted tests of neutrality using ratios of nonsynonymous to synonymous substitutions (dN/dS = omega) on numerous protein gene segments from fishes and mammals. Values of omega were very low for nearly all genomic regions. However, values of both omega and dN varied in a clinal pattern with increasing distance from the light-strand origin of replication. Spatial heterogeneity of nonsynonymous substitution rates exhibits a significantly positive correlation with variation in mutation rates that are related to the mode of mitochondrial DNA replication. The finding that nonsynonymous substitution rates are proportional to mutation rates is expected if a majority of substitutions are selectively neutral or slightly deleterious. Spatial patterns of among-gene variation in nonsynonymous rates were highly similar between fishes and mammals, suggesting that forces governing mitochondrial gene evolution have remained relatively constant over 450 Myr of vertebrate evolution. Conservation of substitution patterns despite major shifts in thermal habit and metabolic demands among taxa implicates a conserved replication mechanism controlling relative mutation rates as a major determinant of mitochondrial protein evolution.



It has long been known that rates of synonymous substitutions are unusually low in mitochondrial genes of flowering and other land plants. Although two dramatic exceptions to this pattern have recently been reported, it is unclear how often major increases in substitution rates occur during plant mitochondrial evolution and what the overall magnitude of substitution rate variation is across plants.

We investigated the variability in amino acid sequences between mitochondrial cytochrome c oxidase subunit II (COII) domains, as well as that of gene sequences encoding the corresponding domains. According to its secondary structure, COII consists of five domains: the N- and C-terminal regions positioned in the intermembrane space, two transmembrane helices (TM1 and TM2) in the lipid bilayer, and a matrix-embedded loop (ML) between the two helices. Our analysis, using dictyopteran insects as model species, revealed that amino acid and nucleotide substitution rates were heterogeneous between the COII domains. The amino acid substitution rates were higher in the TM1 (0.380 ± 0.123) and ML domains (0.416 ± 0.184), whereas they were relatively lower in the N-terminal (0.204 ± 0.123) and TM2 domains (0.184 ± 0.088). As expected from the variability in the amino acid substitution rates, the average nucleotide substitution rates were also relatively higher in the TM1 (0.312 ± 0.081) and ML domains (0.302 ± 0.093), whereas the lowest substitution rate was observed in the N domain (0.191 ± 0.073). These results indicate that the heterogeneous substitution rates between COII domains, as well as between the genes encoding the domains, might be closely related to the inner membrane environment in which each region of the amino acid sequence is located.

A maximum likelihood method for independently estimating the relative rate of substitution at different nucleotide sites is presented. With this method, the evolution of DNA sequences can be analyzed without assuming a specific distribution of rates among sites. To investigate the pattern of correlation of rates among sites, the method was applied to a data set consisting of the protein-coding regions of the mitochondrial genome from 10 vertebrate species. Rates appear to be strongly correlated at distances up to 40 codons apart. Furthermore, there appears to be some higher order correlation of sites approximately 75 codons apart. The method of site-by-site estimation of the rate of substitution may also be applied to examine other aspects of rate variation along a DNA sequence and to assess the difference in the support of a tree along the sequence.



Rates of synonymous nucleotide substitutions are, in general, exceptionally low in plant mitochondrial genomes, several times lower than in chloroplast genomes, 10–20 times lower than in plant nuclear genomes, and 50–100 times lower than in many animal mitochondrial genomes. Several cases of moderate variation in mitochondrial substitution rates have been reported in plants, but these mostly involve correlated changes in chloroplast and/or nuclear substitution rates and are therefore thought to reflect whole-organism forces rather than ones impinging directly on the mitochondrial mutation rate. Only a single case of extensive, mitochondrial-specific rate changes has been described, in the angiosperm genus Plantago.

A maximum likelihood framework for estimating site-specific substitution rates is presented that does not require any prior assumptions about the rate distribution. We show that, when the branching pattern of the underlying tree is known, the analysis of pairs of positions is sufficient to estimate site-specific rates. In the absence of a known topology, we introduce an iterative procedure to estimate simultaneously the branching pattern, the branch lengths, and site-specific substitution rates. Simulations show that the evolutionary rate of fast-evolving sites can be reliably inferred and that the accuracy of rate estimates depends mainly on the number of sequences in the data set. Thus, large sets of aligned sequences are necessary for reliable site-specific rate estimates. The method is applied to the complete mitochondrial DNA sequence of 53 humans, providing a complete picture of the site-specific substitution rates in human mitochondrial DNA.

A new method for calculating evolutionary substitution rates   (Cited by: 39; self-citations: 0; citations by others: 39)
Summary: In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach. The method allows one to calculate the evolutionarily effective silent substitution rate (vs) for mitochondrial genes, which in the species mentioned above is 1.4 × 10⁻⁸ nucleotide substitutions per site per year. We have also determined the divergence time ratios between the couples mouse-cow/rat-mouse and rat-cow/rat-mouse. In both cases this value is approximately 1.4.

We used mitochondrial DNA data to infer phylogenies for 28 samples of gall-inducing Tamalia aphids from 12 host-plant species, and for 17 samples of Tamalia inquilinus, aphid 'inquilines' that obligately inhabit galls of the gall inducers and do not form their own galls. Our phylogenetic analyses indicate that the inquilines are monophyletic and closely related to their host aphids. Tamalia coweni aphids from different host plants were, with one exception, very closely related to one another. By contrast, the T. inquilinus aphids were strongly genetically differentiated among most of their host plants. Comparison of branch lengths between the T. coweni clade and the T. inquilinus clade indicates that the T. inquilinus lineage evolves 2.5-3 times faster for the cytochrome oxidase I gene. These results demonstrate that: (1) Tamalia inquilines originated from their gall-inducing hosts, (2) communal (multi-female) gall induction apparently facilitated the origin of inquilinism, (3) diversification of the inquilines has involved rapid speciation along host-plant lines, or the rapid evolution of host-plant races, and (4) the inquilines have undergone accelerated molecular evolution relative to their hosts, probably due to reduced effective population sizes. Our findings provide insight into the behavioural causes and evolutionary consequences of transitions from resource generation to resource exploitation.

Rodrigue N, Lartillot N, Philippe H. Genetics, 2008, 180(3): 1579-1591
In 1994, Muse and Gaut (MG) and Goldman and Yang (GY) proposed evolutionary models that recognize the coding structure of the nucleotide sequences under study, by defining a Markovian substitution process with a state space consisting of the 61 sense codons (assuming the universal genetic code). Several variations and extensions to their models have since been proposed, but no general and flexible framework for contrasting the relative performance of alternative approaches has yet been applied. Here, we compute Bayes factors to evaluate the relative merit of several MG and GY styles of codon substitution models, including recent extensions acknowledging heterogeneous nonsynonymous rates across sites, as well as selective effects inducing uneven amino acid or codon preferences. Our results on three real data sets support a logical model construction following the MG formulation, allowing for a flexible account of global amino acid or codon preferences, while maintaining distinct parameters governing overall nucleotide propensities. Through posterior predictive checks, we highlight the importance of such a parameterization. Altogether, the framework presented here suggests a broad modeling project in the MG style, stressing the importance of combining and contrasting available model formulations and grounding developments in a sound probabilistic paradigm.
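
The state space of 61 sense codons mentioned above can be enumerated directly: the 64 nucleotide triplets minus the three stop codons of the universal genetic code. The short sketch below does this and also lists, for a given codon, the sense codons reachable by a single nucleotide change. The helper name `single_nucleotide_neighbors` is illustrative and not taken from any of the cited models' software.

```python
from itertools import product

# Stop codons of the universal (standard) genetic code, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}

# All 4^3 = 64 nucleotide triplets.
all_codons = ["".join(triplet) for triplet in product("ACGT", repeat=3)]

# State space of a codon substitution model: the sense (non-stop) codons.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

def single_nucleotide_neighbors(codon):
    """Sense codons that differ from `codon` at exactly one position."""
    return [
        other for other in sense_codons
        if sum(a != b for a, b in zip(codon, other)) == 1
    ]

print(len(all_codons), len(sense_codons))
print(len(single_nucleotide_neighbors("ATG")), len(single_nucleotide_neighbors("TAC")))
```

A codon has at most nine single-nucleotide neighbors; codons adjacent to a stop codon (such as TAC, next to TAA and TAG) have fewer within the sense-codon state space.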

We used Bayesian phylogenetic analysis of 5 kb of chloroplast DNA data from 68 Sapotaceae species to clarify phylogenetic relationships within Sapotoideae, one of the two major clades within Sapotaceae. Variation in substitution rates through time was shown to be a very important aspect of molecular evolution for this data set. Relative rates tests indicated that changes in overall rate have taken place in several lineages during the history of the group and Bayes factors strongly supported a covarion model, which allows the rate of a site to vary over time, over commonly used models that only allow rates to vary across sites. Rate variation over time was actually found to be a more important model component than rate variation across sites. The covarion model was originally developed for coding gene sequences and has so far only been tested for this type of data. The fact that it performed so well with the present data set, consisting mainly of data from noncoding spacer regions, suggests that it deserves a wider consideration in model based phylogenetic inference. Repeatability of phylogenetic results was very difficult to obtain with the more parameter rich models, and analyses with identical settings often supported different topologies. Overparameterization may be the reason why the MCMC did not sample from the posterior distribution in these cases. The problem could, however, be overcome by using less parameter rich evolutionary models, and adjusting the MCMC settings. The phylogenetic results showed that two taxa, previously thought to belong in Sapotoideae, are not part of this group. Eberhardtia aurata is the sister of the two major Sapotaceae clades, Chrysophylloideae and Sapotoideae, and Neohemsleya usambarensis belongs in Chrysophylloideae. Within Sapotoideae two clades, Sideroxyleae and Sapoteae, were strongly supported. Bayesian analysis of the character history of some floral morphological traits showed that the ancestral type of flower in Sapotoideae may have been characterized by floral parts (sepals, petals, stamens, and staminodes) in single whorls of five, entire corolla lobes, and seeds with an adaxial hilum.

We introduce a new model for relaxing the assumption of a strict molecular clock for use as a prior in Bayesian methods for divergence time estimation. Lineage-specific rates of substitution are modeled using a Dirichlet process prior (DPP), a type of stochastic process that assumes lineages of a phylogenetic tree are distributed into distinct rate classes. Under the Dirichlet process, the number of rate classes, assignment of branches to rate classes, and the rate value associated with each class are treated as random variables. The performance of this model was evaluated by conducting analyses on data sets simulated under a range of different models. We compared the Dirichlet process model with two alternative models for rate variation: the strict molecular clock and the independent rates model. Our results show that divergence time estimation under the DPP provides robust estimates of node ages and branch rates without significantly reducing power. Further analyses were conducted on a biological data set, and we provide examples of ways to summarize Markov chain Monte Carlo samples under this model.
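
One common way to draw from a Dirichlet process prior is the Chinese restaurant process, sketched below for assigning branches to rate classes. This is only a sketch of the prior, not the inference procedure of the abstract above: the concentration value, the gamma base distribution and its parameters, the number of branches, and all names are assumptions made for illustration.

```python
import random

def sample_rate_classes(n_branches, concentration, rng):
    """Assign branches to rate classes via the Chinese restaurant process."""
    assignments = []   # class index for each branch
    class_sizes = []   # number of branches currently in each class
    for _ in range(n_branches):
        # Join an existing class with weight equal to its size,
        # or open a new class with weight equal to the concentration.
        weights = class_sizes + [concentration]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(class_sizes):
            class_sizes.append(1)      # new rate class
        else:
            class_sizes[choice] += 1   # existing rate class
        assignments.append(choice)
    return assignments, class_sizes

rng = random.Random(42)
assignments, class_sizes = sample_rate_classes(n_branches=20, concentration=1.0, rng=rng)

# One rate per class, drawn from an assumed gamma base distribution (shape 2, scale 0.5).
class_rates = [rng.gammavariate(2.0, 0.5) for _ in class_sizes]
branch_rates = [class_rates[k] for k in assignments]
print(len(class_sizes), "rate classes for", len(branch_rates), "branches")
```

Here the number of classes, the assignment of branches, and the class rates are all random, mirroring the three quantities the abstract treats as random variables. A larger `concentration` tends to produce more classes.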

Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, which has parameters that are specific to mammalian mitochondrial gene sequences). Although the fixed amino acid models succeed in reducing the number of free parameters to be estimated--indeed, they reduce the number of free parameters from approximately 200 to 0--it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution using a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we then explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.
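
The parameter counts quoted above follow from simple counting for a time-reversible model on 20 amino acids. The worked arithmetic below is a sketch; the symbols $n_r$ and $n_\pi$ are introduced here for illustration, and the breakdown of "approximately 200" into exchangeabilities plus free frequencies is an assumption about how that figure arises.

```latex
% Exchangeabilities: one per unordered pair of distinct amino acids
n_r = \binom{20}{2} = \frac{20 \times 19}{2} = 190
% Stationary frequencies: 20 values constrained to sum to 1
n_\pi = 20 - 1 = 19
% Total (before fixing one exchangeability to set the rate scale)
n_r + n_\pi = 190 + 19 = 209 \approx 200
```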

Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-gamma" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-gamma model can fit as well or better than a model with multiple discrete rate categories, and compares favorably with a model which fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.

We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation in the model can lead to misidentification of sites as positively selected.

We studied mutations in the mtDNA control region (CR) using deep-rooting French-Canadian pedigrees. In 508 maternal transmissions, we observed four substitutions (0.0079 per generation per 673 bp, 95% CI 0.0023-0.186). Combined with other familial studies, our results add up to 18 substitutions in 1,729 transmissions (0.0104), confirming earlier findings of much greater mutation rates in families than those based on phylogenetic comparisons. Only 12 of these mutations occurred at independent sites, whereas three positions mutated twice each, suggesting that pedigree studies preferentially reveal a fraction of highly mutable sites. Fitting the data through use of a nonuniform rate model predicts the presence of 40 (95% CI 27-54) such fast sites in the whole CR, characterized by the mutation rate of 274 per site per million generations (95% CI 138-410). The corresponding values for hypervariable regions I (HVI; 1,729 transmissions) and II (HVII; 1,956 transmissions), are 19 and 22 fast sites, with rates of 224 and 274, respectively. Because of the high probability of recurrent mutations, such sites are expected to be of little or no informative value for the evaluation of mutational distances at the phylogenetic time scale. The analysis of substitution density in the alignment of 973 HVI and 650 HVII unrelated European sequences reveals that the bulk of the sites mutate at relatively moderate and slow rates. Assuming a star-like phylogeny and an average time depth of 250 generations, we estimate the rates for HVI and HVII at 23 and 24 for the moderate sites and 1.3 and 1.0 for the slow sites. The fast, moderate, and slow sites, at the ratio of 1:2:13, respectively, describe the mutation-rate heterogeneity in the CR. Our results reconcile the controversial rate estimates in the phylogenetic and familial studies; the fast sites prevail in the latter, whereas the slow and moderate sites dominate the phylogenetic-rate estimations.
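
The per-generation rates quoted above are simple proportions (4/508 and 18/1,729). The sketch below reproduces them and computes an exact binomial (Clopper-Pearson) 95% interval for the first one. The abstract does not say how its interval was obtained, so the choice of the Clopper-Pearson method is an assumption and the bounds need not match the published ones. Function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection root finder for a function that increases from negative to positive."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided binomial confidence interval for x events in n trials (0 < x < n)."""
    lower = solve_increasing(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = solve_increasing(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper

rate_pedigree = 4 / 508      # substitutions per transmission in this study
rate_combined = 18 / 1729    # combined with other familial studies
ci_low, ci_high = clopper_pearson(4, 508)

print(round(rate_pedigree, 4), round(rate_combined, 4))
print(round(ci_low, 4), round(ci_high, 4))
```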

Past population size can be estimated from modern genetic diversity using coalescent theory. Estimates of ancestral human population dynamics in sub-Saharan Africa can tell us about the timing and nature of our first steps towards colonizing the globe. Here, we combine Bayesian coalescent inference with a dataset of 224 complete human mitochondrial DNA (mtDNA) sequences to estimate effective population size through time for each of the four major African mtDNA haplogroups (L0-L3). We find evidence of three distinct demographic histories underlying the four haplogroups. Haplogroups L0 and L1 both show slow, steady exponential growth from 156 to 213 kyr ago. By contrast, haplogroups L2 and L3 show evidence of substantial growth beginning 12-20 and 61-86 kyr ago, respectively. These later expansions may be associated with contemporaneous environmental and/or cultural changes. The timing of the L3 expansion--8-12 kyr prior to the emergence of the first non-African mtDNA lineages--together with high L3 diversity in eastern Africa, strongly supports the proposal that the human exodus from Africa and subsequent colonization of the globe was prefaced by a major expansion within Africa, perhaps driven by some form of cultural innovation.

The mitochondrial genome is one of the most frequently used loci in phylogenetic and phylogeographic analyses, and it is becoming increasingly possible to sequence and analyze this genome in its entirety from diverse taxa. However, sequencing the entire genome is not always desirable or feasible. Which genes should be selected to best infer the evolutionary history of the mitochondria within a group of organisms, and what properties of a gene determine its phylogenetic performance? The current study addresses these questions in a Bayesian phylogenetic framework with reference to a phylogeny of plethodontid and related salamanders derived from 27 complete mitochondrial genomes; this topology is corroborated by nuclear DNA and morphological data. Evolutionary rates for each mitochondrial gene and divergence dates for all nodes in the plethodontid mitochondrial genome phylogeny were estimated in both Bayesian and maximum likelihood frameworks using multiple fossil calibrations, multiple data partitions, and a clock-independent approach. Bayesian analyses of individual genes were performed, and the resulting trees compared against the reference topology. Ordinal logistic regression analysis of molecular evolution rate, gene length, and the gamma-shape parameter α demonstrated that slower rate of evolution and longer gene length both increased the probability that a gene would perform well phylogenetically. Estimated rates of molecular evolution vary 84-fold among different mitochondrial genes and different salamander lineages, and mean rates among genes vary 15-fold. Despite having conserved amino acid sequences, cox1, cox2, cox3, and cob have the fastest mean rates of nucleotide substitution, and the greatest variation in rates, whereas rrnS and rrnL have the slowest rates. Reasons underlying this rate variation are discussed, as is the extensive rate variation in cox1 in light of its proposed role in DNA barcoding.

The spectrum of single-base-pair substitutions logged in The Human Gene Mutation Database (HGMD), comprising 7,271 different lesions in the coding regions of 547 different human genes, was analyzed for nearest-neighbor effects on relative mutation rates. Owing to its retrospective nature, HGMD allows mutation rates to be estimated only in relative terms. Therefore, a novel methodology was devised in order to obtain these estimates in iterative fashion, correcting, at the same time, for the confounding effects of differential codon usage and for the fact that different types of amino acid replacement come to clinical attention with different probabilities. Over and above the hypermutability of CpG dinucleotides, reflected in transition rates five times the base mutation rate, only a subtle and locally confined influence of the surrounding DNA sequence on relative single-base-pair substitution rates was observed, which extended no farther than 2 bp from the substitution site. A disparity between the two DNA strands was evidenced by the fact that, when substitution rates were estimated conditional on the 5' and 3' flanking nucleotides, a significant rate difference emerged for 10 of 96 possible pairs of complementary substitutional events. Mutational bias, favoring substitutions toward flanking bases, a phenomenon reminiscent of misalignment mutagenesis, was apparent and exhibited both directionality and reading-frame sensitivity. No specific preponderance of repeat-sequence motifs was observed in the vicinity of nucleotide substitutions, but a moderate correlation between the relative mutability and thermodynamic stability of DNA triplets emerged, suggesting either inefficient DNA replication in regions of high stability or the transient stabilization of misaligned intermediates.
