Similar literature
20 similar articles found
1.
J S Lopes, M Arenas, D Posada, M A Beaumont. Heredity 2014, 112(3): 255-264
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.

2.
3.

Background  

It has long been known that rates of synonymous substitutions are unusually low in mitochondrial genes of flowering and other land plants. Although two dramatic exceptions to this pattern have recently been reported, it is unclear how often major increases in substitution rates occur during plant mitochondrial evolution and what the overall magnitude of substitution rate variation is across plants.

4.
Mitochondrial genomes encode fundamental subunits of the basic energy producing machinery of eukaryotic cells that are under strong functional constraint. Paradoxically, these genes evolve rapidly in general, and there is substantial variation in evolutionary rates among genes within genomes. In order to investigate spatial variation in selection intensity, we conducted tests of neutrality using ratios of nonsynonymous to synonymous substitutions (dN/dS = omega) on numerous protein gene segments from fishes and mammals. Values of omega were very low for nearly all genomic regions. However, values of both omega and dN varied in a clinal pattern with increasing distance from the light-strand origin of replication. Spatial heterogeneity of nonsynonymous substitution rates exhibits a significantly positive correlation with variation in mutation rates that are related to the mode of mitochondrial DNA replication. The finding that nonsynonymous substitution rates are proportional to mutation rates is expected if a majority of substitutions are selectively neutral or slightly deleterious. Spatial patterns of among-gene variation in nonsynonymous rates were highly similar between fishes and mammals, suggesting that forces governing mitochondrial gene evolution have remained relatively constant over 450 Myr of vertebrate evolution. Conservation of substitution patterns despite major shifts in thermal habit and metabolic demands among taxa implicates a conserved replication mechanism controlling relative mutation rates as a major determinant of mitochondrial protein evolution.
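As a rough illustration of the omega statistic named in the abstract above, the sketch below computes dN/dS from two already-estimated rates and applies the conventional reading (omega well below 1 suggests purifying selection, near 1 neutrality, above 1 positive selection). The function names, the tolerance and the input values are hypothetical and are not taken from the study; estimating dN and dS from sequence data is a separate step not shown here.

```python
def omega(dn: float, ds: float) -> float:
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitution rates."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def interpret(w: float, tol: float = 0.05) -> str:
    """Conventional reading of omega; `tol` is an arbitrary illustrative band."""
    if w < 1 - tol:
        return "purifying selection"
    if w > 1 + tol:
        return "positive selection"
    return "approximately neutral"


# Hypothetical rates for one gene segment (not data from the abstract).
w = omega(dn=0.125, ds=0.5)
label = interpret(w)
print(w, label)
```

A very low omega, as reported for nearly all regions in the abstract, would fall in the first branch.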

5.
We investigated the variability in amino acid sequences between mitochondrial cytochrome c oxidase subunit II (COII) domains, as well as that of gene sequences encoding the corresponding domains. According to the secondary structure, COII consisted of five domains: N- and C-terminal regions positioned in the intermembrane space, two transmembrane helices (TM1 and TM2) in the lipid bilayer, and a matrix-embedded loop (ML) that intervened between the two helices. Our analysis, using dictyopteran insects as model species, revealed that amino acid and nucleotide substitution rates were heterogeneous between the COII domains. The amino acid substitution rates were higher in the TM1 (0.380 ± 0.123) and ML domains (0.416 ± 0.184), whereas they were relatively lower in the N-terminal (0.204 ± 0.123) and TM2 domains (0.184 ± 0.088). As expected from the variability in the amino acid substitution rates, the average nucleotide substitution rates were also relatively higher in the TM1 (0.312 ± 0.081) and ML domains (0.302 ± 0.093), whereas the lowest substitution rate was observed in the N domain (0.191 ± 0.073). These results indicate that the heterogeneous substitution rates between COII domains, as well as genes encoding the domains, might be closely related to the inner membrane environment where each region of the amino acid sequence is laid.

6.
Rodrigue N, Lartillot N, Philippe H. Genetics 2008, 180(3): 1579-1591
In 1994, Muse and Gaut (MG) and Goldman and Yang (GY) proposed evolutionary models that recognize the coding structure of the nucleotide sequences under study, by defining a Markovian substitution process with a state space consisting of the 61 sense codons (assuming the universal genetic code). Several variations and extensions to their models have since been proposed, but no general and flexible framework for contrasting the relative performance of alternative approaches has yet been applied. Here, we compute Bayes factors to evaluate the relative merit of several MG and GY styles of codon substitution models, including recent extensions acknowledging heterogeneous nonsynonymous rates across sites, as well as selective effects inducing uneven amino acid or codon preferences. Our results on three real data sets support a logical model construction following the MG formulation, allowing for a flexible account of global amino acid or codon preferences, while maintaining distinct parameters governing overall nucleotide propensities. Through posterior predictive checks, we highlight the importance of such a parameterization. Altogether, the framework presented here suggests a broad modeling project in the MG style, stressing the importance of combining and contrasting available model formulations and grounding developments in a sound probabilistic paradigm.

7.
A maximum likelihood framework for estimating site-specific substitution rates is presented that does not require any prior assumptions about the rate distribution. We show that, when the branching pattern of the underlying tree is known, the analysis of pairs of positions is sufficient to estimate site-specific rates. In the absence of a known topology, we introduce an iterative procedure to estimate simultaneously the branching pattern, the branch lengths, and site-specific substitution rates. Simulations show that the evolutionary rate of fast-evolving sites can be reliably inferred and that the accuracy of rate estimates depends mainly on the number of sequences in the data set. Thus, large sets of aligned sequences are necessary for reliable site-specific rate estimates. The method is applied to the complete mitochondrial DNA sequence of 53 humans, providing a complete picture of the site-specific substitution rates in human mitochondrial DNA.

8.
A new method for calculating evolutionary substitution rates   Total citations: 39, self-citations: 0, citations by others: 39
Summary: In this paper we present a new method for analysing molecular evolution in homologous genes based on a general stationary Markov process. The elaborate statistical analysis necessary to apply the method effectively has been performed using Monte Carlo techniques. We have applied our method to the silent third position of the codon of the five mitochondrial genes coding for identified proteins of four mammalian species (rat, mouse, cow and man). We found that the method applies satisfactorily to the three former species, while the last appears to be outside the scope of the present approach. The method allows one to calculate the evolutionarily effective silent substitution rate (vs) for mitochondrial genes, which in the species mentioned above is 1.4 × 10⁻⁸ nucleotide substitutions per site per year. We have also determined the divergence time ratios between the couples mouse-cow/rat-mouse and rat-cow/rat-mouse. In both cases this value is approximately 1.4.

9.
A maximum likelihood method for independently estimating the relative rate of substitution at different nucleotide sites is presented. With this method, the evolution of DNA sequences can be analyzed without assuming a specific distribution of rates among sites. To investigate the pattern of correlation of rates among sites, the method was applied to a data set consisting of the protein-coding regions of the mitochondrial genome from 10 vertebrate species. Rates appear to be strongly correlated at distances up to 40 codons apart. Furthermore, there appears to be some higher order correlation of sites approximately 75 codons apart. The method of site-by-site estimation of the rate of substitution may also be applied to examine other aspects of rate variation along a DNA sequence and to assess the difference in the support of a tree along the sequence.

10.
Models of amino acid substitution present challenges beyond those often faced with the analysis of DNA sequences. The alignments of amino acid sequences are often small, whereas the number of parameters to be estimated is potentially large when compared with the number of free parameters for nucleotide substitution models. Most approaches to the analysis of amino acid alignments have focused on the use of fixed amino acid models in which all of the potentially free parameters are fixed to values estimated from a large number of sequences. Often, these fixed amino acid models are specific to a gene or taxonomic group (e.g. the Mtmam model, which has parameters that are specific to mammalian mitochondrial gene sequences). Although the fixed amino acid models succeed in reducing the number of free parameters to be estimated--indeed, they reduce the number of free parameters from approximately 200 to 0--it is possible that none of the currently available fixed amino acid models is appropriate for a specific alignment. Here, we present four approaches to the analysis of amino acid sequences. First, we explore the use of a general time reversible model of amino acid substitution using a Dirichlet prior probability distribution on the 190 exchangeability parameters. Second, we explore the behaviour of prior probability distributions that are 'centred' on the rates specified by the fixed amino acid model. Third, we consider a mixture of fixed amino acid models. Finally, we consider constraints on the exchangeability parameters as partitions, similar to how nucleotide substitution models are specified, and place a Dirichlet process prior model on all the possible partitioning schemes.

11.
Genetic sequence data typically exhibit variability in substitution rates across sites. In practice, there is often too little variation to fit a different rate for each site in the alignment, but the distribution of rates across sites may not be well modeled using simple parametric families. Mixtures of different distributions can capture more complex patterns of rate variation, but are often parameter-rich and difficult to fit. We present a simple hierarchical model in which a baseline rate distribution, such as a gamma distribution, is discretized into several categories, the quantiles of which are estimated using a discretized beta distribution. Although this approach involves adding only two extra parameters to a standard distribution, a wide range of rate distributions can be captured. Using simulated data, we demonstrate that a "beta-gamma" model can reproduce the moments of the rate distribution more accurately than the distribution used to simulate the data, even when the baseline rate distribution is misspecified. Using hepatitis C virus and mammalian mitochondrial sequences, we show that a beta-gamma model can fit as well as or better than a model with multiple discrete rate categories, and compares favorably with a model which fits a separate rate category to each site. We also demonstrate this discretization scheme in the context of codon models specifically aimed at identifying individual sites undergoing adaptive or purifying evolution.

12.
We develop a new model for studying the molecular evolution of protein-coding DNA sequences. In contrast to existing models, we incorporate the potential for site-to-site heterogeneity of both synonymous and nonsynonymous substitution rates. We demonstrate that within-gene heterogeneity of synonymous substitution rates appears to be common. Using the new family of models, we investigate the utility of a variety of new statistical inference procedures, and we pay particular attention to issues surrounding the detection of sites undergoing positive selection. We discuss how failure to model synonymous rate variation can lead to misidentification of sites as positively selected.

13.
The mitochondrial genome is one of the most frequently used loci in phylogenetic and phylogeographic analyses, and it is becoming increasingly possible to sequence and analyze this genome in its entirety from diverse taxa. However, sequencing the entire genome is not always desirable or feasible. Which genes should be selected to best infer the evolutionary history of the mitochondria within a group of organisms, and what properties of a gene determine its phylogenetic performance? The current study addresses these questions in a Bayesian phylogenetic framework with reference to a phylogeny of plethodontid and related salamanders derived from 27 complete mitochondrial genomes; this topology is corroborated by nuclear DNA and morphological data. Evolutionary rates for each mitochondrial gene and divergence dates for all nodes in the plethodontid mitochondrial genome phylogeny were estimated in both Bayesian and maximum likelihood frameworks using multiple fossil calibrations, multiple data partitions, and a clock-independent approach. Bayesian analyses of individual genes were performed, and the resulting trees compared against the reference topology. Ordinal logistic regression analysis of molecular evolution rate, gene length, and the gamma-shape parameter α demonstrated that slower rate of evolution and longer gene length both increased the probability that a gene would perform well phylogenetically. Estimated rates of molecular evolution vary 84-fold among different mitochondrial genes and different salamander lineages, and mean rates among genes vary 15-fold. Despite having conserved amino acid sequences, cox1, cox2, cox3, and cob have the fastest mean rates of nucleotide substitution, and the greatest variation in rates, whereas rrnS and rrnL have the slowest rates. Reasons underlying this rate variation are discussed, as is the extensive rate variation in cox1 in light of its proposed role in DNA barcoding.

14.
We studied mutations in the mtDNA control region (CR) using deep-rooting French-Canadian pedigrees. In 508 maternal transmissions, we observed four substitutions (0.0079 per generation per 673 bp, 95% CI 0.0023-0.186). Combined with other familial studies, our results add up to 18 substitutions in 1,729 transmissions (0.0104), confirming earlier findings of much greater mutation rates in families than those based on phylogenetic comparisons. Only 12 of these mutations occurred at independent sites, whereas three positions mutated twice each, suggesting that pedigree studies preferentially reveal a fraction of highly mutable sites. Fitting the data through use of a nonuniform rate model predicts the presence of 40 (95% CI 27-54) such fast sites in the whole CR, characterized by the mutation rate of 274 per site per million generations (95% CI 138-410). The corresponding values for hypervariable regions I (HVI; 1,729 transmissions) and II (HVII; 1,956 transmissions), are 19 and 22 fast sites, with rates of 224 and 274, respectively. Because of the high probability of recurrent mutations, such sites are expected to be of no or little informativity for the evaluation of mutational distances at the phylogenetic time scale. The analysis of substitution density in the alignment of 973 HVI and 650 HVII unrelated European sequences reveals that the bulk of the sites mutate at relatively moderate and slow rates. Assuming a star-like phylogeny and an average time depth of 250 generations, we estimate the rates for HVI and HVII at 23 and 24 for the moderate sites and 1.3 and 1.0 for the slow sites. The fast, moderate, and slow sites, at the ratio of 1:2:13, respectively, describe the mutation-rate heterogeneity in the CR. Our results reconcile the controversial rate estimates in the phylogenetic and familial studies; the fast sites prevail in the latter, whereas the slow and moderate sites dominate the phylogenetic-rate estimations.
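The two per-generation rates quoted in the abstract above follow directly from the reported counts. The short sketch below only reproduces that arithmetic (substitutions divided by maternal transmissions). The function name is hypothetical, and the confidence intervals and the nonuniform rate model are not reproduced here.

```python
def rate_per_transmission(substitutions: int, transmissions: int) -> float:
    """Observed substitutions divided by the number of maternal transmissions."""
    if transmissions <= 0:
        raise ValueError("transmissions must be positive")
    return substitutions / transmissions


# Counts as stated in the abstract.
pedigree_rate = rate_per_transmission(4, 508)     # French-Canadian pedigrees
combined_rate = rate_per_transmission(18, 1729)   # pooled familial studies

print(f"{pedigree_rate:.4f}")  # 0.0079
print(f"{combined_rate:.4f}")  # 0.0104
```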

15.
The spectrum of single-base-pair substitutions logged in The Human Gene Mutation Database (HGMD), comprising 7,271 different lesions in the coding regions of 547 different human genes, was analyzed for nearest-neighbor effects on relative mutation rates. Owing to its retrospective nature, HGMD allows mutation rates to be estimated only in relative terms. Therefore, a novel methodology was devised in order to obtain these estimates in iterative fashion, correcting, at the same time, for the confounding effects of differential codon usage and for the fact that different types of amino acid replacement come to clinical attention with different probabilities. Over and above the hypermutability of CpG dinucleotides, reflected in transition rates five times the base mutation rate, only a subtle and locally confined influence of the surrounding DNA sequence on relative single-base-pair substitution rates was observed, which extended no farther than 2 bp from the substitution site. A disparity between the two DNA strands was evidenced by the fact that, when substitution rates were estimated conditional on the 5' and 3' flanking nucleotides, a significant rate difference emerged for 10 of 96 possible pairs of complementary substitutional events. Mutational bias, favoring substitutions toward flanking bases, a phenomenon reminiscent of misalignment mutagenesis, was apparent and exhibited both directionality and reading-frame sensitivity. No specific preponderance of repeat-sequence motifs was observed in the vicinity of nucleotide substitutions, but a moderate correlation between the relative mutability and thermodynamic stability of DNA triplets emerged, suggesting either inefficient DNA replication in regions of high stability or the transient stabilization of misaligned intermediates.

16.
A quantitative map of nucleotide substitution rates in bacterial rRNA.   Total citations: 11, self-citations: 3, citations by others: 11
A recently developed method for estimating the variability of nucleotide sites in a sequence alignment [Van de Peer, Y., Van der Auwera, G. and De Wachter, R. (1996) J. Mol. Evol. 42, 201-210] was applied to bacterial 16S, 5S and 23S rRNAs. In this method, the variability of each nucleotide site is defined as its evolutionary rate relative to the average evolutionary rate of all the nucleotide sites of the molecule. Spectra of evolutionary rates were calculated for each rRNA and show the fastest evolving sites substituting at rates more than 1000 times that of the slowest ones. Variability maps are presented for each rRNA, consisting of secondary structure models where the variability of each nucleotide site is indicated by means of a colored dot. The maps can be interpreted in terms of higher order structure, function and evolution of the molecules and facilitate the selection of areas suitable for the design of PCR primers and hybridization probes. Variability measurement is also important for the precise estimation of evolutionary distances and the inference of phylogenetic trees.
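The definition of variability given in the abstract above (a site's evolutionary rate relative to the average rate over all sites of the molecule) can be sketched as a simple normalization. The function name and the per-site rates below are hypothetical. How the rates themselves are estimated from an alignment is the substance of the cited method and is not shown.

```python
from statistics import mean


def relative_variability(site_rates):
    """Divide each site's rate by the mean rate over all sites."""
    average = mean(site_rates)
    if average <= 0:
        raise ValueError("mean rate must be positive")
    return [rate / average for rate in site_rates]


# Hypothetical per-site evolutionary rates (not data from the abstract).
rates = [0.5, 1.0, 1.5, 5.0]
variability = relative_variability(rates)
print(variability)  # [0.25, 0.5, 0.75, 2.5]
```

By construction the relative values average to 1, so a value above 1 marks a site evolving faster than the molecule as a whole.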

17.
Estimating synonymous and nonsynonymous substitution rates   Total citations: 4, self-citations: 4, citations by others: 4
Partitioning the total substitution rate into synonymous and nonsynonymous components is a key aspect of many analyses in molecular evolution. Numerous methods exist for estimating these rates. However, until recently none of the estimation procedures were based on a sound statistical footing. In this paper, the evolutionary model of Muse and Gaut (1994) is used as the basis for two sets of parameters quantifying silent and replacement substitution rates. The parameters are shown to be equal when the four nucleotides are equally frequent and unequal otherwise. Maximum-likelihood estimation of these parameters is described, and the performance of these estimates is compared to that of existing estimation procedures. It is shown that the estimates of Nei and Gojobori (1986) are not unbiased for either set of parameters, although they provide very good estimates for one set as long as sequence divergence is not too high. However, some disturbing properties are found for the Nei and Gojobori estimates. In particular, it is shown that the expected value of the Nei and Gojobori estimate of silent substitution rate is a function of both the silent and replacement substitution rates. The maximum-likelihood estimates have no such problems.

18.
19.
An estimate of the average number of evolutionarily acceptable substitutions per nucleotide since the most recent common ancestor of a pair of homologous sequences is found which uses nucleotide sequence data. The estimate is derived assuming a Poisson-like model for the evolutionary process. A method is also suggested for analyzing nucleotide sequence data in M homologous sequences (M ≥ 3). A simulation study is reported showing that the estimates are satisfactory provided there is sufficient homology between the sequences. To demonstrate the methods, a numerical example using some β-globin data is presented.

20.
Nonhomogeneous substitution models have been introduced for phylogenetic inference when the substitution process is nonstationary, for example, when sequence composition differs between lineages. Existing models can have many parameters, and it is then difficult and computationally expensive to learn the parameters and to select the optimal model complexity. We extend an existing nonhomogeneous substitution model by introducing a reversible jump Markov chain Monte Carlo method for efficient Bayesian inference of the model order along with other phylogenetic parameters of interest. We also introduce a new hierarchical prior which leads to more reasonable results when only a small number of lineages share a particular substitution process. The method is implemented in the PHASE software, which includes specialized substitution models for RNA genes with conserved secondary structure. We apply an RNA-specific nonhomogeneous model to a structure-based alignment of rRNA sequences spanning the entire tree of life. A previous study of the same genes from a similar set of species found robust evidence for a mesophilic last universal common ancestor (LUCA) by inference of the G+C composition at the root of the tree. In the present study, we find that the helical GC composition at the root is strongly dependent on the root position. With a bacterial rooting, we find that there is no longer strong support for either a mesophile or a thermophile LUCA, although a hyperthermophile LUCA remains unlikely. We discuss reasons why results using only RNA helices may differ from results using all aligned sites when applying nonhomogeneous models to RNA genes.
