20 similar articles found (search time: 15 ms)
A positive correlation between ω, the ratio of the nonsynonymous and synonymous substitution rates, and dS, the synonymous substitution rate, has recently been reported. This correlation is unexpected under simple evolutionary models. Here, we investigate two explanations for this correlation: first, whether it is a consequence of a statistical bias in the estimation of ω and second, whether it is due to substitutions at adjacent sites. Using simulations, we show that estimates of ω are biased when levels of divergence are low. This is true using the methods of Yang and Nielsen, Nei and Gojobori, and Muse and Gaut. Although the bias could generate a positive correlation between ω and dS, we show that it is unlikely to be the main determinant. Instead, we show that the correlation is reduced when genes that are high quality in sequence, annotation, and alignment are used. The remaining, likely genuine, positive correlation appears to be due to adjacent tandem substitutions; single substitutions, though far more numerous, do not contribute to the correlation. Genuine adjacent substitutions may be due to mutation or selection.

Comparisons of replacement to silent divergence have been used in a variety of studies aimed at detecting selection. Here, such comparisons are shown to be very sensitive to the pattern of rate variation in replacement sites. Saturation may play an important role even at surprisingly low levels of divergence if the substitution rate varies across replacement sites. For example, saturation in replacement sites may be of importance in the evolution of the HIV-1 envelope gene. However, the pattern of saturation in replacement and silent sites may, in itself, provide valuable insight into the causes of DNA evolution. 210 DNA sequences from 15 different loci/systematic groups are analyzed, and evidence for positive selection is demonstrated in at least one of these data sets, through an analysis of the distribution of substitution rates along the sequence.

Codon models of evolution have facilitated the interpretation of selective forces operating on genomes. These models, however, assume a single rate of non-synonymous substitution irrespective of the nature of amino acids being exchanged. Recent developments have shown that models which allow for amino acid pairs to have independent rates of substitution offer improved fit over single rate models. However, these approaches have been limited by the necessity for large alignments in their estimation. An alternative approach is to assume that substitution rates between amino acid pairs can be subdivided into rate classes, dependent on the information content of the alignment. However, given the combinatorially large number of such models, an efficient model search strategy is needed. Here we develop a Genetic Algorithm (GA) method for the estimation of such models. A GA is used to assign amino acid substitution pairs to a series of rate classes, where the number of classes is estimated from the alignment. Other parameters of the phylogenetic Markov model, including substitution rates, character frequencies and branch lengths, are estimated using standard maximum likelihood optimization procedures. We apply the GA to empirical alignments and show improved model fit over existing models of codon evolution. Our results suggest that current models are poor approximations of protein evolution and thus gene and organism specific multi-rate models that incorporate amino acid substitution biases are preferred. We further anticipate that the clustering of amino acid substitution rates into classes will be biologically informative, such that genes with similar functions exhibit similar clustering, and hence this clustering will be useful for the evolutionary fingerprinting of genes.

A common approach to estimating the strength and direction of selection acting on protein-coding sequences is to calculate the dN/dS ratio. The method has been widely used, and many critical reviews of its application have appeared since it was proposed by Nei and Gojobori in 1986. However, the method is still evolving to account for non-uniform substitution rates and pretermination codons. In our study of SNPs in 586 genes across 156 Escherichia coli strains, synonymous polymorphism in 2-fold degenerate codons was higher than that in 4-fold degenerate codons, which could be attributed to the difference between transition (Ti) and transversion (Tv) substitution rates, where the average rate of a transition is in general four times that of a transversion. We considered both the Ti/Tv ratio and nonsense mutations in pretermination codons to improve estimates of synonymous (S) and non-synonymous (NS) sites. The accuracy of estimating dN/dS has been improved by considering the Ti/Tv ratio and nonsense substitutions in pretermination codons. We showed that applying the modified approach based on the Ti/Tv ratio and pretermination codons results in higher values of dN/dS in 29 common genes of equal reading frames between E. coli and Salmonella enterica. This study emphasizes the robustness of amino acid composition with varying codon degeneracy, as well as the pretermination codons, when calculating dN/dS values.

Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes both in protein and non-protein coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. The two PHMMs had their own set of parameters to model rates in their respective regions. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides we found that both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to co-locate with hydrophilic residues. With rRNA molecules, our model did not detect co-location between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.

We present a new likelihood method for detecting constrained evolution at synonymous sites and other forms of nonneutral evolution in putative pseudogenes. The model is applicable whenever the DNA sequence is available from a protein-coding functional gene, a pseudogene derived from the protein-coding gene, and an orthologous functional copy of the gene. Two nested likelihood ratio tests are developed to test the hypotheses that (1) the putative pseudogene has equal rates of silent and replacement substitutions; and (2) the rate of synonymous substitution in the functional gene equals the rate of substitution in the pseudogene. The method is applied to a data set containing 74 human processed-pseudogene loci, 25 mouse processed-pseudogene loci, and 22 rat processed-pseudogene loci. Using the informatics resources of the Human Genome Project, we localized 67 of the human-pseudogene pairs in the genome and estimated the GC content of a large surrounding genomic region for each. We find that, for pseudogenes deposited in GC regions similar to those of their paralogs, the assumption of equal rates of silent and replacement site evolution in the pseudogene is upheld; in these cases, the rate of silent site evolution in the functional genes is approximately 70% the rate of evolution in the pseudogene. On the other hand, for pseudogenes located in genomic regions of much lower GC than their functional gene, we see a sharp increase in the rate of silent site substitutions, leading to a large rate of rejection for the pseudogene equality likelihood ratio test.
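A nested likelihood ratio test of the kind this abstract describes generally compares twice the log-likelihood difference of the two models against a chi-square distribution. The following is a minimal sketch of that general pattern, not the authors' implementation. It assumes the alternative model adds exactly one free parameter (one degree of freedom), which the abstract does not state, and the log-likelihood values are hypothetical.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null) for two nested models."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)), so only the standard library is needed.
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical log-likelihoods: null = equal silent and replacement rates,
# alternative = the two rates are free to differ.
stat = lrt_statistic(lnl_null=-1234.5, lnl_alt=-1231.0)
p_value = chi2_sf_df1(stat)
print(f"2*delta(lnL) = {stat:.2f}, p = {p_value:.4f}")
```

A small p-value would lead to rejecting the equal-rates hypothesis for that locus. Tests with more than one extra parameter need the chi-square tail for the corresponding degrees of freedom.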

When the number of nucleotides examined is relatively small, the estimators of nucleotide substitutions between DNA sequences often introduce systematic error even if the data used fit the mathematical model underlying the estimation formula. The systematic error of this kind is especially large for models that allow variation in substitution rate among different sites. In the present paper we present a number of formulas that produce virtually bias-free estimates of evolutionary distances for these models. Correspondence to: M. Nei

Mitochondrial D-loop hypervariable region I (HVI) sequences are widely used in human molecular evolutionary studies, and therefore accurate assessment of rate heterogeneity among sites is essential. We used the maximum-likelihood method to estimate the gamma shape parameter alpha for variable substitution rates among sites for HVI from humans and chimpanzees to provide estimates for future studies. The complete data of 839 humans and 224 chimpanzees, as well as many subsets of these data, were analyzed to examine the effect of sequence sampling. The effects of the genealogical tree and the nucleotide substitution model were also examined. The transition/transversion rate ratio (kappa) is estimated to be about 25, although much larger and biased estimates were also obtained from small data sets at low divergences. Estimates of alpha were 0.28-0.39 for human data sets of different sizes and 0.20-0.39 for data sets including different chimpanzee subspecies. The combined data set of both species gave estimates of 0.42-0.45. While all those estimates suggest highly variable substitution rates among sites, smaller samples tend to give smaller estimates of alpha. Possible causes for this pattern were examined, such as biases in the estimation procedure and shifts in the rate distribution along certain lineages. Computer simulations suggest that the estimation procedure is quite reliable for large trees but can be biased for small samples at low divergences. Thus, an alpha of 0.4 appears suitable for both humans and chimpanzees. Estimates of alpha can be affected by the nucleotide sites included in the data, the overall tree length (the amount of sequence divergence), the number of rate classes used for the estimation, and to a lesser extent, the included sequences. The genealogical tree, the substitution model, and demographic processes such as population expansion do not have much effect.
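To give a feel for what a shape parameter near 0.4 implies, the sketch below draws relative site rates from a gamma distribution with mean 1 (shape alpha, scale 1/alpha), the usual parameterization for among-site rate variation. It is an illustration only, not the estimation procedure of the abstract; the function name, the number of sites, and the 0.1 threshold for a "slow" site are all assumptions made here.

```python
import random


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative substitution rates from Gamma(shape=alpha, scale=1/alpha).

    With this scale the expected rate is 1, so alpha controls only how
    unevenly the rates are spread across sites.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


rates = sample_site_rates(alpha=0.4, n_sites=100_000)
mean_rate = sum(rates) / len(rates)
# Fraction of sites evolving at less than one tenth of the average rate.
slow_fraction = sum(r < 0.1 for r in rates) / len(rates)
print(f"mean rate = {mean_rate:.3f}, fraction with rate < 0.1 = {slow_fraction:.3f}")
```

With alpha = 0.4, roughly three sites in ten fall below a tenth of the mean rate while a small minority evolve many times faster, which is the sense in which such estimates indicate highly variable rates among sites.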

Summary: This paper constructs a temporal scale for bacterial evolution by tying ecological events that took place at known times in the geological past to specific branch points in the genealogical tree relating the 16S ribosomal RNAs of eubacteria, mitochondria, and chloroplasts. One thus obtains a relationship between time and bacterial RNA divergence which can be used to estimate times of divergence between other branches in the bacterial tree. According to this approach, Salmonella typhimurium and Escherichia coli diverged between 120 and 160 million years (Myr) ago, a date which fits with evidence that the chief habitats occupied now by these two enteric species became available that long ago. The median extent of divergence between S. typhimurium and E. coli at synonymous sites for 21 kilobases of protein-coding DNA is 100%. This implies a silent substitution rate of 0.7–0.8%/Myr, a rate remarkably similar to that observed in the nuclear genes of mammals, invertebrates, and flowering plants. Similarities in the substitution rates of eucaryotes and procaryotes are not limited to silent substitutions in protein-coding regions. The average substitution rate for 16S rRNA in eubacteria is about 1%/50 Myr, similar to the average rate for 18S rRNA in vertebrates and flowering plants. Likewise, we estimate a mean rate of roughly 1%/25 Myr for 5S rRNA in both eubacteria and eucaryotes. For a few protein-coding genes of these enteric bacteria, the extent of silent substitution since the divergence of S. typhimurium and E. coli is much lower than 100%, owing to extreme bias in the usage of synonymous codons. Furthermore, in these bacteria, rates of amino acid replacement were about 20 times lower, on average, than the silent rate. By contrast, for the mammalian genes studied to date, the average replacement rate is only four to five times lower than the rate of silent substitution.

Nei and Gojobori (1986) developed a simple method to estimate the numbers of synonymous (dS) and nonsynonymous (dN) substitutions per site. In the present paper, we have developed a method for computing variances and covariances of dS's and dN's and of the proportions of synonymous (pS) and nonsynonymous (pN) differences. We also have developed a method for computing the variances of mean dS, dN, pS, and pN without constructing a phylogenetic tree of the genes. We have conducted computer simulations based on simple evolutionary models and have shown that the new method gives good estimates of variances and covariances.
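In the Nei and Gojobori (1986) method, the proportions of differences pS and pN are converted to substitutions per site with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3). The sketch below shows only that final step; counting synonymous and nonsynonymous sites and differences from codon alignments is omitted, and the input proportions are hypothetical values chosen for illustration.

```python
import math


def jukes_cantor_distance(p):
    """Convert a proportion of differences p into substitutions per site.

    Implements d = -(3/4) * ln(1 - 4p/3); undefined once p reaches 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(p_n, p_s):
    """Return (dN, dS, dN/dS) from the proportions pN and pS."""
    d_n = jukes_cantor_distance(p_n)
    d_s = jukes_cantor_distance(p_s)
    if d_s == 0.0:
        raise ValueError("dS is zero, so dN/dS is undefined")
    return d_n, d_s, d_n / d_s


# Hypothetical proportions of nonsynonymous and synonymous differences.
d_n, d_s, omega = dn_ds(p_n=0.05, p_s=0.20)
print(f"dN = {d_n:.4f}, dS = {d_s:.4f}, dN/dS = {omega:.3f}")
```

The correction grows quickly as p approaches 0.75, which is one reason estimates of dS become unreliable at high divergence.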

To examine Gojobori and Nei's hypothesis that the immunoglobulin heavy-chain variable-region (VH) genes in mammals are subject to diversity-enhancing selection, we studied the rates of synonymous and nonsynonymous nucleotide substitution in the complementarity-determining regions (CDRs) and in the framework regions (FRs) of mouse and human VH genes. The results obtained indicate that the nonsynonymous rate is higher than the synonymous rate in CDRs, whereas the reverse is true in FRs. This observation supports Gojobori and Nei's hypothesis and suggests that diversity-enhancing selection (similar to overdominant selection) operates mainly in CDRs and that this is one of the evolutionary factors that increase antibody diversity.

Dioecious white campion Silene latifolia has sex chromosomal sex determination, with homogametic (XX) females and heterogametic (XY) males. This species has become popular in studies of sex chromosome evolution. However, the lack of genes isolated from the X and Y chromosomes of this species is a major obstacle for such studies. Here, I report the isolation of a new sex-linked gene, Slss, with strong homology to spermidine synthase genes of other species. The new gene has homologous intact copies on the X and Y chromosomes (SlssX and SlssY, respectively). Synonymous divergence between the SlssX and SlssY genes is 4.7%, and nonsynonymous divergence is 1.4%. Isolation of a homologous gene from nondioecious S. vulgaris provided a root to the gene tree and allowed the estimation of the silent and replacement substitution rates along the SlssX and SlssY lineages. Interestingly, the Y-linked gene has higher synonymous and nonsynonymous substitution rates. The elevated synonymous rate in the SlssY gene, compared with SlssX, confirms our previous suggestion that the S. latifolia Y chromosome has a higher mutation rate, compared with the X chromosome. When differences in silent substitution rate are taken into account, the Y-linked gene still demonstrates significantly faster accumulation of nonsynonymous substitutions, which is consistent with the theoretical prediction of relaxed purifying selection in Y-linked genes, leading to the accumulation of nonsynonymous substitutions and genetic degeneration of the Y-linked genes.
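As a rough worked example, the two pairwise divergence figures quoted in this abstract can be combined into a single ratio. This is a ratio for the X–Y pair as a whole, computed here for illustration; it is not one of the lineage-specific rates that the abstract goes on to estimate using the S. vulgaris outgroup.

```latex
\omega_{XY} \;=\; \frac{d_N}{d_S} \;=\; \frac{0.014}{0.047} \;\approx\; 0.30
```

A value well below 1 indicates that, averaged over both copies, replacement changes accumulate more slowly than silent ones.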

An important issue in the phylogenetic analysis of nucleotide sequence data using the maximum likelihood (ML) method is the underlying evolutionary model employed. We consider the problem of simultaneously estimating the tree topology and the parameters in the underlying substitution model and of obtaining estimates of the standard errors of these parameter estimates. Given a fixed tree topology and corresponding set of branch lengths, the ML estimates of standard evolutionary model parameters are asymptotically efficient, in the sense that their joint distribution is asymptotically normal with the variance–covariance matrix given by the inverse of the Fisher information matrix. We propose a new estimate of this conditional variance based on estimation of the expected information using a Monte Carlo sampling (MCS) method. Simulations are used to compare this conditional variance estimate to the standard technique of using the observed information under a variety of experimental conditions. In the case in which one wishes to estimate simultaneously the tree and parameters, we provide a bootstrapping approach that can be used in conjunction with the MCS method to estimate the unconditional standard error. The methods developed are applied to a real data set consisting of 30 papillomavirus sequences. This overall method is easily incorporated into standard bootstrapping procedures to allow for proper variance estimation.

Various nucleotide substitution models have been developed to accommodate among-lineage rate heterogeneity, thereby relaxing the assumptions of the strict molecular clock. Recently developed "uncorrelated relaxed clock" and "random local clock" (RLC) models allow decoupling of nucleotide substitution rates between descendant lineages and are thus predicted to perform better in the presence of lineage-specific rate heterogeneity. However, it is uncertain how these models perform in the presence of punctuated shifts in substitution rate, especially between closely related clades. Using cetaceans (whales and dolphins) as a case study, we test the performance of these two substitution models in estimating both molecular rates and divergence times in the presence of substantial lineage-specific rate heterogeneity. Our RLC analyses of whole mitochondrial genome alignments find evidence for up to ten clade-specific nucleotide substitution rate shifts in cetaceans. We provide evidence that in the uncorrelated relaxed clock framework, a punctuated shift in the rate of molecular evolution within a subclade results in posterior rate estimates that are either misled or intermediate between the disparate rate classes present in baleen and toothed whales. Using simulations, we demonstrate that abrupt changes in rate isolated to one or a few lineages in the phylogeny can mislead rate and age estimation, even when the node of interest is calibrated. We further demonstrate how increasing prior age uncertainty can bias rate and age estimates, even while the 95% highest posterior density around age estimates decreases; in other words, increased precision for an inaccurate estimate. We interpret the use of external calibrations in divergence time studies in light of these results, suggesting that rate shifts at deep time scales may mislead inferences of absolute molecular rates and ages.

Rate variation among nuclear genes and the age of polyploidy in Gossypium   Cited by: 7 (self-citations: 0; citations by others: 7)
Molecular evolutionary rate variation in Gossypium (cotton) was characterized using sequence data for 48 nuclear genes from both genomes of allotetraploid cotton, models of its diploid progenitors, and an outgroup. Substitution rates varied widely among the 48 genes, with silent and replacement substitution levels varying from 0.018 to 0.162 and from 0.000 to 0.073, respectively, in comparisons between orthologous Gossypium and outgroup sequences. However, about 90% of the genes had silent substitution rates spanning a narrower threefold range. Because there was no evidence of rate heterogeneity among lineages for any gene and because rates were highly correlated in independent tests, evolutionary rate is inferred to be a property of each gene or its genetic milieu rather than the clade to which it belongs. Evidence from approximately 200,000 nucleotides (40,000 per genome) suggests that polyploidy in Gossypium led to a modest enhancement in rates of nucleotide substitution. Phylogenetic analysis for each gene yielded the topology expected from organismal history, indicating an absence of gene conversion or recombination among homoeologs subsequent to allopolyploid formation. Using the mean synonymous substitution rate calculated across the 48 genes, allopolyploid cotton is estimated to have formed circa 1.5 million years ago (MYA), after divergence of the diploid progenitors about 6.7 MYA.

Summary: Using nine sets of viral and cellular oncogenes, the rates of nucleotide substitutions were computed by using Gojobori and Yokoyama's (1985) method. The results obtained confirmed our previous conclusion that the rates of nucleotide substitution for the viral oncogenes are about a million times higher than those for their cellular counterparts. For cellular oncogenes and most viral oncogenes, however, the rate of synonymous substitution is higher than that of nonsynonymous substitution. Moreover, the pattern of nucleotide substitutions for viral oncogenes is more similar to that for functional genes (such as cellular oncogenes) than for pseudogenes. This implies that nucleotide substitutions in viral oncogenes may be functionally constrained. Thus, our observation supports the view that nucleotide substitutions for the oncogenes in those DNA and RNA genomes are consistent with Kimura's neutral theory of molecular evolution (Kimura 1968, 1983).

N G Smith, L D Hurst. Genetics, 1999, 152(2): 661-673
Miyata et al. have suggested that the male-to-female mutation rate ratio (alpha) can be estimated by comparing the neutral substitution rates of X-linked (X), Y-linked (Y), and autosomal (A) genes. Rodent silent site X/A comparisons provide very different estimates from X/Y comparisons. We examine three explanations for this discrepancy: (1) statistical biases and artifacts, (2) nonneutral evolution, and (3) differences in mutation rate per germline replication. By estimating errors and using a variety of methodologies, we tentatively reject explanation 1. Our analyses of patterns of codon usage, synonymous rates, and nonsynonymous rates suggest that silent sites in rodents are evolving neutrally, and we can therefore reject explanation 2. We find both base composition and methylation differences between the different sets of chromosomes, a result consistent with explanation 3, but these differences do not appear to explain the observed discrepancies in estimates of alpha. Our finding of significantly low synonymous substitution rates in genomically imprinted genes suggests a link between hemizygous expression and an adaptive reduction in the mutation rate, which is consistent with explanation 3. Therefore our results provide circumstantial evidence in favor of the hypothesis that the discrepancies in estimates of alpha are due to differences in the mutation rate per germline replication between different parts of the genome. This explanation violates a critical assumption of the method of Miyata et al., and hence we suggest that estimates of alpha, obtained using this method, need to be treated with caution.
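The logic of the Miyata et al. comparison can be written out as follows. This is the standard form of the argument, given here as a sketch rather than quoted from the abstract. It assumes that the sites are neutral and that the mutation rate per generation depends only on the sex of the germline, so that an autosome spends half its time in males, an X chromosome one third, and a Y chromosome all of it. With a female mutation rate \mu (a symbol introduced here) and a male rate \alpha\mu, the expected substitution rates and their ratios are:

```latex
Y = \alpha\mu, \qquad
A = \frac{(1+\alpha)\,\mu}{2}, \qquad
X = \frac{(2+\alpha)\,\mu}{3}

\frac{Y}{A} = \frac{2\alpha}{1+\alpha}, \qquad
\frac{X}{A} = \frac{2\,(2+\alpha)}{3\,(1+\alpha)}, \qquad
\frac{Y}{X} = \frac{3\alpha}{2+\alpha}
```

Each observed ratio can be solved for alpha. If the assumptions hold, the three estimates should agree, so a disagreement between X/A and X/Y estimates of the kind reported above points to a violated assumption.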

Mitochondrial DNA data have been used extensively to study evolution and early human origins. These applications require estimates of the rate at which nucleotide substitutions occur in the DNA sequence. We consider the problem of estimating substitution rates in the presence of site-to-site rate variation. A coalescent model is presented that allows for different substitution rates for purines and pyrimidines, as well as more detailed models that allow fast and slow rates within each of the purine and pyrimidine classes. A method for estimating such rates is presented. Even for these simple models of site heterogeneity, there are, typically, insufficient data to obtain reliable estimates of site-specific substitution rates. However, estimates of the average rate across all sites appear to be relatively stable even in the presence of site heterogeneity. Simulations of models with site-to-site variation in mutation rate show that hypervariable sites can produce peaks in the pairwise difference curves that have previously been attributed to population dynamics.

Variation in substitution rates among evolutionary lineages (among-lineage rate variation or ALRV) has been reported to negatively affect the estimation of phylogenies. When the substitution processes underlying ALRV are modeled inadequately, non-sister taxa with similar substitution rates are estimated incorrectly as sister species due to long-branch attraction. Recent advances in modeling site-specific rate variation (heterotachy) have reduced the impacts of ALRV on phylogeny estimation in several empirical and simulated datasets. However, the addition of parameters to the substitution model reduces power to estimate each parameter correctly, which can also lead to incorrect phylogeny estimation. A potential solution to this problem is to identify the levels of ALRV that negatively impact phylogeny estimation such that molecular markers with non-deleterious levels of ALRV can be identified. To this end, we used analyses of empirical and simulated gene datasets to evaluate whether levels of ALRV identified in a mitochondrial genomic dataset for salamanders negatively impacted phylogeny estimation. We simulated data with and without ALRV, holding all other evolutionary parameters constant, and compared the phylogenetic performance of both simulated and empirical datasets. Overall, we found limited, positive effects of ALRV on phylogeny estimation in this dataset, the majority of which resulted from an increase in substitution rate on short branches. We conclude that ALRV does not always negatively impact phylogeny estimation. Therefore, ALRV can likely be disregarded as a criterion for marker selection in comparable phylogenetic studies.
