Similar literature
1.
The prevailing wisdom of the plant mitochondrial genome is that it has very low substitution rates, thus it is generally assumed that nucleotide diversity within species will also be low. However, recent evidence suggests plant mitochondrial genes may harbor variable and sometimes high levels of within-species polymorphism, a result attributed to variance in the influence of selection. However, insufficient attention has been paid to the effect of among-gene variation in mutation rate on varying levels of polymorphism across loci. We measured levels of polymorphism in seven mitochondrial gene regions across a geographically wide sample of the plant Silene vulgaris to investigate whether individual mitochondrial genes accumulate polymorphisms equally. We found that genes vary significantly in polymorphism. Tests based on coalescence theory show that the genes vary significantly in their scaled mutation rate, which, in the absence of differences among genes in effective population size, suggests these genes vary in their underlying mutation rate. Further evidence that among-gene variance in polymorphism is due to variation in the underlying mutation rate comes from a significant positive relationship between the number of segregating sites and silent site divergence from an outgroup. Contrary to recent studies, we found unconvincing evidence of recombination in the mitochondrial genome, and generally confirm the standard model of plant mitochondria characterized by low substitution rates and no recombination. We also show no evidence of significant variation in the strength or direction of selection among genes; this result may be expected if there is no recombination. The present study provides some of the most thorough data on plant mitochondrial polymorphism, and provides compelling evidence for mutation rate variation among genes. 
The study also demonstrates the difficulty in establishing a null model of mitochondrial genome polymorphism, and thus the difficulty, in the absence of a comparative approach, in testing the assumption that low substitution rates in plant mitochondria lead to low polymorphism.

2.
A formula is obtained for the probability that two genes at a single locus, sampled at random from a population at time t, are of particular types. The model assumed is a diffusion approximation to a neutral Wright-Fisher model in which mutation is not necessarily symmetric and the population size is a function of time. It is shown that for symmetric mutation in a population undergoing a step-function type bottleneck, homozygosity increases with decreasing population size. A formula is given for the distribution of the number of segregating sites occurring in two randomly sampled sequences of completely linked sites, with general mutation at a site and identical mutation structure between sites. We give similar results for a population of fixed size but for which the mutation rate is a function of time, and not necessarily symmetric. We confirm the intuitively clear effect that increasing the mutation rate decreases homozygosity.

3.
Since plant mitochondrial genomes exhibit some of the slowest known synonymous substitution rates, it is generally believed that they experience exceptionally low mutation rates. However, the use of synonymous substitution rates to infer mutation rates depends on the implicit assumption that synonymous sites are evolving neutrally (or nearly so). To assess the validity of this assumption in plant mitochondrial genomes, we examined coding sequence for footprints of selection acting at synonymous sites. We found that synonymous sites exhibit an AT rich and pyrimidine skewed nucleotide composition compared to both non-synonymous sites and non-coding regions. We also found some evidence for selection associated with both biased codon usage and conservation of regulatory sequences involved in mRNA processing, although some of these findings are subject to alternative non-adaptive interpretations. Regardless, the inferred strength of selection appears too weak to account for the variation in substitution rates between the mitochondrial genomes of plants and other multicellular eukaryotes. Therefore, these results are consistent with the interpretation that plant mitochondrial genomes experience a substantially lower mutation rate rather than increased functional constraints acting on synonymous sites. Nevertheless, there are important nucleotide composition patterns (particularly the differences between synonymous sites and non-coding DNA) that remain largely unexplained.

4.
The mitochondrial DNA hypervariable segment I (HVS-I) is widely used in studies of human evolutionary genetics, and therefore accurate estimates of mutation rates among nucleotide sites in this region are essential. We have developed a novel maximum-likelihood methodology for estimating site-specific mutation rates from partial phylogenetic information, such as haplogroup association. The resulting estimation problem is a generalized linear model, with a nonstandard link function. We develop inference and bias correction tools for our estimates and a hypothesis-testing approach for site independence. We demonstrate our methodology using 16,609 HVS-I samples from the Genographic Project. Our results suggest that mutation rates among nucleotide sites in HVS-I are highly variable. The 16,400–16,500 region exhibits significantly lower rates compared to other regions, suggesting potential functional constraints. Several loci identified in the literature as possible termination-associated sequences (TAS) do not yield statistically slower rates than the rest of HVS-I, casting doubt on their functional importance. Our tests do not reject the null hypothesis of independent mutation rates among nucleotide sites, supporting the use of site-independence assumption for analyzing HVS-I. Potential extensions of our methodology include its application to estimation of mutation rates in other genetic regions, like Y chromosome short tandem repeats.

5.
Estimating Substitution Rates in Ribosomal RNA Genes (cited 7 times: 0 self-citations, 7 by others)
A. Rzhetsky. Genetics 1995, 141(2): 771-783
A model is introduced describing nucleotide substitution in ribosomal RNA (rRNA) genes. In this model, substitution in the stem and loop regions of rRNA is modeled with 16- and four-state continuous time Markov chains, respectively. The mean substitution rates at nucleotide sites are assumed to follow gamma distributions that are different for the two types of regions. The simplest formulation of the model allows for explicit expressions for transition probabilities of the Markov processes to be found. These expressions were used to analyze several 16S-like rRNA genes from higher eukaryotes with the maximum likelihood method. Although the observed proportion of invariable sites was only slightly higher in the stem regions, the estimated average substitution rates in the stem regions were almost two times as high as in the loop regions. Therefore, the degree of site heterogeneity of substitution rates in the stem regions seems to be higher than in the loop regions of animal 16S-like rRNAs due to the presence of a few rapidly evolving sites. The model appears to be helpful in understanding the regularities of nucleotide substitution in rRNAs and probably minimizing errors in recovering phylogeny for distantly related taxa from these genes.

6.
Keightley PD, Eyre-Walker A. Genetics 2007, 177(4): 2251-2261
The distribution of fitness effects of new mutations (DFE) is important for addressing several questions in genetics, including the nature of quantitative variation and the evolutionary fate of small populations. Properties of the DFE can be inferred by comparing the distributions of the frequencies of segregating nucleotide polymorphisms at selected and neutral sites in a population sample, but demographic changes alter the spectrum of allele frequencies at both neutral and selected sites, so can bias estimates of the DFE if not accounted for. We have developed a maximum-likelihood approach, based on the expected allele-frequency distribution generated by transition matrix methods, to estimate parameters of the DFE while simultaneously estimating parameters of a demographic model that allows a population size change at some time in the past. We tested the method using simulations and found that it accurately recovers simulated parameter values, even if the simulated demography differs substantially from that assumed in our analysis. We use our method to estimate parameters of the DFE for amino acid-changing mutations in humans and Drosophila melanogaster. For a model of unconditionally deleterious mutations, with effects sampled from a gamma distribution, the mean estimate for the distribution shape parameter is approximately 0.2 for human populations, which implies that the DFE is strongly leptokurtic. For Drosophila populations, we estimate that the shape parameter is approximately 0.35. Differences in the shape of the distribution and the mean selection coefficient between humans and Drosophila result in significantly more strongly deleterious mutations in Drosophila than in humans, and, conversely, nearly neutral mutations are significantly less frequent.

7.
Estimation of Levels of Gene Flow from DNA Sequence Data (cited 52 times: 0 self-citations, 52 by others)
R. R. Hudson, M. Slatkin, W. P. Maddison. Genetics 1992, 132(2): 583-589
We compare the utility of two methods for estimating the average levels of gene flow from DNA sequence data. One method is based on estimating FST from frequencies at polymorphic sites, treating each site as a separate locus. The other method is based on computing the minimum number of migration events consistent with the gene tree inferred from their sequences. We compared the performance of these two methods on data that were generated by a computer simulation program that assumed the infinite sites model of mutation and that assumed an island model of migration. We found that in general when there is no recombination, the cladistic method performed better than FST while the reverse was true for rates of recombination similar to those found in eukaryotic nuclear genes, although FST performed better for all recombination rates for very low levels of migration (Nm = 0.1).
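To make the FST-based approach concrete, here is a minimal Python sketch. It assumes one common sequence-based form, FST = 1 - Hw/Hb, where Hw is the mean number of pairwise differences within populations and Hb the mean between populations; whether this matches the exact estimator evaluated in the paper is an assumption. The conversion to Nm uses the classical island-model approximation FST ≈ 1/(4Nm + 1) for diploid nuclear loci. All function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise differences among sequences of one population."""
    pairs = list(combinations(pop, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def fst_from_sequences(pop1, pop2):
    """FST = 1 - Hw/Hb for two populations (assumed estimator form)."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = sum(hamming(a, b) for a, b in product(pop1, pop2)) / (len(pop1) * len(pop2))
    return 1 - h_w / h_b


def nm_from_fst(fst):
    """Island-model approximation: FST ~ 1 / (4Nm + 1)."""
    return (1 / fst - 1) / 4


# Toy aligned sequences, invented purely for illustration.
pop1 = ["AAAA", "AAAT"]
pop2 = ["TTAA", "TTAT"]

fst = fst_from_sequences(pop1, pop2)
print(f"FST = {fst:.3f}, Nm = {nm_from_fst(fst):.3f}")
```

The sketch treats the whole alignment as one non-recombining block, which is the setting in which the abstract reports that the cladistic method tends to do better.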

8.
K. Misawa, F. Tajima. Genetics 1997, 147(4): 1959-1964
Knowing the amount of DNA polymorphism is essential to understand the mechanism of maintaining DNA polymorphism in a natural population. The amount of DNA polymorphism can be measured by the average number of nucleotide differences per site (π), the proportion of segregating (polymorphic) sites (s) and the minimum number of mutations per site (s*). Since the latter two quantities depend on the sample size, θ is often used as a measure of the amount of DNA polymorphism, where θ = 4Nμ, N is the effective population size and μ is the neutral mutation rate per site per generation. It is known that θ estimated from π, s and s* under the infinite site model can be biased when the mutation rate varies among sites. We have therefore developed new methods for estimating θ under the finite site model. Using computer simulations, it has been shown that the new methods give almost unbiased estimates even when the mutation rate varies among sites substantially. Furthermore, we have also developed new statistics for testing neutrality by modifying Tajima's D statistic. Computer simulations suggest that the new test statistics can be used even when the mutation rate varies among sites.
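As a reference point for the quantities named above, the following Python sketch computes π, the number of segregating sites, Watterson's infinite-sites estimate of θ, and the standard (unmodified) Tajima's D from a small alignment. These are the classical infinite-sites statistics, not the new finite-sites estimators or modified tests the abstract describes; the function name and the toy alignment are hypothetical.

```python
import math
from itertools import combinations


def polymorphism_summary(seqs):
    """Classical infinite-sites summaries for equal-length aligned sequences."""
    n = len(seqs)
    length = len(seqs[0])

    # S: number of alignment columns with more than one observed state.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)

    # k: average number of pairwise differences per sequence pair.
    pairs = list(combinations(seqs, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    # Watterson's estimator: S divided by the harmonic-like constant a1.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Tajima's D: standardized difference between k and S/a1.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    tajima_d = (k - theta_w) / math.sqrt(variance) if variance > 0 else float("nan")

    return {
        "S": seg_sites,
        "pi_per_site": k / length,
        "theta_w_per_site": theta_w / length,
        "tajima_d": tajima_d,
    }


# Toy alignment, invented purely for illustration.
alignment = ["AAAA", "AAAT", "AATT", "ATTT"]
summary = polymorphism_summary(alignment)
print(summary)
```

Because a column is counted as segregating no matter how many mutations produced it, S undercounts mutations when rates vary among sites, which is the kind of bias the abstract's finite-sites methods are meant to address.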

9.
In order to study the effect of mutation rate heterogeneity on patterns of DNA polymorphism, we simulated samples of DNA sequences with gamma-distributed nucleotide substitution rates in stationary and expanding populations. We find that recent population expansions and mutation rate heterogeneity have similar effects on several polymorphism indicators, like the shape and the mean of the observed pairwise difference distribution, or the number of segregating sites. The inferred size of population expansion thus appears overestimated if nucleotides have dissimilar substitution rates. Interestingly, population expansion and uneven mutation rates have contrasting effects on Tajima's D statistic when acting separately, and the consequence on the associated test of selective neutrality is investigated. The patterns of polymorphism of several human populations analyzed for the mitochondrial control region are examined, mainly showing the difficulty in quantifying the respective contribution of past demographic history and uneven mutation rates from a single sampled evolutionary process. However, substitution rates appear more heterogeneous in the second hypervariable segment of the control region than in the first segment.

10.
F. Tajima. Genetics 1996, 143(3): 1457-1465
The expectations of the average number of nucleotide differences per site (π), the proportion of segregating sites (s), the minimum number of mutations per site (s*) and some other quantities were derived under the finite site models with and without rate variation among sites, where the finite site models include Jukes and Cantor's model, the equal-input model and Kimura's model. As a model of rate variation, the gamma distribution was used. The results indicate that if the distribution parameter α is small, the effect of rate variation on these quantities is substantial, so that θ is substantially underestimated when the infinite site model is used, where θ = 4Nv, N is the effective population size and v is the mutation rate per site per generation. New methods for estimating θ are also presented, which are based on the finite site models with and without rate variation. Using these methods, underestimation can be corrected.

11.
We would like to use maximum likelihood to estimate parameters such as the effective population size N_e or, if we do not know mutation rates, the product 4N_e μ of mutation rate per site and effective population size. To compute the likelihood for a sample of unrecombined nucleotide sequences taken from a random-mating population it is necessary to sum over all genealogies that could have led to the sequences, computing for each one the probability that it would have yielded the sequences, and weighting each one by its prior probability. The genealogies vary in tree topology and in branch lengths. Although the likelihood and the prior are straightforward to compute, the summation over all genealogies seems at first sight hopelessly difficult. This paper reports that it is possible to carry out a Monte Carlo integration to evaluate the likelihoods approximately. The method uses bootstrap sampling of sites to create data sets for each of which a maximum likelihood tree is estimated. The resulting trees are assumed to be sampled from a distribution whose height is proportional to the likelihood surface for the full data. That it will be so is dependent on a theorem which is not proven, but seems likely to be true if the sequences are not short. One can use the resulting estimated likelihood curve to make a maximum likelihood estimate of the parameter of interest, N_e or 4N_e μ. The method requires at least 100 times the computational effort required for estimation of a phylogeny by maximum likelihood, but is practical on today's workstations. The method does not at present have any way of dealing with recombination.

12.
Despite the advances in understanding molecular evolution, current phylogenetic methods barely take account of a fraction of the complexity of evolution. We are chiefly constrained by our incomplete knowledge of molecular evolutionary processes and the limits of computational power. These limitations lead to the establishment of either biologically simplistic models that rarely account for a fraction of the complexity involved or overfitting models that add little resolution to the problem. Such oversimplified models may lead us to assign high confidence to an incorrect tree (inconsistency). Rate-across-site (RAS) models are commonly used evolutionary models in phylogenetic studies. These account for heterogeneity in the evolutionary rates among sites but do not account for changing within-site rates across lineages (heterotachy). If heterotachy is common, using RAS models may lead to systematic errors in tree inference. In this work we show possible misleading effects in tree inference when the assumption of constant within-site rates across lineages is violated using maximum likelihood. Using a simulation study, we explore the ways in which gamma stationary models can lead to wrong topology or to deceptive bootstrap support values when the within-site rates change across lineages. More precisely, we show that different degrees of heterotachy mislead phylogenetic inference when the model assumed is stationary. Finally, we propose a geometry-based approach to visualize and to test for the possible existence of bias due to heterotachy.

13.
P Beerli, J Felsenstein. Genetics 1999, 152(2): 763-773
A new method for the estimation of migration rates and effective population sizes is described. It uses a maximum-likelihood framework based on coalescence theory. The parameters are estimated by Metropolis-Hastings importance sampling. In a two-population model this method estimates four parameters: the effective population size and the immigration rate for each population relative to the mutation rate. Summarizing over loci can be done by assuming either that the mutation rate is the same for all loci or that the mutation rates are gamma distributed among loci but the same for all sites of a locus. The estimates are as good as or better than those from an optimized FST-based measure. The program is available on the World Wide Web at http://evolution.genetics.washington.edu/lamarc.html/.

14.
Palmer ME, Lipsitch M. Genetics 2006, 173(1): 461-472
The question of how natural selection affects asexual mutation rates has been considered since the 1930s, yet our understanding continues to deepen. The distribution of mutation rates observed in natural bacteria remains unexplained. It is well known that environmental constancy can favor minimal mutation rates. In contrast, environmental fluctuation (e.g., at period T) can create indirect selective pressure for stronger mutators: genes modifying mutation rate may "hitchhike" to greater frequency along with environmentally favored mutations they produce. This article extends a well-known model of Leigh to consider fitness genes with multiple mutable sites (call the number of such sites alpha). The phenotypic effect of such a gene is enabled if all sites are in a certain state and disabled otherwise. The effects of multiple deleterious loci are also included (call the number of such loci gamma). The analysis calculates the indirect selective effects experienced by a gene inducing various mutation rates for given values of alpha, gamma, and T. Finite-population simulations validate these results and let us examine the interaction of drift with hitchhiking selection. We close by commenting on the importance of other factors, such as spatiotemporal variation, and on the origin of variation in mutation rates.

15.
Summary: Selective constraints on DNA sequence change were incorporated into a model of DNA divergence by restricting substitutions to a subset of nucleotide positions. A simple model showed that both mutation rate and the fraction of nucleotide positions free to vary are strong determinants of DNA divergence over time. When divergence between two species approaches the fraction of positions free to vary, standard methods that correct for multiple mutations yield severe underestimates of the number of substitutions per site. A modified method appropriate for use with DNA sequence, restriction site, or thermal renaturation data is derived taking this fraction into account. The model also showed that the ratio of divergence in two gene classes (e.g., nuclear and mitochondrial) may vary widely over time even if the ratio of mutation rates remains constant. DNA sequence divergence data are used increasingly to detect differences in rates of molecular evolution. Often, variation in divergence rate is assumed to represent variation in mutation rate. The present model suggests that differing divergence rates among comparisons (either among gene classes or taxa) should be interpreted cautiously. Differences in the fraction of nucleotide positions free to vary can serve as an important alternative hypothesis to explain differences in DNA divergence rates.
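The underestimation described above can be illustrated with a small Python sketch. It is not the paper's modified method: it assumes a Jukes-Cantor substitution process (an assumption introduced here) and simply applies the standard multiple-hit correction only to the fraction f of positions free to vary. The function names and numbers are hypothetical.

```python
import math


def jc_distance(p):
    """Standard Jukes-Cantor correction: substitutions per site from
    the observed proportion p of differing sites (requires p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def jc_distance_restricted(p, f):
    """Illustrative variant: only a fraction f of sites can change.

    The observed divergence among variable sites is p / f; the corrected
    value is rescaled by f to give substitutions per site over the whole
    sequence. Requires p / f < 0.75. Jukes-Cantor is assumed here.
    """
    return -0.75 * f * math.log(1 - 4 * p / (3 * f))


# Hypothetical values: 20% observed divergence, 30% of sites free to vary.
p_observed = 0.20
f_variable = 0.30

standard = jc_distance(p_observed)
restricted = jc_distance_restricted(p_observed, f_variable)
print(f"standard: {standard:.3f}, restricted: {restricted:.3f}")
```

With these hypothetical inputs the restricted estimate is roughly twice the standard one, and the gap widens as p approaches 0.75 f, in line with the abstract's point that standard corrections underestimate substitutions when divergence nears the fraction of positions free to vary.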

16.
For over 3 decades, the rate of replacement mutations has been assumed to be equal to, and estimated from, the rate of "strictly" neutral sequence divergence in noncoding regions and in silent-codon positions where mutations do not alter the amino acid encoded. This assumption is fundamental to estimating the fraction of harmful protein mutations and to identifying adaptive evolution at individual codons and proteins. We show that the assumption is not justifiable because a much larger fraction of codon positions is involved in hypermutable CpG dinucleotides as compared with the introns, leading to a higher expected replacement mutation rate per site in a vast majority of the genes. Consideration of this difference reveals a higher intensity of purifying natural selection than previously inferred in human genes. We also show that a much smaller number of genes are expected to be evolving with positive selection than that predicted using sequence divergence at intron and silent positions in the human genome. These patterns indicate the need for using new approaches for estimating rates of amino acid-altering mutations in order to find positively selected genes and codons in genomes that contain hypermutable CpG's.

17.
It is becoming routine to obtain data sets on DNA sequence variation across several thousands of chromosomes, providing unprecedented opportunity to infer the underlying biological and demographic forces. Such data make it vital to study summary statistics that offer enough compression to be tractable, while preserving a great deal of information. One well-studied summary is the site frequency spectrum: the empirical distribution, across segregating sites, of the sample frequency of the derived allele. However, most previous theoretical work has assumed that each site has experienced at most one mutation event in its genealogical history, which becomes less tenable for very large sample sizes. In this work we obtain, in closed form, the predicted frequency spectrum of a site that has experienced at most two mutation events, under very general assumptions about the distribution of branch lengths in the underlying coalescent tree. Among other applications, we obtain the frequency spectrum of a triallelic site in a model of historically varying population size. We demonstrate the utility of our formulas in two settings: First, we show that triallelic sites are more sensitive to the parameters of a population that has experienced historical growth, suggesting that they will have use if they can be incorporated into demographic inference. Second, we investigate a recently proposed alternative mechanism of mutation in which the two derived alleles of a triallelic site are created simultaneously within a single individual, and we develop a test to determine whether it is responsible for the excess of triallelic sites in the human genome.
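For readers unfamiliar with the site frequency spectrum, the following Python sketch computes it for the simple biallelic case, the one-mutation-per-site setting that the abstract says most earlier work assumed. It does not handle triallelic sites. The 0/1 coding (0 = ancestral, 1 = derived), the function name, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS for biallelic sites.

    haplotypes: list of equal-length lists of 0/1 values, one per sampled
    chromosome. Returns a list whose entry i - 1 is the number of sites at
    which the derived allele appears in exactly i of the n chromosomes,
    for i = 1 .. n - 1. Monomorphic sites are ignored.
    """
    n = len(haplotypes)
    derived_counts = Counter(sum(column) for column in zip(*haplotypes))
    return [derived_counts.get(i, 0) for i in range(1, n)]


# Toy sample of four chromosomes at four sites, invented for illustration.
sample = [
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
sfs = site_frequency_spectrum(sample)
print(sfs)  # [3, 0, 1]: three singletons, no doubletons, one tripleton
```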

18.
Over the last decade, surveys of DNA sequence variation in natural populations of several Drosophila species and other taxa have established that polymorphism is reduced in genomic regions characterized by low rates of crossing over per physical length. Parallel studies have also established that divergence between species is not reduced in these same genomic regions, thus eliminating explanations that rely on a correlation between the rates of mutation and crossing over. Several theoretical models (directional hitchhiking, background selection, and random environment) have been proposed as population genetic explanations. In this study samples from an African population (n = 50) and a European population (n = 51) were surveyed at the su(s) (1955 bp) and su(w(a)) (3213 bp) loci for DNA sequence polymorphism, utilizing a stratified SSCP/DNA sequencing protocol. These loci are located near the telomere of the X chromosome, in a region of reduced crossing over per physical length, and exhibit a significant reduction in DNA sequence polymorphism. Unlike most previously surveyed, these loci reveal substantial skews toward rare site frequencies, consistent with the predictions of directional hitchhiking and random environment models and inconsistent with the general predictions of the background selection model (or neutral theory). No evidence for excess geographic differentiation at these loci is observed. Although linkage disequilibrium is observed between closely linked sites within these loci, many recombination events in the genealogy of the sampled alleles can be inferred and the genomic scale of linkage disequilibrium, measured in base pairs between sites, is the same as that observed for loci in regions of normal crossing over. We conclude that gene conversion must be high in these regions of low crossing over.

19.
20.
Fay MP, Tiwari RC, Feuer EJ, Zou Z. Biometrics 2006, 62(3): 847-854
The annual percent change (APC) is often used to measure trends in disease and mortality rates, and a common estimator of this parameter uses a linear model on the log of the age-standardized rates. Under the assumption of linearity on the log scale, which is equivalent to a constant change assumption, APC can be equivalently defined in three ways as transformations of either (1) the slope of the line that runs through the log of each rate, (2) the ratio of the last rate to the first rate in the series, or (3) the geometric mean of the proportional changes in the rates over the series. When the constant change assumption fails then the first definition cannot be applied as is, while the second and third definitions unambiguously define the same parameter regardless of whether the assumption holds. We call this parameter the percent change annualized (PCA) and propose two new estimators of it. The first, the two-point estimator, uses only the first and last rates, assuming nothing about the rates in between. This estimator requires fewer assumptions and is asymptotically unbiased as the size of the population gets large, but has more variability since it uses no information from the middle rates. The second estimator is an adaptive one and equals the linear model estimator with a high probability when the rates are not significantly different from linear on the log scale, but includes fewer points if there are significant departures from that linearity. For the two-point estimator we can use confidence intervals previously developed for ratios of directly standardized rates. For the adaptive estimator, we show through simulation that the bootstrap confidence intervals give appropriate coverage.
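The three definitions listed above can be checked numerically with a short Python sketch. It assumes equally spaced annual rates and an unweighted least-squares fit on the log scale; the paper's estimators may use weighting and are not reproduced here, and the adaptive estimator is omitted. Function names and the example series are hypothetical.

```python
import math


def apc_log_linear(years, rates):
    """Definition (1): 100 * (exp(slope) - 1), where slope comes from an
    unweighted least-squares line through log(rate) versus year."""
    n = len(years)
    logs = [math.log(r) for r in rates]
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxy = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    sxx = sum((y - mean_year) ** 2 for y in years)
    return 100 * (math.exp(sxy / sxx) - 1)


def pca_two_point(years, rates):
    """Definition (2): annualized ratio of the last rate to the first."""
    span = years[-1] - years[0]
    return 100 * ((rates[-1] / rates[0]) ** (1 / span) - 1)


def pca_geometric_mean(rates):
    """Definition (3): geometric mean of year-to-year proportional changes."""
    ratios = [later / earlier for earlier, later in zip(rates, rates[1:])]
    mean_log_ratio = sum(math.log(x) for x in ratios) / len(ratios)
    return 100 * (math.exp(mean_log_ratio) - 1)


# Hypothetical series growing by exactly 2% per year, so all three agree.
years = [2000, 2001, 2002, 2003, 2004]
rates = [100 * 1.02**i for i in range(5)]

print(apc_log_linear(years, rates))
print(pca_two_point(years, rates))
print(pca_geometric_mean(rates))
```

For a series that is not linear on the log scale, definitions (2) and (3) still return the same value because the product of consecutive ratios telescopes to the last rate over the first, while the fitted slope in (1) generally gives a different number.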
