Similar literature
1.
The site-frequency spectrum, representing the distribution of allele frequencies at a set of polymorphic sites, is a commonly used summary statistic in population genetics. Explicit forms of the spectrum are known for models both with and without selection if independence among sites is assumed. The availability of these explicit forms has allowed for maximum likelihood estimation of selection, developed first in the Poisson random field model of Sawyer and Hartl, which is now the primary method for estimating selection directly from DNA sequence data. The independence assumption, which amounts to assuming free recombination between sites, is, however, a limiting case for many population genetics models. Here, we extend the site-frequency spectrum theory to consider the case where the sites are completely linked. We use a diffusion approximation to calculate the joint distribution of the allele frequencies of linked sites for models without selection and for models with equal coefficient selection. The joint distribution is derived by first constructing Green’s functions corresponding to multiallele diffusion equations. We show that the site-frequency spectrum is highly correlated between frequencies that are complementary (i.e., sum to 1), and the correlation is significantly elevated by positive selection. The results presented here can be used to extend the Poisson random field to allow for estimating selection for correlated sites. More generally, the Green’s function construction should be able to aid in studying the genetic drift of multiple alleles in other cases.
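As an illustration of the summary statistic that the abstracts in this list rely on, the following is a minimal sketch of how an unfolded site-frequency spectrum could be tabulated from a sample. It is not taken from any of the listed papers. The input layout (a list of equal-length 0/1 haplotypes, with 1 marking the derived allele) and the function names `site_frequency_spectrum` and `neutral_expected_sfs` are assumptions made for this example. The second function uses the standard neutral expectation for independent sites, E[ξ_i] = θ/i.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Tabulate the unfolded SFS of a sample.

    `haplotypes` is a hypothetical input format: a list of n equal-length
    sequences of 0/1 values, where 1 marks the derived allele.
    Entry i-1 of the result counts the sites at which the derived allele
    is carried by exactly i of the n sequences (i = 1 .. n-1).
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = Counter()
    for j in range(n_sites):
        derived = sum(h[j] for h in haplotypes)
        # Sites that are fixed (derived == n) or monomorphic ancestral
        # (derived == 0) are not polymorphic and are left out.
        if 0 < derived < n:
            counts[derived] += 1
    return [counts[i] for i in range(1, n)]


def neutral_expected_sfs(n, theta):
    """Expected unfolded SFS under the standard neutral model: theta / i."""
    return [theta / i for i in range(1, n)]


sample = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
observed = site_frequency_spectrum(sample)
expected = neutral_expected_sfs(len(sample), theta=6.0)
```

Comparing `observed` with `expected` class by class is the simplest way to see the excesses of rare or high-frequency variants that several of the abstracts below discuss.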

2.
Cutter AD 《Genetics》2008,178(3):1661-1672
Natural selection and neutral processes such as demography, mutation, and gene conversion all contribute to patterns of polymorphism within genomes. Identifying the relative importance of these varied components in evolution provides the principal challenge for population genetics. To address this issue in the nematode Caenorhabditis remanei, I sampled nucleotide polymorphism at 40 loci across the X chromosome. The site-frequency spectrum for these loci provides no evidence for population size change, and one locus presents a candidate for linkage to a target of balancing selection. Selection for codon usage bias leads to the non-neutrality of synonymous sites, and despite its weak magnitude of effect (N_e s ≈ 0.1), is responsible for profound patterns of diversity and divergence in the C. remanei genome. Although gene conversion is evident for many loci, biased gene conversion is not identified as a significant evolutionary process in this sample. No consistent association is observed between synonymous-site diversity and linkage-disequilibrium-based estimators of the population recombination parameter, despite theoretical predictions about background selection or widespread genetic hitchhiking, but genetic map-based estimates of recombination are needed to rigorously test for a diversity-recombination relationship. Coalescent simulations also illustrate how a spurious correlation between diversity and linkage-disequilibrium-based estimators of recombination can occur, due in part to the presence of unbiased gene conversion. These results illustrate the influence that subtle natural selection can exert on polymorphism and divergence, in the form of codon usage bias, and demonstrate the potential of C. remanei for detecting natural selection from genomic scans of polymorphism.

3.
Understanding the processes and conditions under which populations diverge to give rise to distinct species is a central question in evolutionary biology. Since recently diverged populations have high levels of shared polymorphisms, it is challenging to distinguish between recent divergence with no (or very low) inter-population gene flow and older splitting events with subsequent gene flow. Recently published methods to infer speciation parameters under the isolation-migration framework are based on summarizing polymorphism data at multiple loci in two species using the joint site-frequency spectrum (JSFS). We have developed two improvements of these methods based on a more extensive use of the JSFS classes of polymorphisms for species with high intra-locus recombination rates. First, using a likelihood-based method, we demonstrate that taking into account low-frequency polymorphisms shared between species significantly improves the joint estimation of the divergence time and gene flow between species. Second, we introduce a local linear regression algorithm that considerably reduces the computational time and allows for the estimation of unequal rates of gene flow between species. We also investigate which summary statistics from the JSFS allow the greatest estimation accuracy for divergence time and migration rates for low (around 10) and high (around 100) numbers of loci. Focusing on cases with low numbers of loci and high intra-locus recombination rates, we show that our methods for the estimation of divergence time and migration rates are more precise than existing approaches.

4.
Most "tests of neutrality" assess whether particular data sets depart from the predictions of a standard neutral model with no recombination. For Drosophila, where nuclear polymorphism data routinely show evidence of genetic exchange, the assumption of no recombination is often unrealistic. In addition, while conservative, this assumption is made at the cost of a great loss in power. Perhaps as a result, tests of the frequency spectrum based on zero recombination suggest an adequate fit of Drosophila polymorphism data to the predictions of the standard neutral model. Here, we analyze the frequency spectrum of a large number of loci in Drosophila melanogaster and D. simulans using two summary statistics. We use an estimate of the population recombination rate based on a laboratory estimate of the rate of crossing over per physical length and an estimate of the species' effective population size. In contrast to previous studies, we find that roughly half of the loci depart from the predictions of the standard neutral model. The extent of the departure depends on the exact recombination rate, but the global pattern that emerges is robust. Interestingly, these departures from neutral expectations are not unidirectional. The large variance in outcomes may be due to a complex demographic history and inconsistent sampling, or to the pervasive action of natural selection.

5.
Patterns of linkage disequilibrium, homoplasy, and incompatibility are difficult to interpret because they depend on several factors, including the recombination process and the population structure. Here we introduce a novel model-based framework to infer recombination properties from such summary statistics in bacterial genomes. The underlying model is sequentially Markovian so that data can be simulated very efficiently, and we use approximate Bayesian computation techniques to infer parameters. As this does not require us to calculate the likelihood function, the model can be easily extended to investigate less probed aspects of recombination. In particular, we extend our model to account for the bias in the recombination process whereby closely related bacteria recombine more often with one another. We show that this model provides a good fit to a data set of Bacillus cereus genomes and estimate several recombination properties, including the rate of bias in recombination. All the methods described in this article are implemented in a software package that is freely available for download at http://code.google.com/p/clonalorigin/.

6.
Comeron JM  Kreitman M 《Genetics》2000,156(3):1175-1190
Intron length is negatively correlated with recombination in both Drosophila melanogaster and humans. This correlation is not likely to be the result of mutational processes alone: evolutionary analysis of intron length polymorphism in D. melanogaster reveals equivalent ratios of deletion to insertion in regions of high and low recombination. The polymorphism data do reveal, however, an excess of deletions relative to insertions (i.e., a deletion bias), with an overall deletion-to-insertion events ratio of 1.35. We propose two types of selection favoring longer intron lengths. First, the natural mutational bias toward deletion must be opposed by strong selection in very short introns to maintain the minimum intron length needed for the intron splicing reaction. Second, selection will favor insertions in introns that increase recombination between mutations under the influence of selection in adjacent exons. Mutations that increase recombination, even slightly, will be selectively favored because they reduce interference among selected mutations. Interference selection acting on intron length mutations must be very weak, as indicated by frequency spectrum analysis of Drosophila intron length polymorphism, making the equilibrium for intron length sensitive to changes in the recombinational environment and population size. One consequence of this sensitivity is that the advantage of longer introns is expected to decrease inversely with the rate of recombination, thus leading to a negative correlation between intron length and recombination rate. Also in accord with this model, intron length differs between closely related Drosophila species, with the longest variant present more often in D. melanogaster than in D. simulans. We suggest that the study of the proposed dynamic model, taking into account interference among selected sites, might shed light on many aspects of the comparative biology of genome sizes including the C value paradox.

7.
The ability of the site-frequency spectrum (SFS) to reflect the particularities of gene genealogies exhibiting multiple mergers of ancestral lines as opposed to those obtained in the presence of population growth is our focus. An excess of singletons is a well-known characteristic of both population growth and multiple mergers. Other aspects of the SFS, in particular, the weight of the right tail, are, however, affected in specific ways by the two model classes. Using an approximate likelihood method and minimum-distance statistics, our estimates of statistical power indicate that exponential and algebraic growth can indeed be distinguished from multiple-merger coalescents, even for moderate sample sizes, if the number of segregating sites is high enough. A normalized version of the SFS (nSFS) is also used as a summary statistic in an approximate Bayesian computation (ABC) approach. The results give further positive evidence as to the general eligibility of the SFS to distinguish between the different histories.

8.
According to population genetics models, genomic regions with lower crossing-over rates are expected to experience less effective selection because of Hill-Robertson interference (HRi). The effect of genetic linkage is thought to be particularly important for a selection of weak intensity such as selection affecting codon usage. Consistent with this model, codon bias correlates positively with recombination rate in Drosophila melanogaster and Caenorhabditis elegans. However, in these species, the G+C content of both noncoding DNA and synonymous sites correlates positively with recombination, which suggests that mutation patterns and recombination are associated. To remove this effect of mutation patterns on codon bias, we used the synonymous sites of lowly expressed genes that are expected to be effectively neutral sites. We measured the differences between codon biases of highly expressed genes and their lowly expressed neighbors. In D. melanogaster we find that HRi weakly reduces selection on codon usage of genes located in regions of very low recombination; but these genes only comprise 4% of the total. In C. elegans we do not find any evidence for the effect of recombination on selection for codon bias. Computer simulations indicate that HRi poorly enhances codon bias if the local recombination rate is greater than the mutation rate. This prediction of the model is consistent with our data and with the current estimate of the mutation rate in D. melanogaster. The case of C. elegans, which is highly self-fertilizing, is discussed. Our results suggest that HRi is a minor determinant of variations in codon bias across the genome.

9.
We analyze patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as a scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as those of the active population. Mutations appear as Poisson processes on the active lineages and potentially at reduced rate also on the dormant lineages. The presence of “dormant” lineages leads to qualitatively altered times to the most recent common ancestor and nonclassical patterns of genetic diversity. To illustrate this we provide a Wright–Fisher model with a seedbank component and mutation, motivated from recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.

10.
New statistical tests have been developed in the past decade that enable us to infer evidence of recent strong positive selection from genome-wide data on single-nucleotide polymorphism and to localize the targets of selection in the genome. Based on these tests, past demographic events that led to distortions of the site-frequency spectrum of variation can be distinguished from selection, in particular if linkage disequilibrium is taken into account. These methods have been successfully applied to species from which complete sequence information and polymorphism data are available, including Drosophila melanogaster, humans, and several plant species. To make full use of the available data, however, the tests that were primarily designed for panmictic populations need to be extended to spatially structured populations.

11.
Basic summary statistics that quantify the population genetic structure of influenza virus are important for understanding and inferring the evolutionary and epidemiological processes. However, the sampling dates of global virus sequences in the last several decades are scattered nonuniformly throughout the calendar. Such temporal structure of samples and the small effective size of the viral population hamper the use of conventional methods to calculate summary statistics. Here, we define statistics that overcome this problem by correcting for the sampling-time difference in quantifying a pairwise sequence difference. A simple linear regression method jointly estimates the mutation rate and the level of sequence polymorphism, thus providing an estimate of the effective population size. It also leads to the definition of Wright’s FST for arbitrary time-series data. Furthermore, as an alternative to Tajima’s D statistic or the site-frequency spectrum, a mismatch distribution corrected for sampling-time differences can be obtained and compared between actual and simulated data. Application of these methods to seasonal influenza A/H3N2 viruses sampled between 1980 and 2017 and sequences simulated under the model of recurrent positive selection with metapopulation dynamics allowed us to estimate the synonymous mutation rate and find parameter values for selection and demographic structure that fit the observation. We found that the mutation rates of HA and PB1 segments before 2007 were particularly high and that including recurrent positive selection in our model was essential for the genealogical structure of the HA segment. Methods developed here can be generally applied to population genetic inferences using serially sampled genetic data.

12.
Many East Asian human populations harbor a high-frequency deficiency allele for the aldehyde dehydrogenase 2 (ALDH2) enzyme, a critical protein involved in the metabolism of ethanol. Here we use resequencing and long-range SNP haplotype data from a Japanese sample to test whether patterns of nucleotide diversity and linkage disequilibrium at this locus are compatible with a standard neutral model of evolution. Examination of the pattern of polymorphism at a locus such as this, where the frequency of a common allele is known a priori, introduces an ascertainment bias that must be corrected for in analyses of the frequency spectrum of polymorphisms. We apply a flexible and generally applicable simulation approach to correct for this bias in our ALDH2 data and, also, to explore the effect of bias on the commonly used summary statistics Tajima’s D, Fu and Li’s D, and Fay and Wu’s H. Our study finds no evidence that the pattern of genetic variation at ALDH2 differs from that expected under a standard neutral model. However, our general examination of ascertainment bias indicates that a priori knowledge of segregating alleles greatly affects the expected distributions of summary statistics. Under many parameter combinations we find that ascertainment bias introduces an elevated rate of false positives when summary statistics are used to test for deviations from a standard neutral model. However, we also show that over a wide range of conditions the power of all summary statistics can be greatly increased by incorporating prior knowledge of segregating alleles. [Reviewing Editor: Dr. Martin Kreitman]
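Tajima's D, one of the summary statistics named above, can be computed directly from an unfolded site-frequency spectrum. The sketch below follows the standard textbook definition of the statistic and is not the simulation-based ascertainment correction described in the abstract. The function name `tajimas_d` and the input format (a list whose entry i-1 is the number of sites with derived-allele count i in a sample of n sequences) are assumptions made for this example.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded SFS (hypothetical input format).

    sfs[i-1] is the number of segregating sites whose derived allele is
    carried by i of the n sampled sequences, so len(sfs) == n - 1.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")

    # Mean number of pairwise differences (pi), summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(i * (n - i) * x for i, x in enumerate(sfs, start=1)) / pairs

    # Constants of the standard variance approximation.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two estimators of theta, scaled by its
    # approximate standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


d_rare = tajimas_d([2, 0, 0])    # only singletons
d_middle = tajimas_d([0, 2, 0])  # only intermediate-frequency variants
```

An excess of rare variants pushes the statistic below zero and an excess of intermediate-frequency variants pushes it above zero, which is why a skew in the frequency spectrum, whether caused by ascertainment, demography, or selection, shifts the distribution of D.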

13.
The domestication of maize (Zea mays ssp. mays) from its wild ancestor (Zea mays ssp. parviglumis) led to a loss of genetic diversity both through a population bottleneck and through directional selection at agronomically important genes. In order to discriminate between those effects and to investigate the nature of the domestication bottleneck, we analyzed nucleotide diversity data from 12 chromosome 1 loci in parviglumis. We found an average loss of nucleotide diversity of 38% across genes, but this average was skewed downward by four putatively selected loci (tb1, d8, ts2, and zagl1). To better understand the domestication process, we used the coalescent with recombination to simulate bottlenecks under various lengths and population sizes. For each locus, we determined the likelihood of the observed data using three summary statistics: the number of segregating sites, an estimate of the population recombination parameter, and Tajima's D. Based on the eight neutrally evolving loci, a model with a bottleneck had a significantly higher likelihood than a model without one. The four putatively selected loci had significantly different likelihood optima than the neutral loci, and this approach confirmed that ts2 and d8 were selected either during domestication or breeding. Overall, the best-fitting models had a bottleneck in which the population size and the bottleneck duration had a ratio of approximately 4 to 5; for example, if the initial domestication event occurred over a 500-year period, the population size was roughly 2,000 to 2,500 individuals. However, this range did vary with the summary statistic used to assess the fit of simulations to data. In this context, Tajima's D performed poorly as a goodness-of-fit statistic, probably because Z. mays ssp. parviglumis has a frequency spectrum that is significantly skewed toward low-frequency variants. Finally, we found that demography is unlikely to account for the previously observed positive correlation between nucleotide diversity and the population-recombination parameter in maize, leaving this observation difficult to interpret.

14.
Current methods for detecting fluctuating selection require time series data on genotype frequencies. Here, we propose an alternative approach that makes use of DNA polymorphism data from a sample of individuals collected at a single point in time. Our method uses classical diffusion approximations to model temporal fluctuations in the selection coefficients to find the expected distribution of mutation frequencies in the population. Using the Poisson random-field setting we derive the site-frequency spectrum (SFS) for three different models of fluctuating selection. We find that the general effect of fluctuating selection is to produce a more "U"-shaped site-frequency spectrum with an excess of high-frequency derived mutations at the expense of middle-frequency variants. We present likelihood-ratio tests, comparing the fluctuating selection models to the neutral model using SFS data, and use Monte Carlo simulations to assess their power. We find that we have sufficient power to reject a neutral hypothesis using samples on the order of a few hundred SNPs and a sample size of approximately 20 and power to distinguish between selection that varies in time and constant selection for a sample of size 20. We also find that fluctuating selection increases the probability of fixation of selected sites even if, on average, there is no difference in selection among a pair of alleles segregating at the locus. Fluctuating selection will, therefore, lead to an increase in the ratio of divergence to polymorphism similar to that observed under positive directional selection.

15.
Thornton KR  Jensen JD 《Genetics》2007,175(2):737-750
Rapid typing of genetic variation at many regions of the genome is an efficient way to survey variability in natural populations in an effort to identify segments of the genome that have experienced recent natural selection. Following such a genome scan, individual regions may be chosen for further sequencing and a more detailed analysis of patterns of variability, often to perform a parametric test for selection and to estimate the strength of a recent selective sweep. We show here that not accounting for the ascertainment of loci in such analyses leads to false inference of natural selection when the true model is selective neutrality, because the procedure of choosing unusual loci (in comparison to the rest of the genome-scan data) selects regions of the genome with genealogies similar to those expected under models of recent directional selection. We describe a simple and efficient correction for this ascertainment bias, which restores the false-positive rate to near-nominal levels. For the parameters considered here, we find that obtaining a test with the expected distribution of P-values depends on accurately accounting both for ascertainment of regions and for demography. Finally, we use simulations to explore the utility of relying on outlier loci to detect recent selective sweeps. We find that measures of diversity and of population differentiation are more effective than summaries of the site-frequency spectrum and that sequencing larger regions (2.5 kbp) in genome-scan studies leads to more power to detect recent selective sweeps.

16.
The influence of meiotic mutations on the mutation changes in the double super-unstable system in the yellow and scute loci of Drosophila melanogaster was studied. The mei-41D5 and mei-218 mutations changed the spectrum and frequency of mutagenesis in males of the y2nsscme strain, in contrast to the postulate that meiotic mutations do not interfere with male recombination in D. melanogaster. These mutations also changed the frequency and spectrum of mutagenesis in females. In particular, they inhibited mutagenesis at early stages of oogenesis. Meiotic conversion was not specifically changed by the mei mutations. At the same time, the mei-41D5 mutation increased all recombination processes in meiosis. The results obtained indicate the involvement of genetic recombination in mutation changes occurring in the double super-unstable system. Therefore, the latter may be successfully used in studies of the role of different genes and their products in recombination.

17.
A critically important challenge in empirical population genetics is distinguishing neutral nonequilibrium processes from selective forces that produce similar patterns of variation. We here examine the extent to which linkage disequilibrium (i.e., nonrandom associations between markers) improves this discrimination. We show that patterns of linkage disequilibrium recently proposed to be unique to hitchhiking models are replicated under nonequilibrium neutral models. We also demonstrate that jointly considering spatial patterns of association among variants alongside the site-frequency spectrum is nonetheless of value. Through a comparison of models of equilibrium neutrality, nonequilibrium neutrality, equilibrium hitchhiking, nonequilibrium hitchhiking, and recurrent hitchhiking, we evaluate a linkage disequilibrium (LD) statistic (ω_max) that appears to have power to identify regions recently shaped by positive selection. Most notably, for demographic parameters relevant to non-African populations of Drosophila melanogaster, we demonstrate that selected loci are distinguishable from neutral loci using this statistic.

18.
Palindromic sequences are important DNA motifs related to gene regulation, DNA replication and recombination, and thus, investigating the evolutionary forces shaping the distribution pattern and abundance of palindromes in the genome is substantially important. In this article, we analyzed the abundance of palindromes in the genome, and then explored the possible effects of several genomic factors on the palindrome distribution and abundance in Drosophila melanogaster. Our results show that the palindrome abundance in D. melanogaster deviates from random expectation and the uneven distribution of palindromes across the genome is associated with local GC content, recombination rate, and coding exon density. Our data suggest that base composition is the major determinant of the distribution pattern and abundance of palindromes and the correlation between palindrome density and recombination is a side-product of the effect of compositional bias on the palindrome abundance.

19.
Mitochondrial DNA control region sequence variation was obtained and the population history of the common hippopotamus was inferred from 109 individuals from 13 localities covering six populations in sub-Saharan Africa. In all, 100 haplotypes were defined, of which 98 were locality specific. A relatively low overall nucleotide diversity was observed (π = 1.9%), as compared to other large mammals so far studied from the same region. Within populations, nucleotide diversity varied from 1.52% in Zambia to 1.92% in Queen Elizabeth and Masai Mara. Overall, low but significant genetic differentiation was observed in the total data set (F_ST = 0.138; P = 0.001), and at the population level, patterns of differentiation support previously suggested hippopotamus subspecies designations (F_CT = 0.103; P = 0.015). Evidence that the common hippopotamus recently expanded was revealed by: (i) lack of clear geographical structure among haplotypes, (ii) mismatch distributions of pairwise differences (r = 0.0053; P = 0.012) and site-frequency spectra, (iii) Fu's neutrality statistics (F_S = -155.409; P < 0.00001) and (iv) Fu and Li's statistical tests (D* = -3.191; P < 0.01, F* = -2.668; P = 0.01). Mismatch distributions, site-frequency spectra and neutrality statistics performed at subspecies level also supported expansion of Hippopotamus amphibius across Africa. We interpret observed common hippopotamus population history in terms of Pleistocene drainage overflow and suggest recognising the three subspecies that were sampled in this study as separate management units in future conservation planning.

20.
Revealing how recombination affects genomic sequence is of great significance to our understanding of genome evolution. The present paper focuses on the correlation between recombination rate and dinucleotide bias in the Drosophila melanogaster genome. Our results show that the overall dinucleotide bias is positively correlated with recombination rate for genomic sequences including untranslated regions, introns, intergenic regions, and coding sequences. The correlation patterns of individual dinucleotide biases with recombination rate are presented. Possible mechanisms of interaction between recombination and dinucleotide bias are discussed. Our data indicate that there may be a genome-wide universal mechanism acting between recombination rate and dinucleotide bias, which is likely to be neighbor-dependent biased gene conversion.
