Similar articles
 20 similar articles found
1.
Innan H 《Genetics》2003,163(2):803-810
The infinite-site model of a small multigene family with two duplicated genes is studied. The expectations of the amounts of nucleotide variation within and between two genes and linkage disequilibrium are obtained, and a coalescent-based method for simulating patterns of polymorphism in a small multigene family is developed. The pattern of DNA variation is much more complicated than that in a single-copy gene, which can be simulated by the standard coalescent. Using the coalescent simulation of duplicated genes, the applicability of statistical tests of neutrality to multigene families is considered.

2.
Zeng K  Charlesworth B 《Genetics》2011,189(1):251-266
Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.

3.
Davies JL  Simancík F  Lyngsø R  Mailund T  Hein J 《Genetics》2007,177(4):2151-2160
Coalescent theory deals with the dynamics of how sampled genetic material has spread through a population from a single ancestor over many generations and is ubiquitous in contemporary molecular population genetics. Inherent in most applications is a continuous-time approximation that is derived under the assumption that sample size is small relative to the actual population size. In effect, this precludes multiple and simultaneous coalescent events that take place in the history of large samples. If sequences do not recombine, the number of sequences ancestral to a large sample is reduced sufficiently after relatively few generations such that use of the continuous-time approximation is justified. However, in tracing the history of large chromosomal segments, a large recombination rate per generation will consistently maintain a large number of ancestors. This can create a major disparity between discrete-time and continuous-time models and we analyze its importance, illustrated with model parameters typical of the human genome. The presence of gene conversion exacerbates the disparity and could seriously undermine applications of coalescent theory to complete genomes. However, we show that multiple and simultaneous coalescent events influence global quantities, such as total number of ancestors, but have negligible effect on local quantities, such as linkage disequilibrium. Reassuringly, most applications of the coalescent model with recombination (including association mapping) focus on local quantities.

4.
A 3.5-kb segment of the alcohol dehydrogenase (Adh) region that includes the Adh and Adh-related genes was sequenced in 139 Drosophila pseudoobscura strains collected from 13 populations. The Adh gene encodes four protein alleles and rejects a neutral model of protein evolution with the McDonald-Kreitman test, although the number of segregating synonymous sites is too high to conclude that adaptive selection has operated. The Adh-related gene encodes 18 protein haplotypes and fails to reject an equilibrium neutral model. The populations fail to show significant geographic differentiation of the Adh-related haplotypes. Eight of 404 single nucleotide polymorphisms (SNPs) in the Adh region were in significant linkage disequilibrium with three ADHR protein alleles. Coalescent simulations with and without recombination were used to derive the expected levels of significant linkage disequilibrium between SNPs and 18 protein haplotypes. Maximum levels of linkage disequilibrium are expected for protein alleles at moderate frequencies. In coalescent models without recombination, linkage disequilibrium decays between SNPs and high frequency haplotypes because common alleles mutate to haplotypes that are rare or that reach moderate frequency. The implication of this study is that linkage disequilibrium mapping has the highest probability of success with disease-causing alleles at frequencies of 10%.

5.
Current human sequencing projects observe an abundance of extremely rare genetic variation, suggesting recent acceleration of population growth. To better understand the impact of such accelerating growth on the quantity and nature of genetic variation, we present a new class of models capable of incorporating faster than exponential growth in a coalescent framework. Our work shows that such accelerated growth affects only the population size in the recent past and thus large samples are required to detect the models’ effects on patterns of variation. When we compare models with fixed initial growth rate, models with accelerating growth achieve very large current population sizes and large samples from these populations contain more variation than samples from populations with constant growth. This increase is driven almost entirely by an increase in singleton variation. Moreover, linkage disequilibrium decays faster in populations with accelerating growth. When we instead condition on current population size, models with accelerating growth result in less overall variation and slower linkage disequilibrium decay compared to models with exponential growth. We also find that pairwise linkage disequilibrium of very rare variants contains information about growth rates in the recent past. Finally, we demonstrate that models of accelerating growth may substantially change estimates of present-day effective population sizes and growth times.

6.
To model deviations from selectively neutral genetic variation caused by different forms of selection, it is necessary to first understand patterns of neutral variation. Best understood is neutral genetic variation at a single locus. But, as is well known, additional insights can be gained by investigating multiple loci. The resulting patterns reflect the degree of association (linkage) between loci and provide information about the underlying multilocus gene genealogies. The statistical properties of two-locus gene genealogies have been intensively studied for populations of constant size, as well as for simple demographic histories such as exponential population growth and single bottlenecks. By contrast, the combined effect of recombination and sustained demographic fluctuations is poorly understood. Addressing this issue, we study a two-locus Wright-Fisher model of a population subject to recurrent bottlenecks. We derive coalescent approximations for the covariance of the times to the most recent common ancestor at two loci in samples of two chromosomes. This covariance reflects the degree of association and thus linkage disequilibrium between these loci. We find, first, that an effective population-size approximation describes the numerically observed association between two loci provided that recombination occurs either much faster or much more slowly than the population-size fluctuations. Second, when recombination occurs frequently between but rarely within bottlenecks, we observe that the association of gene histories becomes independent of physical distance over a certain range of distances. Third, we show that in this case, a commonly used measure of linkage disequilibrium, σ_d^2 (closely related to r^2), fails to capture the long-range association between two loci. The reason is that constituent terms, each reflecting the long-range association, cancel. Fourth, we analyze a limiting case in which the long-range association can be described in terms of a Xi coalescent allowing for simultaneous multiple mergers of ancestral lines.
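
The two linkage-disequilibrium measures named in this abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the sketch assumes the usual textbook definitions: D = p_AB - p_A p_B, r^2 = D^2 / (p_A(1-p_A) p_B(1-p_B)), and σ_d^2 as the ratio of the expectations of that numerator and denominator over replicate samples. The function names and the "AB"/"ab" haplotype encoding are hypothetical, chosen only for illustration.

```python
def _two_locus_terms(haplotypes):
    """Return (D, p_A(1-p_A) p_B(1-p_B)) for two-locus haplotypes.

    Each haplotype is a two-character string such as "AB", "Ab", "aB" or "ab"
    (hypothetical encoding: uppercase = one allele, lowercase = the other).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "A" for h in haplotypes) / n
    p_b = sum(h[1] == "B" for h in haplotypes) / n
    p_ab = sum(h == "AB" for h in haplotypes) / n
    d = p_ab - p_a * p_b
    return d, p_a * (1 - p_a) * p_b * (1 - p_b)


def r_squared(haplotypes):
    """r^2 for one sample; NaN if either locus is monomorphic."""
    d, denom = _two_locus_terms(haplotypes)
    return d * d / denom if denom > 0 else float("nan")


def sigma_d_squared(samples):
    """Assumed definition: E[D^2] / E[p_A(1-p_A) p_B(1-p_B)] over replicates."""
    num = den = 0.0
    for haplotypes in samples:
        d, denom = _two_locus_terms(haplotypes)
        num += d * d
        den += denom
    return num / den if den > 0 else float("nan")
```

Because σ_d^2 averages the numerator and the denominator separately, it is a ratio of expectations rather than an expectation of r^2, which is why the two measures are described as closely related rather than identical.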

7.
In this paper we develop a coalescent model with intralocus gene conversion. Such models are of increasing importance in the analysis of intralocus variability and linkage disequilibrium. We derive the distribution of the waiting time until a gene conversion event occurs in a sample in terms of the distribution of the length of the transferred segment, ζ. We do not assume any specific form of the distribution of ζ. Further, given that a gene conversion event occurs we find the distribution of (σ, τ), the end points of the transferred segment and derive results on correlations between local trees in positions χ_1 and χ_2. Among other results we show that the correlation between the branch lengths of two local trees in the coalescent with gene conversion (and no recombination) decreases toward a nonzero constant when the distance between χ_1 and χ_2 increases. Finally, we show that a model including both recombination and gene conversion might account for the lack of intralocus associations found in, e.g., Drosophila melanogaster.

8.
Observed linkage disequilibrium (LD) between genetic markers in different populations descended independently from a common ancestral population can be used to estimate their absolute time of divergence, because the correlation of LD between populations will be reduced each generation by an amount that, approximately, depends only on the recombination rate between markers. Although drift leads to divergence in allele frequencies, it has less effect on divergence in LD values. We derived the relationship between LD and time of divergence and verified it with coalescent simulations. We then used HapMap Phase II data to estimate time of divergence between human populations. Summed over large numbers of pairs of loci, we find a positive correlation of LD between African and non-African populations at levels of up to ~0.3 cM. We estimate that the observed correlation of LD is consistent with an effective separation time of approximately 1,000 generations or ~25,000 years before present. The most likely explanation for such relatively low separation times is the existence of substantial levels of migration between populations after the initial separation. Theory and results from coalescent simulations confirm that low levels of migration can lead to a downward bias in the estimate of separation time.

9.
A way to identify loci subject to positive selection is to detect the signature of selective sweeps in given chromosomal regions. It is revealed by the departure of DNA polymorphism patterns from the neutral equilibrium predicted by coalescent theory. We surveyed DNA sequence variation in a region formerly identified as causing "sex-ratio" meiotic drive in Drosophila simulans. We found evidence that this system evolved by positive selection at 2 neighboring loci, which thus appear to be required simultaneously for meiotic drive to occur. The 2 regions are approximately 150-kb distant, corresponding to a genetic distance of 0.1 cM. The presumably large transmission advantage of chromosomes carrying meiotic drive alleles at both loci has not erased the individual signature of selection at each locus. This chromosome fragment combines a high level of linkage disequilibrium between the 2 critical regions with a high recombination rate. As a result, 2 characteristic traits of selective sweeps--the reduction of variation and the departure from selective neutrality in haplotype tests--show a bimodal pattern. Linkage disequilibrium level indicates that, in the natural population from Madagascar used in this study, the selective sweep may be as recent as 100 years.

10.
Classical population genetics describes how the fate of an allele is driven by four forces: mutation, migration, selection and drift. However, these are sometimes insufficient to explain how the observed allele frequency changes and, therefore, another factor must be invoked: cultural transmission of fitness (CTF). CTF is the non-genetic transmission of any kind of behaviour that affects reproductive success. There are several clearly documented examples of CTF, and theoretical studies have shown that it affects effective population size, linkage disequilibrium and coalescent times. It is therefore a factor that must be taken into account to explain the structure of genetic diversity. In this article, we will present documented cases of how CTF affects the genetic diversity of populations and yields dramatic changes in allele frequencies.

11.
We present a multilocus gene mapping method based on linkage disequilibrium, which uses the ancestral recombination graph to model the history of sequences that may harbor an influential variant. We describe the construction of a recurrence equation used to make inferences about the location of a trait-influencing mutation. We demonstrate how a Monte Carlo algorithm combined with a local importance sampling scheme can be used for mapping. We explain how to simulate the timing of events in the coalescent in the presence of recombination and mutation, which accommodates variable population size. We provide an example to illustrate the use of the method, which can be easily extended to more general situations. Although the method is computationally intensive and variation in the likelihood profiles can occur, the method offers a great deal of promise.

12.
The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation heavily relies on the assumption that population size is large and sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up the longer waiting time between successive coalescent events by having multiple coalescences at the same time. The most significant difference among various summary statistics of a coalescent examined is the sum of lengths of external branches, which can be more than 10% larger for the exact coalescent than that for the Kingman coalescent. As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
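
The contrast drawn in this abstract can be sketched in a few lines of Python. This is not the paper's simulation algorithm; it is a minimal illustration that assumes a haploid WF population of constant size `n_pop`, in which each lineage picks its parent uniformly at random (so several lineages may merge in one generation), set against the Kingman approximation in which j lineages wait an exponential time with rate j(j-1)/(2 n_pop) generations and merge only pairwise. All function and parameter names are hypothetical.

```python
import random


def wf_generation(k, n_pop, rng):
    """One generation back under the exact WF model: each of k lineages
    picks a parent uniformly among n_pop individuals; return the number
    of distinct parents (multiple and simultaneous mergers are allowed)."""
    return len({rng.randrange(n_pop) for _ in range(k)})


def exact_wf_tmrca(k, n_pop, rng):
    """Generations until k sampled lineages share a single ancestor."""
    generations = 0
    while k > 1:
        k = wf_generation(k, n_pop, rng)
        generations += 1
    return generations


def kingman_tmrca(k, n_pop, rng):
    """TMRCA in generations under the Kingman approximation: with j
    lineages, wait Exp(rate = j(j-1)/(2 n_pop)), then merge one pair."""
    total = 0.0
    for j in range(k, 1, -1):
        total += rng.expovariate(j * (j - 1) / (2 * n_pop))
    return total


def kingman_expected_tmrca(k, n_pop):
    """Analytical mean of kingman_tmrca: 2 n_pop (1 - 1/k) generations."""
    return 2 * n_pop * (1 - 1 / k)
```

Averaging `exact_wf_tmrca` and `kingman_tmrca` over many replicates, with a sample size that is a sizeable fraction of `n_pop`, is one way to look at the kind of comparison the abstract reports for the age of the MRCA.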

13.
14.
The causal relationship between genes and diseases has been investigated with the development of DNA sequencing. Polymorphisms incorporated in the HapMap Project have enabled fine mapping with linkage disequilibrium (LD) and prior clustering of the haplotypes on the basis of a similarity measure has often been performed in an attempt to capture coalescent events because they can reduce the amount of computation. However, an inappropriate choice of similarity measure can lead to wrong conclusions and we propose a new haplotype-based clustering algorithm for fine-scale mapping by using a Bayesian partition model. To handle phase-unknown genotypes, we propose a new algorithm based on a Metropolized Gibbs sampler and it is implemented in C++. Our simulation studies found that the proposed method improves the accuracy of the estimator for the disease susceptibility locus. We illustrated the practical implication of the new analysis method by an application to fine-scale mapping of CYP2D6 in drug metabolism.

15.
Navarro A  Barton NH 《Genetics》2002,161(2):849-863
We studied the effect of multilocus balancing selection on neutral nucleotide variability at linked sites by simulating a model where diallelic polymorphisms are maintained at an arbitrary number of selected loci by means of symmetric overdominance. Different combinations of alleles define different genetic backgrounds that subdivide the population and strongly affect variability. Several multilocus fitness regimes with different degrees of epistasis and gametic disequilibrium are allowed. Analytical results based on a multilocus extension of the structured coalescent predict that the expected linked neutral diversity increases exponentially with the number of selected loci and can become extremely large. Our simulation results show that although variability increases with the number of genetic backgrounds that are maintained in the population, it is reduced by random fluctuations in the frequencies of those backgrounds and does not reach high levels even in very large populations. We also show that previous results on balancing selection in single-locus systems do not extend to the multilocus scenario in a straightforward way. Different patterns of linkage disequilibrium and of the frequency spectrum of neutral mutations are expected under different degrees of epistasis. Interestingly, the power to detect balancing selection using deviations from a neutral distribution of allele frequencies seems to be diminished under the fitness regime that leads to the largest increase of variability over the neutral case. This and other results are discussed in the light of data from the Mhc.

16.
Probabilities of monophyly, paraphyly, and polyphyly of two-species gene genealogies are computed for modest sample sizes and compared for two different Λ coalescent processes. Coalescent processes belonging to the Λ coalescent family admit asynchronous multiple mergers of active ancestral lineages. Assigning a timescale to the time of divergence becomes a central issue when different populations have different coalescent processes running on different timescales. Clade probabilities in single populations are also computed, which can be useful for testing for taxonomic distinctiveness of an observed set of monophyletic lineages. The coalescence rates of multiple merger coalescent processes are functions of coalescent parameters. The effect of coalescent parameters on the probabilities studied depends on the coalescent process, and if the population is ancestral or derived. The probability of reciprocal monophyly tends to be somewhat lower, when associated with a Λ coalescent, under the null hypothesis that two groups come from the same population. However, even for fairly recent divergence times, the probability of monophyly tends to be higher as a function of the number of generations for coalescent processes that admit multiple mergers, and is sensitive to the parameter of one of the example processes.

17.
Arabidopsis thaliana is a highly selfing plant that nevertheless appears to undergo substantial recombination. To reconcile its selfing habit with the observations of recombination, we have sampled the genetic diversity of A. thaliana at 14 loci of approximately 500 bp each, spread across 170 kb of genomic sequence centered on a QTL for resistance to herbivory. A total of 170 of the 6321 nucleotides surveyed were polymorphic, with 169 being biallelic. The mean silent genetic diversity (π_s) varied between 0.001 and 0.03. Pairwise linkage disequilibria between the polymorphisms were negatively correlated with distance, although this effect vanished when only pairs of polymorphisms with four haplotypes were included in the analysis. The absence of a consistent negative correlation between distance and linkage disequilibrium indicated that gene conversion might have played an important role in distributing genetic diversity throughout the region. We tested this by coalescent simulations and estimate that up to 90% of recombination is due to gene conversion.

18.
The coalescent with gene conversion   Total citations: 7, self-citations: 0, citations by others: 7
Wiuf C  Hein J 《Genetics》2000,155(1):451-462
In this article we develop a coalescent model with intralocus gene conversion. The distribution of the tract length is geometric in concordance with results published in the literature. We derive a simulation scheme and deduce a number of analytical results for this coalescent with gene conversion. We compare patterns of variability in samples simulated according to the coalescent with recombination with similar patterns simulated according to the coalescent with gene conversion alone. Further, an expression for the expected number of topology shifts in a sample of present-day sequences caused by gene conversion events is derived.
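
The geometric tract-length assumption mentioned in this abstract can be illustrated with a short Python sketch. It is not the simulation scheme derived in the article; it only shows how a tract length might be drawn from a geometric distribution on 1, 2, ... with per-nucleotide termination probability `q` (mean 1/q) by inverse-CDF sampling. The uniform start position and the clipping at the end of the sequence are additional simplifying assumptions, and all names are hypothetical.

```python
import math
import random


def sample_tract_length(q, rng):
    """Draw a tract length (in nucleotides) from a geometric distribution
    with success probability q, where 0 < q < 1, so that P(L > k) = (1-q)^k."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return max(1, math.ceil(math.log(u) / math.log(1.0 - q)))


def sample_conversion_tract(seq_len, q, rng):
    """Return (start, end) of a converted tract as a 0-based half-open
    interval. Assumptions: the start is uniform along the sequence and the
    tract is clipped where it would run past the last position."""
    start = rng.randrange(seq_len)
    end = min(seq_len, start + sample_tract_length(q, rng))
    return start, end
```

With `q = 0.01`, for example, the drawn lengths average about 100 nucleotides, so most tracts cover only a small part of a locus that is several kilobases long.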

19.
This work studies the coalescent (ancestral pedigree, genealogy) of the entire population. The coalescent structure (topology) is robust, but selection changes the rate of coalescence (the time between branching events). The change in the rate of coalescence is not uniform, rather the reduction in the time between branching events is greatest when the coalescent is small (immediately after the common ancestor is the only member of the coalescent) with little change when the coalescent is large (immediately preceding when that common ancestor becomes fixed and the size of the coalescent is N). This provides that the reduction in the coalescent time due to selection is much greater than the reduction in the cumulative size of the coalescent (total number of ancestors of the present population after and including the most recent common ancestor) due to selection. If Ns≫1, the coalescent and fixation times are approximately equal to , which is much less than the value N which would result from neutral drift (N rather than the canonical haploid neutral fixation time 2N is the appropriate comparison for the model considered here), in particular, it is 70% less for Ns=10 and 95% less for Ns=100. However, for those values of Ns, and N ranging between 10^3 and 10^6, the reduction in the cumulative size of the coalescent of the entire population compared to the neutral case ranges from 17% to 65% (depending on the values of N and s). The coalescent time for two individuals for Ns of 10 and 100 is reduced by approximately 70% and 94%, respectively, compared with the neutral case. Because heterozygosity is proportional to the coalescent time for two individuals and the number of segregating alleles is proportional to the cumulative size of the coalescent, selection reduces heterozygosity more than it reduces the number of segregating alleles.

20.
Conventional coalescent inferences of population history make the critical assumption that the population under examination is panmictic. However, most populations are structured. This complicates the prevailing coalescent analyses and sometimes leads to inaccurate estimates. To develop a coalescent method unhampered by population structure, we perform two analyses. First, we demonstrate that the coalescent probability of two randomly sampled alleles from the immediate preceding generation (one generation back) is independent of population structure. Second, motivated by this finding, we propose a new coalescent method: i-coalescent analysis. The i-coalescent analysis computes the instantaneous coalescent rate by using a phylogenetic tree of sampled alleles. Using simulated data, we broadly demonstrate the capability of i-coalescent analysis to accurately reconstruct population size dynamics of highly structured populations, although we find this method often requires larger sample sizes for structured populations than for panmictic populations. Overall, our results indicate i-coalescent analysis to be a useful tool, especially for the inference of population histories with intractable structure such as the developmental history of cell populations in the organs of complex organisms.
