1.
We present a Moran-model approach to modeling general multiallelic selection in a finite population and show how it may be used to develop theoretical models of biological systems of balancing selection such as plant gametophytic self-incompatibility loci. We propose new expressions for the stationary distribution of allele frequencies under selection and use them to show that the continuous-time Markov chain describing allele frequency change with exchangeable selection and Moran-model reproduction is reversible. We then use the reversibility property to derive the expected allele frequency spectrum in a finite population for several general models of multiallelic selection. Using simulations, we show that our approach is valid over a broader range of parameters than previous analyses of balancing selection based on diffusion approximations to the Wright–Fisher model of reproduction. Our results can be applied to any model of multiallelic selection in which fitness is solely a function of allele frequency.

Natural selection has long been a topic of interest in population genetics, yet the stochastic theory of genes under selection remains underdeveloped compared to the theory of neutral genes. Due to the interplay of stochastic and deterministic forces, models of selection present analytical challenges beyond those of neutral models, although a great deal of progress has been made with models that use diffusion approximations to a Wright–Fisher model of reproduction. Diffusion approximations with selection are, however, sometimes difficult to employ and always require assumptions about population parameters for tractability. These limitations suggest that there may be value in developing new methods of solving the problem of selection in a finite population, and here we do so using a Moran model of reproduction in place of the familiar Wright–Fisher model.
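The reversibility result is easiest to see in the simplest case. With two alleles, a Moran step changes the count of allele A by at most one copy, so the chain is a birth–death chain, and every birth–death chain satisfies detailed balance. The Python sketch below is our own discrete-time illustration, not the authors' model. The frequency-dependent fitness (the rarer allele is fitter), the symmetric two-allele mutation scheme, and the function names `moran_rates`, `stationary`, and `check_balance` are all assumptions. The sketch builds the stationary distribution from the detailed-balance ratios and then checks global balance numerically.

```python
def moran_rates(M, s, u):
    """Per-step probabilities of gaining (up) or losing (down) one copy of allele A.

    Hypothetical two-allele Moran model on M allele copies: a parent is drawn
    with probability proportional to a frequency-dependent fitness, the
    offspring switches to the other allele with probability u, and a copy
    drawn uniformly at random dies. Requires 0 <= s < 1 and u > 0.
    """
    up, down = [0.0] * (M + 1), [0.0] * (M + 1)
    for i in range(M + 1):
        x = i / M
        w_a, w_b = 1 - s * x, 1 - s * (1 - x)  # the rarer allele is fitter
        total = w_a * i + w_b * (M - i)
        # probability that the offspring carries allele A
        p_a = (w_a * i * (1 - u) + w_b * (M - i) * u) / total
        up[i] = (M - i) / M * p_a        # a B copy dies and an A offspring replaces it
        down[i] = i / M * (1 - p_a)      # an A copy dies and a B offspring replaces it
    return up, down


def stationary(up, down):
    """Stationary distribution from detailed balance: pi[i+1] * down[i+1] = pi[i] * up[i]."""
    pi = [1.0]
    for i in range(len(up) - 1):
        pi.append(pi[-1] * up[i] / down[i + 1])
    z = sum(pi)
    return [p / z for p in pi]


def check_balance(pi, up, down):
    """Largest violation of global balance (pi P = pi) over all states."""
    M = len(pi) - 1
    worst = 0.0
    for i in range(M + 1):
        flow_in = pi[i] * (1 - up[i] - down[i])  # mass that stays at state i
        if i > 0:
            flow_in += pi[i - 1] * up[i - 1]
        if i < M:
            flow_in += pi[i + 1] * down[i + 1]
        worst = max(worst, abs(flow_in - pi[i]))
    return worst


up, down = moran_rates(M=20, s=0.5, u=0.01)
pi = stationary(up, down)
print(check_balance(pi, up, down))  # tiny: floating-point round-off only
```

Detailed balance holds state by state, so the same construction works for any fitness that depends only on allele frequency. Only `moran_rates` would need to change.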
Our approach has two major advantages over previous models: general applicability to a wide variety of selection models and accuracy over a broad range of parameter values. In this work, we propose new expressions for the full stationary distributions of allele frequencies under multiallelic selection, as well as expressions for average allele frequency distributions.

We restrict our attention to exchangeable models of selection, meaning that relabeling the alleles will not change selective outcomes and thus that selection will be a function of allele frequency rather than allele identity. Many models of selection can be transformed into frequency-dependent forms (Denniston and Crow 1990), and some common models of selection have the desired property of exchangeability. For example, symmetric overdominant selection, in which heterozygotes have a selective advantage over homozygotes but the specific genotype of homozygote or heterozygote has no further selective effect, can be expressed as frequency-dependent selection on individual (exchangeable) alleles, although the direct selection is actually on diploid genotypes. Many other proposed models of multiallelic balancing selection, in which substantial variation is maintained by selection, can be viewed in this way. Such models have been of particular interest because of the potential application to highly multiallelic systems found in nature, such as self-incompatibility (SI) loci in plants and the major histocompatibility complex (MHC) loci in vertebrates, and the desire to analyze these systems is a motivation for the present work. We now review some of the population genetic theory related to these systems.

Early in the history of population genetics, Wright (1939) presented a somewhat controversial stochastic model of gametophytic self-incompatibility (GSI) genes, sparking much further theoretical and empirical work.
An analytic theory of multiallelic symmetric overdominance was developed along similar lines to this early model (Kimura and Crow 1964; Takahata 1990) and has been used as an approximation to the unknown mode of selection in the MHC (Takahata et al. 1992). Drawing insights from these first two applications, other biological systems where balancing selection was posited, including sex determination in honeybees (Yokoyama and Nei 1979), fungal mating systems (May et al. 1999), and heterokaryon incompatibility in fungi (Muirhead et al. 2002), have also been modeled successfully using closely related approaches. Progress has been made in using these models to address genealogical (Takahata 1990; Vekemans and Slatkin 1994) and demographic (Muirhead 2001) questions, as well as extending the models into more complex modes of selection (Uyenoyama 2003) and reproduction (Vallejo-Marin and Uyenoyama 2008).

Models of genetic variation under balancing selection have traditionally been focused on specific systems, such that extensions require entirely new analyses, and have also included a number of simplifying assumptions in the interest of mathematical tractability. For example, the symmetric overdominance model has been strongly criticized as an unrealistic approximation of MHC evolution (Paterson et al. 1998; Hedrick 2002; Penn et al. 2002; Ilmonen et al. 2007; Stoffels and Spencer 2008), and yet it has proved difficult to make finite-population models of any of the more realistic frequency dependence schemes using the same approaches. A constraint on further progress is the fact that the standard model of stochastic population genetics, the Wright–Fisher model, is in fact quite difficult to analyze.

The Wright–Fisher model of reproduction employs nonoverlapping generations, so that for a diploid population of size N, all 2N allele copies are chosen simultaneously when forming a new generation of individuals.
While it is straightforward to describe this reproduction scheme mathematically as a discrete-time Markov chain, that chain unfortunately appears intractable even in simple cases (Ewens 2004). Traditionally, then, diffusion approximations have been used to obtain quantities of interest, such as the equilibrium expected number of alleles, allele frequency spectra, and fixation probabilities and times. Diffusion approximations are derived in the limit $N \to \infty$, but are applicable to problems of finite N, provided that the strengths of other forces such as mutation and selection can be assumed to be weak, of O(N−1) (Ewens 2004). Watterson (1977) derived such a diffusion approximation for multiallelic symmetric overdominance using these assumptions. More recently, as interest in population genetics has turned to problems of inference, Grote and Speed (2002) considered sampling probabilities under the diffusion approximation for symmetric overdominance, while Donnelly et al. (2001) and Stephens and Donnelly (2003) proposed computational methods for some asymmetric models.

Although strong selection can be modeled using diffusion approximations by making the product of the population size and the selection coefficient (Ns) large, the assumption of weak selection is not in fact appropriate for the canonical biological systems of balancing selection. Specifically, selection coefficients are defined by the differences in fitness (the expected number of offspring) among individuals in the population at a given time. These differences may be large in systems such as GSI, where the fitness of a very common allele may be very small while the fitness of other alleles may be greater than one.

In an attempt to deal with the extremely strong selection of gametophytic self-incompatibility, Wright's (1939) original model focused attention on the dynamics of a single representative allele.
He collapsed the influence of all other alleles into a single summary statistic: the homozygosity, F, which is a function of the frequencies of all alleles, and which Wright (1939) assumed to be constant. The analysis is essentially that of a two-allele system, using a one-dimensional diffusion analysis. This approach, while shown by simulation to be very effective in the appropriate parameter range (Ewens and Ewens 1966), received substantial criticism on mathematical grounds (Fisher 1958; Moran 1962; Ewens 1964b). Ewens (1964b), in particular, objected to the use of diffusion theory for GSI, pointing out that strong frequency-dependent selection violates the diffusion requirement that both the mean and the variance of the change in allele frequencies be small and of O(N−1). Ewens (1964a) then applied Wright's basic one-dimensional diffusion approach to modeling symmetric overdominance, but assumed that selection was weak and of O(N−1) to stay within the strict limits of the diffusion approximation.

Kimura and Crow (1964) and Wright (1966), on the other hand, presented alternative one-dimensional diffusion approximations to symmetric overdominance, closer in spirit to Wright's original model of GSI, that did not make the weak-selection assumption. Watterson (1977) was concerned about both the inconsistencies of the approximations used in these models and the treatment of F as a constant rather than as a random variable dependent upon allele frequencies. Using his own multiallelic diffusion approximation for symmetric overdominance (Watterson 1977), he derived an alternative (small-Ns) approximation to the frequency of a single representative allele.
We consider this approximation, as well as the best-known one-dimensional symmetric overdominance diffusion, the strong-selection approximation of Kimura and Crow (1964), in comparison with our alternative approach to deriving allele frequency spectra under general multiallelic selection with exchangeable alleles.

To avoid the approximations required to employ Wright–Fisher/diffusion-based methods, we turn to an alternative model of reproduction in a finite population: the overlapping-generations model of Moran (1962). In the Moran model, a single allele copy dies and another reproduces in each time step, rather than all 2N allele copies simultaneously being replaced by offspring each generation. As in the Wright–Fisher model, this reproduction scheme is represented mathematically by a Markov chain. Unlike the Wright–Fisher model, however, the Moran model can sometimes yield tractable, exact solutions to the underlying Markov chain, without the need to resort to diffusion approximations. We exploit this trait to develop a new stochastic theory of multiallelic selection with minimal dependence on assumptions about population parameter values. Our method has the additional benefit of being flexible: it can accommodate any exchangeable model of multiallelic selection and either of two general models of parent-independent mutation, the infinite-alleles and k-allele models of mutation. Our Moran-model predictions agree well with the results of Wright–Fisher simulations.
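The Moran step described in this entry (one copy dies, one reproduces) can be combined with an exchangeable fitness scheme in a few lines. The sketch below simulates 2N allele copies under symmetric overdominance with infinite-alleles mutation. It assumes random union of gametes, ignoring the O(1/N) correction for drawing a mate from the rest of the population. Under that assumption an allele at frequency p_i sits in a homozygote (fitness 1 − s) with probability p_i and in a heterozygote (fitness 1) otherwise. Its marginal fitness is therefore w_i = 1 − s·p_i, which depends only on its own frequency, so the model is exchangeable. Averaging over alleles gives mean fitness 1 − s·F, where F = Σ p_i² is the homozygosity. The function name and parameter values are hypothetical, and this is not the authors' implementation.

```python
import random
from collections import Counter


def moran_infinite_alleles(two_n, s, u, steps, seed=0):
    """Simulate `steps` Moran events on `two_n` allele copies (hypothetical sketch).

    Symmetric overdominance enters through the marginal fitness 1 - s * p_i of
    an allele at frequency p_i; with probability u the offspring carries a
    brand-new allele label (infinite-alleles mutation). Requires 0 <= s < 1.
    """
    rng = random.Random(seed)
    pop = [0] * two_n          # start monomorphic for allele label 0
    next_label = 1
    for _ in range(steps):
        counts = Counter(pop)
        weights = [1 - s * counts[a] / two_n for a in pop]
        child = rng.choices(pop, weights=weights)[0]  # fitness-weighted parent copy
        if rng.random() < u:
            child = next_label                         # never-before-seen allele
            next_label += 1
        pop[rng.randrange(two_n)] = child              # a uniformly chosen copy dies
    return Counter(pop)


counts = moran_infinite_alleles(two_n=100, s=0.5, u=0.01, steps=20_000)
freqs = [c / 100 for c in counts.values()]
F = sum(p * p for p in freqs)   # homozygosity, Wright's summary statistic
print(len(counts), F)
```

Replacing the line that computes `weights` with any other function of allele frequency gives a different exchangeable model. Nothing else in the loop depends on the form of selection.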
2.
Fluctuations in age structure caused by environmental stochasticity create autocorrelation and transient fluctuations in both population size and allele frequency, which complicate demographic and evolutionary analyses. Following a suggestion of Fisher, we show that weighting individuals of different age by their reproductive value serves as a filter, removing temporal autocorrelation in population demography and evolution due to stochastic age structure. Assuming weak selection, random mating, and a stationary distribution of environments with no autocorrelation, we derive a diffusion approximation for evolution of the reproductive value weighted allele frequency. The expected evolution obeys an adaptive topography defined by the long-run growth rate of the population. The expected fitness of a genotype is its Malthusian fitness in the average environment minus the covariance of its growth rate with that of the population. Simulations of the age-structured model verify the accuracy of the diffusion approximation. We develop statistical methods for measuring the expected selection on the reproductive value weighted allele frequency in a fluctuating age-structured population.

The evolutionary dynamics of age-structured populations were formalized by Charlesworth (1980, 1994) and Lande (1982) on the basis of earlier ideas of Fisher (1930, 1958), Medawar (1946, 1952), and Hamilton (1966), showing that the strength of selection on genes affecting the vital rates of survival or fecundity depends on their age of action (reviewed by de Jong 1994; Charlesworth 2000). Fisher defined the reproductive value of individuals of a given age as their expected contribution to future population growth, determined by the age-specific vital rates. This has the property that in a constant environment the total reproductive value in a population always increases at a constant rate.
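Fisher's property of reproductive value can be checked on a small deterministic example. The sketch below assumes a hypothetical three-age-class Leslie matrix. The fecundities, survival rates, and function names are illustrative and are not taken from the article. Reproductive values are the left eigenvector of the projection matrix belonging to its dominant eigenvalue λ. Starting far from the stable age distribution, total reproductive value grows by exactly λ every step, while total numbers fluctuate.

```python
import numpy as np

# Hypothetical Leslie matrix for three age classes: the first row holds
# age-specific fecundities, the subdiagonal holds survival probabilities.
L = np.array([[0.0, 1.5, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.5, 0.0]])


def reproductive_values(L):
    """Dominant eigenvalue and left eigenvector (reproductive values, newborns = 1)."""
    eigvals, vecs = np.linalg.eig(L.T)   # left eigenvectors of L = right eigenvectors of L.T
    k = np.argmax(eigvals.real)
    v = np.abs(vecs[:, k].real)          # the Perron vector has a single sign
    return eigvals[k].real, v / v[0]


def project(L, n0, steps):
    """Return (lambda, per-step growth of total numbers, per-step growth of total RV)."""
    lam, v = reproductive_values(L)
    n = np.asarray(n0, dtype=float)
    size_ratios, rv_ratios = [], []
    for _ in range(steps):
        n_next = L @ n
        size_ratios.append(n_next.sum() / n.sum())
        rv_ratios.append((v @ n_next) / (v @ n))
        n = n_next
    return lam, size_ratios, rv_ratios


lam, sizes, rvs = project(L, [10.0, 0.0, 0.0], steps=8)
print(round(lam, 4), [round(r, 3) for r in sizes], [round(r, 4) for r in rvs])
```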
The total population size, however, undergoes transient fluctuations as the stable age distribution is approached, and the total population size only asymptotically approaches a constant growth rate (Caswell 2001).

Environmental stochasticity creates continual fluctuations in age structure, producing temporal autocorrelation in population size and in allele frequencies, which seriously complicate demographic and evolutionary analyses. Fisher (1930, 1958, p. 35) suggested for analysis of genetic evolution that individuals should be weighted by their reproductive value to compensate for deviations from the stable age distribution. Here we apply this suggestion to study weak fluctuating selection in an age-structured population in a stochastic environment.

One of the central conceptual paradigms of evolutionary biology was described by Wright (1932). His adaptive topography represents a population as a point on a surface of population mean fitness as a function of allele frequencies. Assuming weak selection, random mating, and loose linkage (implying approximate Hardy–Weinberg equilibrium within loci and linkage equilibrium among loci), natural selection in a constant environment causes the population to evolve uphill of the mean fitness surface (Wright 1937, 1945, 1969; Arnold et al. 2001; Gavrilets 2004). Evolution by natural selection thus tends to increase the mean fitness of a population in a constant environment.

Lande (2007, 2008) generalized Wright's adaptive topography to a stochastic environment, allowing density-dependent population growth but assuming density-independent selection, showing that the expected evolution maximizes the long-run growth rate of the population at low density, $\tilde{r} = r - \sigma_e^2/2$. Here r is the population growth rate at low density in the average environment and $\sigma_e^2$ is the environmental variance in population growth rate among years, which are standard parameters in stochastic demography (Cohen 1977, 1979; Tuljapurkar 1982; Caswell 2001; Lande et al. 2003).
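Lande's long-run growth rate, r̃ = r − σ_e²/2, follows from Itô's formula for a population obeying dN = rN dt + σ_e N dW. The Monte Carlo sketch below assumes that density-independent model with hypothetical values r = 0.02 and σ_e = 0.4. The function name and step sizes are our own choices. Although the expected population size grows at rate r, the typical trajectory of ln N declines at about r − σ_e²/2 = −0.06.

```python
import numpy as np


def long_run_growth(r, sigma_e, T=200.0, dt=0.01, reps=50, seed=0):
    """Estimate the long-run growth rate of ln N for dN = r N dt + sigma_e N dW.

    Hypothetical density-independent model: Euler-Maruyama steps are applied
    to N and accumulated on the log scale. Returns mean(ln N(T) - ln N(0)) / T.
    """
    rng = np.random.default_rng(seed)
    log_n = np.zeros(reps)                   # ln N(0) = 0 in every replicate
    for _ in range(int(round(T / dt))):
        z = rng.standard_normal(reps)
        # one Euler-Maruyama step of N, recorded as a change in ln N
        log_n += np.log1p(r * dt + sigma_e * np.sqrt(dt) * z)
    return log_n.mean() / T


r, sigma_e = 0.02, 0.4
print(long_run_growth(r, sigma_e), r - sigma_e**2 / 2)
```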
In this model of stochastic evolution the adaptive topography describing the expected evolution is derived by expressing r and $\sigma_e^2$ as functions of allele frequencies with parameters being the mean Malthusian fitnesses of the genotypes and their temporal variances and covariances. These results are based on diffusion approximations for the coupled stochastic processes of population size and allele frequencies in a fluctuating environment.

Diffusion approximations are remarkably accurate for many problems in evolution and ecology (Crow and Kimura 1970; Lande et al. 2003). Because a diffusion process is subject to white noise with no temporal autocorrelation, the approximation is most accurate if the noise in the underlying biological process is approximately uncorrelated among years. Despite temporal autocorrelation in total population size produced by age-structure fluctuations, the stochastic demography of age-structured populations over timescales of a generation or more can nevertheless be accurately approximated by a diffusion process (Tuljapurkar 1982; Lande and Orzack 1988; Engen et al. 2005a, 2007). The success of the diffusion approximation for total population size occurs because the noise in the total reproductive value is nearly white, with no temporal autocorrelation to first order, and the log of total population size fluctuates around the log of reproductive value with a return time to equilibrium on the order of a few generations (Engen et al. 2007). Hence the diffusion approximation is well suited to describe the stochastic dynamics of total reproductive value as well as total population size.

This article extends Lande's (2008) model of fluctuating selection without age structure by deriving a diffusion approximation for the evolution of an age-structured population in a stochastic environment.
Assuming weak selection at all ages, random mating, and a stationary distribution of environments with no temporal autocorrelation, we show that the main results of the model remain valid, provided that the model parameters are expressed in terms of means, variances, and covariances of age-specific vital rates and that allele frequencies are defined by weighting individuals of different age by their reproductive value, as suggested by Fisher (1930, 1958). We perform simulations to verify the accuracy of the diffusion approximation and outline statistical methods for estimating the expected selection acting on the reproductive value weighted allele frequency.
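The filtering effect of reproductive-value weighting can also be seen in a deterministic toy model. In the hypothetical sketch below, all numbers and names are assumptions, including the illustrative three-age-class Leslie matrix. Two neutral alleles share identical vital rates but start in different age classes. The unweighted allele frequency moves transiently as the age distribution settles. The reproductive-value-weighted frequency, Σ_x v_x n_{A,x} / Σ_x v_x n_x, stays exactly constant because the total reproductive value of each allele class grows by the same dominant eigenvalue every step.

```python
import numpy as np

# Hypothetical Leslie matrix: first row = fecundities, subdiagonal = survival.
L = np.array([[0.0, 1.5, 1.0],
              [0.6, 0.0, 0.0],
              [0.0, 0.5, 0.0]])


def reproductive_values(L):
    """Left Perron eigenvector of L, scaled so that newborns have value 1."""
    eigvals, vecs = np.linalg.eig(L.T)
    v = np.abs(vecs[:, np.argmax(eigvals.real)].real)
    return v / v[0]


def allele_frequencies(L, n_a, n_b, steps):
    """Unweighted and reproductive-value-weighted frequency of allele A over time.

    Both alleles are neutral: their age vectors are projected by the same L.
    """
    v = reproductive_values(L)
    n_a, n_b = np.asarray(n_a, float), np.asarray(n_b, float)
    history = []
    for _ in range(steps + 1):
        plain = n_a.sum() / (n_a.sum() + n_b.sum())
        weighted = (v @ n_a) / (v @ n_a + v @ n_b)
        history.append((plain, weighted))
        n_a, n_b = L @ n_a, L @ n_b
    return history


for t, (plain, weighted) in enumerate(allele_frequencies(L, [10, 0, 0], [0, 5, 5], 6)):
    print(t, round(plain, 3), round(weighted, 6))
```

With stochastic vital rates the weighted frequency would no longer be constant. It would still be free of the deterministic transients that weighting removes here, which is the property the diffusion approximation relies on.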
3.
Genomic Consequences of Background Effects on scalloped Mutant Expressivity in the Wing of Drosophila melanogaster
Ian Dworkin, Erin Kennerly, David Tack, Jennifer Hutchinson, Julie Brown, James Mahaffey, Greg Gibson. Genetics, 2009, 181(3):1065-1076
We have evaluated the extent to which SNPs identified by genomewide surveys as showing unusually high levels of population differentiation in humans have experienced recent positive selection, starting from a set of 32 nonsynonymous SNPs in 27 genes highlighted by the HapMap1 project. These SNPs were genotyped again in the HapMap samples and in the Human Genome Diversity Project–Centre d'Etude du Polymorphisme Humain (HGDP–CEPH) panel of 52 populations representing worldwide diversity; extended haplotype homozygosity was investigated around all of them, and full resequence data were examined for 9 genes (5 from public sources and 4 from new data sets). For 7 of the genes, genotyping errors were responsible for an artifactual signal of high population differentiation and for 2, the population differentiation did not exceed our significance threshold. For the 18 genes with confirmed high population differentiation, 3 showed evidence of positive selection as measured by unusually extended haplotypes within a population, and 7 more did in between-population analyses. The 9 genes with resequence data included 7 with high population differentiation, and 5 showed evidence of positive selection on the haplotype carrying the nonsynonymous SNP from skewed allele frequency spectra; in addition, 2 showed evidence of positive selection on unrelated haplotypes. Thus, in humans, high population differentiation is (apart from technical artifacts) an effective way of enriching for recently selected genes, but is not an infallible pointer to recent positive selection supported by other lines of evidence.

In the last 50,000–100,000 years (KY), humans have expanded from being a rare species confined to parts of Africa and the Levant to their current numbers of >6 billion with a worldwide distribution (Jobling et al. 2004). Paleontological and archaeological evidence suggests that key aspects of modern human behavior developed ∼100–50 KYA in Africa (Henshilwood et al.
2002) and behaviorally modern humans then expanded out of Africa ∼60–40 KYA (Mellars 2006). The physical and biological environments encountered outside Africa would have been very different from those inside and included climatic deterioration reaching a glacial maximum ∼20 KYA and subsequent amelioration that permitted the development of agricultural and pastoral lifestyles in multiple independent centers after ∼10 KYA. Neolithic lifestyles would have led to further changes including higher population densities, close contact with animals, and novel foods, in turn leading to new diseases (Jobling et al. 2004). It is likely that genetic adaptations accompanied many of these events.

Adaptation, or positive natural selection, leaves an imprint on the pattern of genetic variation found in a population near the site of selection. This pattern can be identified by comparing the DNA variants in multiple individuals from the same and different populations and searching for signals such as unusually extended haplotypes (extended haplotype homozygosity, EHH) (Voight et al. 2006; Sabeti et al. 2007; Tang et al. 2007), high levels of population differentiation (International HapMap Consortium 2005; Barreiro et al. 2008; Myles et al. 2008), or skewed allele frequency spectra (Carlson et al. 2005). These signals become detectable at different times after the start of selection and are all transient, being gradually eroded by both molecular processes such as mutation, recombination, or further selection and population processes such as migration or demographic fluctuations, with the survival order extended haplotypes < population differentiation < allele frequency spectra (Sabeti et al. 2006). The absolute timescales of survival are not well understood, but extended haplotype tests typically detect selection within the last 10 KY (Sabeti et al. 2006) while unusual allele frequency spectra may detect much older selection.
For example, it has been suggested that the signal associated with the FOXP2 gene (Enard et al. 2002) may predate the modern human–Neanderthal split ∼300–400 KYA (Krause et al. 2007), although such an interpretation has been questioned (Coop et al. 2008). However, despite significant uncertainties and limitations, population-genetic analyses are well placed to provide insights into many of the important events within the timescale of recent human evolution.

In principle, it should be possible to survey the genome for sites of selection and then interpret this catalog in the light of archaeological, climatic, and other records. Progress toward such a goal has, however, been limited: many factors can confound the detection of selection and only genotype data from previously ascertained SNPs, rather than full resequence data, have thus far been available throughout the whole genome. In practice, the strategy used has therefore been to search the genome for signals that can be detected in available genotype data, such as extended haplotypes or population differentiation, and evaluate the significance of the regions identified by comparing them with empirical distributions of the same statistic, models that incorporate information about the demography, or biological expectations (McVean and Spencer 2006). However, it remains unclear how effective this strategy is: What false positive and false negative rates are associated with its applications? Further evaluation is desirable.

The International HapMap Project has carried out the highest-resolution study so far of genetic variation in a set of human populations. In an article published in 2005, genotypes of >1 million SNPs were reported from 270 individuals with ancestry from Africa (Yoruba in Ibadan, Nigeria: YRI), Europe (Utah residents with ancestry from northern and western Europe: CEU), China (Han Chinese in Beijing, China: CHB), and Japan (Japanese in Tokyo, Japan: JPT) (International HapMap Consortium 2005).
This article highlighted 32 SNPs from 27 genes that showed particular evolutionary interest because of a combination of two factors: they were nonsynonymous, that is, they changed an amino acid within a protein-coding gene and thus were likely to alter biological function, and they also exhibited a high level of population differentiation equal to or exceeding that of rs2814778, a SNP that is associated with strong biological evidence for population-specific selection. This SNP underlies the FY*0 (Duffy blood group negative) phenotype; FY*0 homozygotes do not express the Duffy blood group antigen on red blood cells and are consequently highly resistant to infection by the malarial parasite, Plasmodium vivax. The *0 allele is nearly fixed in Africa and rare outside, and it is widely believed that this is due to selection for resistance to vivax malaria.

However, a number of studies have emphasized that large differences in allele frequency between populations can arise without positive selection: for example, a highly differentiated SNP in the Neuregulin I gene was not accompanied by unusual patterns in adjacent SNPs (Gardner et al. 2007), and large frequency differences can be quite common in empirical data sets, particularly in comparisons between Africa or America and the rest of the world, where population bottlenecks and "allele surfing" may have occurred during the exit from and entrance to these continents, respectively (Hofer et al. 2009). We wished to measure the extent to which the high population differentiation observed at the 27 HapMap genes might have resulted from positive selection and the extent to which it reflected other origins such as demographic factors, chance, or errors. We therefore retyped the same SNPs in the HapMap samples and in a large additional set of human populations and applied alternative tests for selection, either based on long-range haplotypes or based on full resequence data.
For the latter, sequence data for 5 of the genes were available from public sources, and four new data sets were generated for this project. We found that, while genotyping errors led to some artifactual high differentiation signals, population differentiation was a useful but by no means infallible guide to recent selection detected by other methods.
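The two genotype-based signals discussed in this entry, population differentiation and extended haplotype homozygosity, can be sketched in a few lines. The sketch rests on several assumptions. F_ST uses Wright's (H_T − H_S)/H_T for a single biallelic SNP with equal population weights. EHH is extended only to the right of the core SNP. Haplotypes are encoded as strings of '0' and '1'. The allele frequencies and haplotypes are hypothetical. Published analyses, including HapMap's, use sample-size-corrected estimators and extend EHH in both directions, so this is an illustration only.

```python
from collections import Counter
from math import comb


def fst_biallelic(freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP, equal population weights."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                           # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def ehh(haplotypes, core, core_allele, end):
    """EHH from the core SNP to marker `end` (rightward only), among carriers of core_allele.

    EHH = sum over distinct haplotypes h of C(n_h, 2) / C(n, 2): the probability
    that two randomly chosen carrier chromosomes are identical over the interval.
    """
    carriers = [h for h in haplotypes if h[core] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carriers of the core allele")
    counts = Counter(h[core:end + 1] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Hypothetical allele frequencies: nearly fixed in one population, rare in the other
print(fst_biallelic([0.99, 0.01]))
haps = ["0000", "0001", "0011", "1111", "1110"]
print([ehh(haps, 0, "0", j) for j in range(4)])
```

EHH decays from 1 toward 0 as the interval lengthens and carrier haplotypes diverge. An unusually slow decay around one allele is the extended-haplotype signal described above.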
4.
Jesse E. Taylor. Genetics, 2009, 182(3):813-837
The genealogical consequences of within-generation fecundity variance polymorphism are studied using coalescent processes structured by genetic backgrounds. I show that these processes have three distinctive features. The first is that the coalescent rates within backgrounds are not jointly proportional to the infinitesimal variance, but instead depend only on the frequencies and traits of genotypes containing each allele. Second, the coalescent processes at unlinked loci are correlated with the genealogy at the selected locus; i.e., fecundity variance polymorphism has a genomewide impact on genealogies. Third, in diploid models, there are infinitely many combinations of fecundity distributions that have the same diffusion approximation but distinct coalescent processes; i.e., in this class of models, ancestral processes and allele frequency dynamics are not in one-to-one correspondence. Similar properties are expected to hold in models that allow for heritable variation in other traits that affect the coalescent effective population size, such as sex ratio or fecundity and survival schedules.

The population genetics of within-generation fecundity variance has been studied from two perspectives. Beginning with Wright (1938), several authors have investigated the relationship between the effective size of a panmictic population with seasonal reproduction and the variance of the number of offspring born to each adult within a season (Crow and Denniston 1988; Nunney 1993, 1996; Waples 2002; Hedrick 2005; Engen et al. 2007). Although the precise form of this relationship depends on other biological factors such as the mating system and the manner in which population regulation operates, each of these studies shows that the effective population size is a decreasing function of fecundity variance. Furthermore, provided that the variance and the coalescent effective population sizes coincide (Ewens 1982; Nordborg and Krone 2002; Sjodin et al.
2005), these results imply that both the rate at which neutral allele frequencies fluctuate from generation to generation and the rate at which lineages coalesce will be positively correlated with within-generation fecundity variance. For example, it has been suggested that the shallow genealogies that have been documented in many marine organisms are a consequence of the high variance of reproductive success in the recruitment sweepstakes operating in these species (Hedgecock 1994; Árnason 2004; Eldon and Wakeley 2006).

These results hold in models in which all individuals have the same within-generation (or within-season) fecundity variance. However, the evolutionary genetics of populations that are polymorphic for alleles that influence demographic traits have also been investigated. The first results of this kind were derived by Gillespie (1974, 1975, 1977), who used diffusion theory to show that natural selection can act directly on within-generation fecundity variance in a haploid population with nonoverlapping generations. By studying a simple model of a population composed of two genotypes, say A1 and A2, Gillespie (1974) showed that the fluctuations in the frequency of allele A1 can be approximated by a diffusion process with the following drift and variance coefficients (time measured in units of N generations):

$$m(p) = p(1-p)\left[N(\mu_1 - \mu_2) - (\sigma_1^2 - \sigma_2^2)\right], \qquad v(p) = p(1-p)\left[(1-p)\,\sigma_1^2 + p\,\sigma_2^2\right],$$

where p is the frequency of A1, N is the number of adults, and $1 + \mu_i$ and $\sigma_i^2$ are the mean and the variance, respectively, of the number of offspring produced by an individual of type Ai. Most discussions of this class of models have focused on the fitness consequences of differences in fecundity variance, which are summarized by the drift coefficient, m(p), of the diffusion approximation. There are two main conclusions. The first is that because m(p) is an increasing function of the difference $\sigma_2^2 - \sigma_1^2$, selection can favor alleles that reduce within-generation fecundity variance even if these have lower mean fecundity.
Such variance–mean trade-offs can be interpreted as a kind of bet hedging and could explain the evolution of certain risk-spreading traits such as insect oviposition onto multiple host plants (Root and Kareiva 1986) or multiple mating by females (Sarhan and Kokko 2007). On the other hand, because the strength of selection on fecundity variance is inversely proportional to population size, selection for mean–variance trade-offs will usually be dominated by changes in mean fecundity. For this reason, it has been suggested that within-generation bet hedging will be favored only in very small populations (Seger and Brockman 1987; Hopper et al. 2003), although recent theoretical studies have shown that bet hedging can evolve under less restrictive conditions in subdivided populations (Shpak 2005; Lehmann and Balloux 2007; Shpak and Proulx 2007).

Less consideration has been given to the diffusion coefficient, v(p), which differs from the familiar quadratic term, p(1 − p), of the Wright–Fisher diffusion. Because the variance effective population size of a monomorphic population depends on the fecundity variance, it is not surprising that v(p) has an additional dependence on the frequency of A1 whenever the two alleles have different offspring variances. However, as noted by Gillespie (1974), the relationship between allele frequency fluctuations and the allelic composition of the population is counterintuitive. For example, when p is close to 1, so that the population is composed mainly of A1-type individuals, the rate of allele frequency fluctuations is dominated by the variance of the A2 genotype. In particular, if we define the variance effective population size by the expression Np(1 − p)/v(p) (Ewens 1982), then not only is this quantity frequency dependent, but also it depends on the life history traits of the missing genotype whenever the population is fixed for one of the two alleles.
In contrast, the coalescent effective population size of a monomorphic population depends only on the offspring distribution of the fixed allele. The discrepancy between these two quantities raises the following question: how does fecundity variance polymorphism affect the statistical properties of the genealogy of a random sample of individuals?

The answer to this question is of interest for several reasons. First, although the effects of selection on genealogies have received considerable attention (Przeworski et al. 1999; Williamson and Orive 2002; Barton and Etheridge 2004), little is known about the genealogical consequences of variation in traits that alter the coalescent rate. Extrapolating from models in which the effective population size varies under the control of external factors, we might expect the coalescent process in a model with fecundity variance polymorphism to be a stochastic time change of Kingman's coalescent. However, the results derived in the next section show that this intuition is usually wrong. The second motivation is more practical. Even if changes in fecundity variance are usually controlled by selection on other traits, the existence of interspecific differences in fecundity variance suggests that there must be periods when populations are polymorphic for alleles that alter the fecundity variance. In these instances, it might be possible to use sequence data to identify the loci responsible for these changes, but to do so will require the development of methods that exploit patterns that are unique to models in which the effective population size depends on the genetic composition of the population. For example, whereas the effects of genetic hitchhiking are usually restricted to linked sites (Maynard Smith and Haigh 1974; Kim and Stephan 2002; Przeworski 2002; Przeworski et al.
2005), we will see later that selective sweeps by mutations that affect fecundity variance would have a genomewide impact on polymorphism.

Kingman (1982a,b) showed that the genealogy of a sample of individuals from a panmictic, neutrally evolving population of constant size can be described by a simple stochastic process known as the coalescent (or Kingman's coalescent). One of the most important properties of Kingman's coalescent is that it is a Markov process, a fact that is heavily exploited in mathematical analyses and that also allows for efficient simulation of genealogies. Unfortunately, this property generally does not hold in populations composed of nonexchangeable individuals. For example, if there are selective differences between individuals, then the genealogy of a sample can still be regarded as a stochastic process, but selective interactions between individuals cause this process to depend also on the history of nonancestral lineages. The key to overcoming this difficulty is to embed the genealogical process in a larger process that does satisfy the Markov property. This can be done in two ways. One approach is to embed the coalescent tree within a graphical process called the ancestral selection graph (Krone and Neuhauser 1997; Neuhauser and Krone 1997; Donnelly and Kurtz 1999), in which lineages can either branch, giving rise to pairs of potential ancestors, or coalesce. The intuition behind this construction is that the effects of selection on the genealogy can be accounted for by keeping track of a pool of potential ancestors that includes lineages that failed to persist because they were outcompeted by individuals of higher fitness. Because the branching rates are linear in the number of lineages, whereas the coalescence rates are quadratic, this process is certain to reach an ultimate ancestor in finite time.
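The termination argument (branching linear in the number of lineages, coalescence quadratic) can be checked with a small simulation of the line-counting process alone. The sketch assumes the usual time scaling in which, with k lines, coalescence occurs at total rate k(k − 1)/2 and branching at total rate k·s/2, where s is a scaled selection coefficient (called scaled_selection in the code; the function name is also hypothetical). It tracks only the number of lines, not the topology of the graph, the types of its branches, or mutations. With s = 0 the process reduces to Kingman's coalescent, whose expected time to the most recent common ancestor of n lineages is 2(1 − 1/n), which gives a simple check.

```python
from __future__ import annotations

import random


def time_to_ultimate_ancestor(n: int, scaled_selection: float,
                              rng: random.Random) -> tuple[float, int]:
    """Simulate the number of lines in a branching-coalescing process
    (the line-counting process of an ancestral selection graph) from n lines.

    Assumed total rates with k lines: branching k * scaled_selection / 2 and
    coalescence k * (k - 1) / 2. Returns the time at which a single line
    (the ultimate ancestor) remains and the largest number of lines visited.
    """
    k, t, k_max = n, 0.0, n
    while k > 1:
        branch = k * scaled_selection / 2.0   # linear in k
        coalesce = k * (k - 1) / 2.0          # quadratic in k
        total = branch + coalesce
        t += rng.expovariate(total)           # exponential waiting time
        if rng.random() < branch / total:
            k += 1                            # a lineage branches
            k_max = max(k_max, k)
        else:
            k -= 1                            # two lineages coalesce
    return t, k_max


if __name__ == "__main__":
    rng = random.Random(1)
    n, reps = 10, 5000
    for s in (0.0, 1.0, 4.0):
        runs = [time_to_ultimate_ancestor(n, s, rng) for _ in range(reps)]
        mean_t = sum(t for t, _ in runs) / reps
        mean_kmax = sum(k for _, k in runs) / reps
        print(f"s={s:3.1f}  mean time={mean_t:.3f}  mean max lines={mean_kmax:.2f}")
    print(f"Kingman expectation for s = 0: {2 * (1 - 1 / n):.3f}")
```

Because the downward rate k(k − 1)/2 eventually dominates the upward rate k·s/2, every run terminates, but the expected time to the ultimate ancestor grows rapidly with s, so large values of scaled_selection make the simulation slow.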
The process can be stopped once the ultimate ancestor is reached, and both the ancestral and the genotypic status of individual branches can be resolved by assigning random mutations to the graph and then traversing it from the root to the leaves.

The second approach is due to Kaplan et al. (1988), who showed that the genealogical history of a sample of genes under selection can be represented by a structured coalescent process. Here we think of the population as being subdivided into several demes, or genetic backgrounds, consisting of individuals that share the same genotype at the selected locus. Because individuals with the same genotype are exchangeable, the rate of coalescence within a background depends only on the size of the background and the number of ancestral lineages sharing that genotype. In addition, mutations at the selected site move lineages between backgrounds. To obtain a Markov process, we need to keep track of two kinds of information: (i) the types of the ancestral lineages and (ii) the frequencies of the alleles segregating at the selected locus. Fortunately, because one-dimensional diffusion processes are reversible with respect to their stationary distributions (i.e., the detailed balance conditions are satisfied), the ancestral process of allele frequencies at a stationary locus segregating two alleles has the same law as the forward process. Subsequently, Hudson and Kaplan (1988) showed that the genealogy at a linked neutral locus can be described by a structured coalescent defined in terms of the genetic backgrounds at the selected locus; in this case, recombination between the selected and neutral loci can also move lineages between backgrounds.

The objective of this article is to extend the structured coalescent to population genetic models in which within-generation fecundity variance is genotype dependent. (The genealogical consequences of polymorphism affecting between-generation fecundity variance will be described in a separate article.)
In these models, exchangeability is violated not only by selective differences between individuals, but also by differences in life history traits that affect coalescent rates and allele frequency fluctuations. Nonetheless, because lineages are exchangeable within backgrounds, the coalescence and substitution rates can still be calculated conditional on the types of the lineages and the genetic composition of the population. In the next two sections, I derive structured coalescent processes that describe the genealogy at a neutral marker locus linked to a second locus (the “selected locus”) that affects fecundity variance. This is done first for a haploid model and then extended to a diploid model in which there may be both sex- and genotype-specific differences in fecundity variance. Results for both models are summarized in the table below.

Rates

Transition   Haploid model   Diploid model
             n1μ1q/p         n1μ1q/p
             n2μ2p/q         n2μ2p/q
             n1rq            n1rq
             n2rp            n2rp
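The mutation and recombination rates in the table can be combined into one step of a structured coalescent simulation. The sketch below is illustrative rather than the construction derived in the following sections, and it rests on three assumptions. First, it holds the frequency p of A1 fixed, whereas in the full process p itself changes along the ancestral path. Second, it reads the rows carrying a factor n1 as a lineage moving from the A1 background to the A2 background, and those carrying n2 as the reverse move. Third, because the coalescence rates are not among the rows reproduced here, it uses the within-background rate C(n_i, 2)/x_i of the standard structured coalescent (x1 = p, x2 = q = 1 − p), which ignores genotype-specific fecundity variance. All function and event names are hypothetical.

```python
from __future__ import annotations

import random

# Hypothetical event names -> change in (n1, n2), the numbers of sampled
# lineages currently in the A1 and A2 backgrounds.
MOVES = {
    "coalesce_in_A1": (-1, 0),
    "coalesce_in_A2": (0, -1),
    "mutation_A1_to_A2": (-1, 1),
    "mutation_A2_to_A1": (1, -1),
    "recombination_A1_to_A2": (-1, 1),
    "recombination_A2_to_A1": (1, -1),
}


def structured_coalescent_rates(n1: int, n2: int, p: float, mu1: float,
                                mu2: float, r: float) -> dict[str, float]:
    """Event rates conditional on the lineage counts and the A1 frequency p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    return {
        # Assumption: standard within-background coalescence, C(n_i, 2) / x_i.
        "coalesce_in_A1": n1 * (n1 - 1) / 2.0 / p,
        "coalesce_in_A2": n2 * (n2 - 1) / 2.0 / q,
        # Mutation and recombination rates as listed in the table.
        "mutation_A1_to_A2": n1 * mu1 * q / p,
        "mutation_A2_to_A1": n2 * mu2 * p / q,
        "recombination_A1_to_A2": n1 * r * q,
        "recombination_A2_to_A1": n2 * r * p,
    }


def step(n1: int, n2: int, p: float, mu1: float, mu2: float, r: float,
         rng: random.Random) -> tuple[float, str, tuple[int, int]]:
    """Draw the waiting time and type of the next event (Gillespie algorithm)."""
    rates = structured_coalescent_rates(n1, n2, p, mu1, mu2, r)
    total = sum(rates.values())
    if total == 0.0:
        raise ValueError("no event can occur from this state")
    wait = rng.expovariate(total)
    # Events with rate 0 (e.g. moves out of an empty background) are never drawn.
    event = rng.choices(list(rates), weights=list(rates.values()))[0]
    d1, d2 = MOVES[event]
    return wait, event, (n1 + d1, n2 + d2)


if __name__ == "__main__":
    rng = random.Random(2024)
    state = (3, 2)  # hypothetical sample: three lineages in A1, two in A2
    t = 0.0
    while sum(state) > 1:
        wait, event, state = step(*state, p=0.25, mu1=1e-3, mu2=2e-3, r=0.01, rng=rng)
        t += wait
        print(f"t={t:.4f}  {event:<24} -> (n1, n2) = {state}")
```

Holding p fixed keeps the example short; a fuller sketch would update p between events according to the ancestral allele-frequency process.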