We present a Moran-model approach to modeling general multiallelic selection in a finite population and show how it may be used to develop theoretical models of biological systems of balancing selection such as plant gametophytic self-incompatibility loci. We propose new expressions for the stationary distribution of allele frequencies under selection and use them to show that the continuous-time Markov chain describing allele frequency change with exchangeable selection and Moran-model reproduction is reversible. We then use the reversibility property to derive the expected allele frequency spectrum in a finite population for several general models of multiallelic selection. Using simulations, we show that our approach is valid over a broader range of parameters than previous analyses of balancing selection based on diffusion approximations to the Wright–Fisher model of reproduction. Our results can be applied to any model of multiallelic selection in which fitness is solely a function of allele frequency.

Natural selection has long been a topic of interest in population genetics, yet the stochastic theory of genes under selection remains underdeveloped compared to the theory of neutral genes. Due to the interplay of stochastic and deterministic forces, models of selection present analytical challenges beyond those of neutral models, although a great deal of progress has been made with models that use diffusion approximations to a Wright–Fisher model of reproduction. Diffusion approximations with selection are, however, sometimes difficult to employ and always require assumptions about population parameters for tractability. These limitations suggest that there may be value in developing new methods of solving the problem of selection in a finite population, and here we do so using a Moran model of reproduction in place of the familiar Wright–Fisher model.
Our approach has two major advantages over previous models: general applicability to a wide variety of selection models and accuracy over a broad range of parameter values. In this work, we propose new expressions for the full stationary distributions of allele frequencies under multiallelic selection, as well as expressions for average allele frequency distributions.

We restrict our attention to exchangeable models of selection, meaning that relabeling the alleles will not change selective outcomes and thus that selection will be a function of allele frequency rather than allele identity. Many models of selection can be transformed into frequency-dependent forms (
Denniston and Crow 1990), and some common models of selection have the desired property of exchangeability. For example, symmetric overdominant selection, in which heterozygotes have a selective advantage over homozygotes but the specific genotype of homozygote or heterozygote has no further selective effect, can be expressed as frequency-dependent selection on individual (exchangeable) alleles, although the direct selection is actually on diploid genotypes. Many other proposed models of multiallelic balancing selection, in which substantial variation is maintained by selection, can be viewed in this way. Such models have been of particular interest because of the potential application to highly multiallelic systems found in nature, such as self-incompatibility (SI) loci in plants and the major histocompatibility complex (MHC) loci in vertebrates, and the desire to analyze these systems is a motivation for the present work. We now review some of the population genetic theory related to these systems.

Early in the history of population genetics,
Wright (1939) presented a somewhat controversial stochastic model of gametophytic self-incompatibility (GSI) genes, sparking much further theoretical and empirical work. An analytic theory of multiallelic symmetric overdominance was developed along similar lines to this early model (
Kimura and Crow 1964;
Takahata 1990) and has been used as an approximation to the unknown mode of selection in the MHC (
Takahata et al. 1992). Drawing insights from these first two applications, other biological systems where balancing selection was posited, including sex determination in honeybees (
Yokoyama and Nei 1979), fungal mating systems (
May et al. 1999), and heterokaryon incompatibility in fungi (
Muirhead et al. 2002), have also been modeled successfully using closely related approaches. Progress has been made in using these models to address genealogical (
Takahata 1990;
Vekemans and Slatkin 1994) and demographic (
Muirhead 2001) questions, as well as extending the models into more complex modes of selection (
Uyenoyama 2003) and reproduction (
Vallejo-Marin and Uyenoyama 2008).

Models of genetic variation under balancing selection have traditionally been focused on specific systems, such that extensions require entirely new analyses, and have also included a number of simplifying assumptions in the interest of mathematical tractability. For example, the symmetric overdominance model has been strongly criticized as an unrealistic approximation of MHC evolution (
Paterson et al. 1998;
Hedrick 2002;
Penn et al. 2002;
Ilmonen et al. 2007;
Stoffels and Spencer 2008), and yet it has proved difficult to make finite-population models of any of the more realistic frequency dependence schemes using the same approaches. A constraint on further progress is the fact that the standard model of stochastic population genetics, the Wright–Fisher model, is in fact quite difficult to analyze.

The Wright–Fisher model of reproduction employs nonoverlapping generations, so that for a diploid population of size
N, all 2
N allele copies are chosen simultaneously when forming a new generation of individuals. While it is straightforward to describe this reproduction scheme mathematically as a discrete-time Markov chain, that chain unfortunately appears intractable even in simple cases (
Ewens 2004). Traditionally, then, diffusion approximations have been used to obtain quantities of interest, such as the equilibrium expected number of alleles, allele frequency spectra, and fixation probabilities and times. Diffusion approximations are derived in the limit N → ∞, but are applicable to problems of finite
N, provided that the strengths of other forces such as mutation and selection can be assumed to be weak, of
O(N⁻¹) (
Ewens 2004).
Watterson (1977) derived such a diffusion approximation for multiallelic symmetric overdominance using these assumptions. More recently, as interest in population genetics has turned to problems of inference,
Grote and Speed (2002) considered sampling probabilities under the diffusion approximation for symmetric overdominance, while
Donnelly et al. (2001) and Stephens and Donnelly (2003) proposed computational methods for some asymmetric models.

Although strong selection can be modeled using diffusion approximations by making the product of the population size and the selection coefficient (
Ns) large, the assumption of weak selection is not in fact appropriate for the canonical biological systems of balancing selection. Specifically, selection coefficients are defined by the differences in fitness (the expected number of offspring) among individuals in the population at a given time. These differences may be large in systems such as GSI, where the fitness of a very common allele may be very small while the fitness of other alleles may be greater than one.

In an attempt to deal with the extremely strong selection of gametophytic self-incompatibility,
Wright's (1939) original model focused attention on the dynamics of a single representative allele. He collapsed the influence of all other alleles into a single summary statistic: the homozygosity,
F, which is a function of the frequencies of all alleles, and which
Wright (1939) assumed to be constant. The analysis is essentially that of a two-allele system, using a one-dimensional diffusion analysis. This approach, while shown by simulation to be very effective in the appropriate parameter range (
Ewens and Ewens 1966), received substantial criticism on mathematical grounds (
Fisher 1958;
Moran 1962;
Ewens 1964b).
Ewens (1964b), in particular, objected to the use of diffusion theory for GSI, pointing out that strong frequency-dependent selection violates the diffusion requirement that both the mean and the variance of the change in allele frequencies be small and of
O(N⁻¹).
Ewens (1964a) then applied Wright's basic one-dimensional diffusion approach to modeling symmetric overdominance, but assumed that selection was weak and of
O(N⁻¹) to stay within the strict limits of the diffusion approximation.
Kimura and Crow (1964) and
Wright (1966), on the other hand, presented alternative one-dimensional diffusion approximations to symmetric overdominance, closer in spirit to Wright's original model of GSI, that did not make the weak-selection assumption.
Watterson (1977) was concerned about both the inconsistencies of the approximations used in these models and the treatment of
F as a constant rather than as a random variable dependent upon allele frequencies. Using his own multiallelic diffusion approximation for symmetric overdominance (
Watterson 1977), he derived an alternative (small-
Ns) approximation to the frequency of a single representative allele. We consider this approximation, as well as the best-known one-dimensional symmetric overdominance diffusion, the strong-selection approximation of
Kimura and Crow (1964), in comparison with our alternative approach to deriving allele frequency spectra under general multiallelic selection with exchangeable alleles.

To avoid the approximations required to employ Wright–Fisher/diffusion-based methods, we turn to an alternative model of reproduction in a finite population: the overlapping-generations model of
Moran (1962). In the Moran model, a single allele copy dies and another reproduces in each time step, rather than all 2
N allele copies simultaneously being replaced by offspring each generation. As in the Wright–Fisher model, this reproduction scheme is represented mathematically by a Markov chain. Unlike the Wright–Fisher model, however, the Moran model can sometimes yield tractable, exact solutions to the underlying Markov chain, without the need to resort to diffusion approximations. We exploit this trait to develop a new stochastic theory of multiallelic selection with minimal dependence on assumptions about population parameter values. Our method has the additional benefit of being flexible: it can accommodate any exchangeable model of multiallelic selection and either of two general models of parent-independent mutation, the infinite-alleles and
k-allele models of mutation. Our Moran-model predictions agree well with the results of Wright–Fisher simulations.
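
As an informal illustration of the reproduction scheme described above, the following Python sketch simulates single Moran events (one allele copy dies, one reproduces) under symmetric overdominance written as frequency-dependent selection on alleles. It is a sketch under stated assumptions and not the formulation analyzed in this paper: the function names, the marginal fitness 1 - s*x_i (heterozygote fitness 1, homozygote fitness 1 - s, random mating), uniform death with fitness-weighted reproduction, and the particular form of k-allele parent-independent mutation are all choices made here for illustration.

```python
import random


def homozygosity(counts):
    """F = sum of squared allele frequencies."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def marginal_fitness(counts, s):
    """Assumed marginal fitness of each allele under symmetric overdominance.

    With heterozygote fitness 1 and homozygote fitness 1 - s, an allele at
    frequency x has marginal fitness 1 - s * x, which depends only on its
    own frequency (an exchangeable model).
    """
    total = sum(counts)
    return [1.0 - s * c / total for c in counts]


def moran_step(counts, s, u, rng):
    """Apply one Moran event to `counts` in place and return it.

    Assumptions (illustrative only): the dying copy is chosen uniformly,
    the parent is chosen in proportion to count * marginal fitness, and
    with probability u the offspring's type is redrawn uniformly from the
    k alleles (parent-independent, k-allele mutation).
    """
    assert 0.0 <= s < 1.0, "s < 1 keeps every reproduction weight positive"
    k = len(counts)
    dead = rng.choices(range(k), weights=counts)[0]
    w = marginal_fitness(counts, s)
    weights = [c * wi for c, wi in zip(counts, w)]
    parent = rng.choices(range(k), weights=weights)[0]
    child = rng.randrange(k) if rng.random() < u else parent
    counts[dead] -= 1
    counts[child] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2009)
    counts = [25, 25, 25, 25]  # 2N = 100 allele copies, k = 4 alleles
    for _ in range(10_000):
        moran_step(counts, s=0.2, u=0.001, rng=rng)
    print(counts, round(homozygosity(counts), 3))
```

Because only one copy is replaced per event, the total number of copies stays fixed and each allele count changes by at most one per step, which is the feature that can make the underlying Markov chain tractable.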