The Total Branch Length of Sample Genealogies in Populations of Variable Size
Authors: A Eriksson, B Mehlig, M Rafajlovic, S Sagitov
Institution: Department of Physics, University of Gothenburg, SE-41296 Gothenburg, Sweden; Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, United Kingdom; and Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, SE-41296 Gothenburg, Sweden
Abstract: We consider neutral evolution of a large population subject to changes in its population size. For a population with a time-variable carrying capacity we study the distribution of the total branch lengths of its sample genealogies. Within the coalescent approximation we have obtained a general expression (Equation 20) for the moments of this distribution for an arbitrary given dependence of the population size on time. We investigate how the frequency of population-size variations alters the total branch length.

Models for gene genealogies of biological populations often assume a constant, time-independent population size N. This is the case for the Wright–Fisher model (Fisher 1930; Wright 1931), for the Moran model (Moran 1958), and for their representation in terms of the coalescent (Kingman 1982). In real biological populations, by contrast, the population size changes over time. Such fluctuations may be due to catastrophic events (bottlenecks) and subsequent population expansions or may just reflect the randomness in the factors determining the population dynamics. Many authors have argued that genetic variation in a population subject to size fluctuations may nevertheless be described by the Wright–Fisher model, if one replaces the constant population size in this model by an effective population size of the form

$$N_{\mathrm{eff}} = \left( \frac{1}{L} \sum_{l=1}^{L} \frac{1}{N_l} \right)^{-1} \qquad (1)$$

where N_l stands for the population size in generation l and the average is taken over L generations. The harmonic average in Equation 1 is argued to capture the significant effect of catastrophic events on patterns of genetic variation in a population: if, for example, a population went through a recent bottleneck, a large fraction of individuals in a given sample would originate from few parents. This in turn would lead to significantly reduced genetic variation, parameterized by a small value of N_eff. (See, e.g., Ewens 1982 for a review of different measures of the effective population size and Sjödin et al. 2005 and Wakeley and Sargsyan 2009 for recent developments of this concept.)

The concept of an effective population size has been frequently used in the literature, implicitly assuming that the distribution of neutral mutations in a large population of fluctuating size is identical to the distribution in a Wright–Fisher model with the corresponding constant effective population size given by Equation 1. However, recently it was shown that this is true only under certain circumstances (Kaj and Krone 2003; Nordborg and Krone 2003; Jagers and Sagitov 2004). It is argued by Sjödin et al. (2005) that the concept of an effective population size is appropriate when the timescale of fluctuations of N_l is either much smaller or much larger than the typical time between coalescent events in the sample genealogy. In these limits it can be proved that the distribution of the sample genealogies is exactly given by that of the coalescent with a constant, effective population size.

More importantly, it follows from these results that, in populations with variable size, the coalescent with a constant effective population size is not always a valid approximation for the sample genealogies. Deviations between the predictions of the standard coalescent model and empirical data are frequently observed, and there are a number of different statistical tests quantifying the corresponding discrepancies (see, for example, Tajima 1989, Fu and Li 1993, and Zeng et al. 2006). The analysis of such deviations is of crucial importance in understanding, for example, human genetic history (Garrigan and Hammer 2006).
But while there is a substantial amount of work numerically quantifying deviations, often in terms of a single number, little is known about their qualitative origins and their effect upon summary statistics in the population in question.

The question is thus to understand the effect of population-size fluctuations on the patterns of genetic variation, in particular for the case where the timescale of the population-size fluctuations is comparable to the time between coalescent events in the ancestral tree. As is well known, many empirical measures of genetic variation can be computed from the total branch length of the sample genealogy (the expected number of single-nucleotide polymorphisms, for example, is proportional to the average total branch length).

The aim of this article is to analyze the distribution of the scaled total branch length T_n for a sample genealogy in a population of fluctuating size, as illustrated in Figure 1. For the genealogy of n ≥ 2 lineages sampled at the present time, the expression ⌊N T_n⌋ gives the total branch length in terms of generations. Here ⌊Nt⌋ is the largest integer ≤ Nt, and the scaling factor N is a suitable measure of the number of genes in the population and serves as a counterpart of the constant generation size of the standard Wright–Fisher model.

Figure 1. The effect of population-size oscillations on the genealogy of a sample of size n = 17 (schematic). Left, genealogy described by Kingman's coalescent for a large population of constant size, illustrated by the light blue rectangle; right, sinusoidally varying population size. Coalescence is accelerated in regions of small population sizes and slowed down in regions of large population sizes. This significantly alters the tree and gives rise to changes in the distribution of the number of mutations and of the population homozygosity.

A motivating example is given in Figure 2, which shows numerically computed distributions ρ(T_n) of the total branch lengths T_n for a particular population model with a time-dependent carrying capacity. The model is described briefly in the Figure 2 legend and in detail in the section "A model for a population with time-dependent carrying capacity". As Figure 2 shows, the distributions depend in a complex manner on the form of the size changes. We observe that when the frequency of the population-size fluctuations is very small (Figure 2a), the distribution is well described by the standard coalescent result, written here for T_n = t,

$$\rho(t) = \frac{n-1}{2}\, e^{-t/2} \left( 1 - e^{-t/2} \right)^{n-2} \qquad (2)$$

(Hein et al. 2005). When the frequency is very large (Figure 2e), Equation 2 also applies, but with a different time scaling reflecting an effective population size: t on the right-hand side (rhs) in Equation 2 is replaced by t/c with c = N/N_eff. Apart from these special limits, however, the form of the distributions appears to depend in a complicated manner upon the frequency of the population-size variation. The observed behavior is caused by the fact that coalescence proceeds faster for smaller population sizes and more slowly for larger population sizes, as illustrated in Figure 1. But the question is how to quantitatively account for the changes shown in Figure 2.

Figure 2. Numerically computed distributions of the scaled total branch lengths T_n in genealogies of samples of size n = 10. The model employed in the simulations is outlined in the section "A model for a population with time-dependent carrying capacity". It describes a population subject to a time-varying carrying capacity, K_l = K_0 (1 + ε sin(2πνl)). The frequency of the time changes is determined by ν, and l = 1, 2, 3, … labels discrete generations forward in time. The parameter N = K_0 describes the typical population size, which is taken here to be equal to the time-averaged carrying capacity. Panels a–e show ρ(T_n) for populations with increasingly rapidly oscillating carrying capacity. The dashed red line in a shows that in the limit of low frequencies the standard coalescent result, Equation 2, is obtained. The dashed red line in e shows that also in the limit of large frequencies the standard coalescent result is obtained, but now with an effective population size. The dashed red line in d is a two-parameter distribution, Equation 41, derived in the section "Comparison between numerical simulations and coalescent predictions". Further numerical and analytical results on the frequency dependence of the moments of these distributions are shown in Figure 4. Parameter values used: K_0 = 10,000, ε = 0.9, and r = 1 (see the section "A model for a population with time-dependent carrying capacity" for the exact meaning of the intrinsic growth rate r) and (a) νN = 0.001, (b) νN = 0.1, (c) νN = 0.316, (d) νN = 1, and (e) νN = 100.

We show in this article that the results of the simulations displayed in Figure 2 are explained by a general expression (Equation 20) for the moments of the distributions shown in Figure 2. Our general result is obtained within the coalescent approximation valid in the limit of large population size. But we find that in most cases, the coalescent approximation works very well down to small population sizes (a few hundred individuals). Our result enables us to understand and quantitatively describe how the distributions shown in Figure 2 depend upon the frequency of the population-size oscillations. It makes it possible to determine, for example, how the variance, skewness, and kurtosis of these distributions depend upon the frequency of demographic fluctuations. This in turn allows us to compute the population homozygosity and to characterize genetic variation in populations with size fluctuations.

The remainder of this article is organized as follows. The next section summarizes our analytical results for the moments of the total branch length. Following that, we describe the model employed in the computer simulations. Then, corresponding numerical results are compared to the analytical predictions. And finally, we summarize how population-size fluctuations influence the distribution of total branch lengths and conclude with an outlook.
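
The two limits discussed above (very slow and very fast population-size oscillations) can be explored with a small simulation. The following Python sketch is not the model used for Figure 2. It makes two simplifying assumptions: the population size itself, rather than the carrying capacity, follows x(t) = 1 + ε sin(2πνt) relative to N, and t is scaled time measured backward from the sampling instant, so that j lineages coalesce at rate C(j,2)/x(t). Candidate events are generated by thinning a constant-rate process. The function names (relative_size, total_branch_length, mean_total_branch_length, standard_mean) are hypothetical and chosen only for illustration.

```python
import math
import random


def relative_size(t, eps, freq):
    """Assumed population size relative to N at scaled time t before the present."""
    return 1.0 + eps * math.sin(2.0 * math.pi * freq * t)


def total_branch_length(n, eps, freq, rng):
    """Draw one scaled total branch length T_n for n sampled lineages.

    With j lineages the coalescence rate is C(j,2) / x(t). Events are produced
    by thinning: candidates arrive at the bounding rate C(j,2) / (1 - eps) and
    are accepted with probability (1 - eps) / x(t). Requires 0 <= eps < 1.
    """
    t = 0.0        # scaled time before the present
    total = 0.0    # accumulated branch length
    j = n          # current number of lineages
    while j > 1:
        max_rate = j * (j - 1) / 2.0 / (1.0 - eps)
        wait = rng.expovariate(max_rate)
        t += wait
        total += j * wait  # all j branches grow during the waiting time
        if rng.random() < (1.0 - eps) / relative_size(t, eps, freq):
            j -= 1         # accepted candidate: two lineages coalesce
    return total


def mean_total_branch_length(n, eps, freq, reps, seed=1):
    """Monte Carlo estimate of the mean of T_n over reps independent genealogies."""
    rng = random.Random(seed)
    return sum(total_branch_length(n, eps, freq, rng) for _ in range(reps)) / reps


def standard_mean(n):
    """Mean of T_n in the constant-size coalescent: 2 * (1 + 1/2 + ... + 1/(n-1))."""
    return 2.0 * sum(1.0 / k for k in range(1, n))


if __name__ == "__main__":
    n = 10
    print("constant size, theory   :", standard_mean(n))
    print("constant size, simulated:", mean_total_branch_length(n, 0.0, 1.0, 10000))
    print("fast oscillation, eps=0.9, freq=100:",
          mean_total_branch_length(n, 0.9, 100.0, 10000))
    print("harmonic-mean prediction:", math.sqrt(1.0 - 0.9 ** 2) * standard_mean(n))
```

Under these assumptions the harmonic mean of x(t) over one period is sqrt(1 − ε²), so for rapid oscillations the simulated mean of T_n should lie close to sqrt(1 − ε²) times the constant-size mean (about 2.47 versus about 5.66 for n = 10 and ε = 0.9). At intermediate frequencies neither value is expected to apply.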