Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.

2.
The Genealogy of Samples in Models with Selection   (Total citations: 1; self-citations: 0; citations by others: 1)
C. Neuhauser, S. M. Krone. Genetics 1997, 145(2): 519-534
We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.

3.
Knudsen B, Miyamoto MM. Genetics 2007, 176(4): 2335-2342
Coalescent theory provides a powerful framework for estimating the evolutionary, demographic, and genetic parameters of a population from a small sample of individuals. Current coalescent models have largely focused on population genetic factors (e.g., mutation, population growth, and migration) rather than on the effects of experimental design and error. This study develops a new coalescent/mutation model that accounts for unobserved polymorphisms due to missing data, sequence errors, and multiple reads for diploid individuals. The importance of accommodating these effects of experimental design and error is illustrated with evolutionary simulations and a real data set from a population of the California sea hare. In particular, a failure to account for sequence errors can lead to overestimated mutation rates, inflated coalescent times, and inappropriate conclusions about the population. This current model can now serve as a starting point for the development of newer models with additional experimental and population genetic factors. It is currently implemented as a maximum-likelihood method, but this model may also serve as the basis for the development of Bayesian approaches that incorporate experimental design and error.

4.
Population genetics theory has laid the foundations for genomic analyses including the recent burst in genome scans for selection and statistical inference of past demographic events in many prokaryote, animal and plant species. Identifying SNPs under natural selection and underpinning species adaptation relies on disentangling the respective contribution of random processes (mutation, drift, migration) from that of selection on nucleotide variability. Most theory and statistical tests have been developed using the Kingman coalescent theory based on the Wright‐Fisher population model. However, these theoretical models rely on biological and life history assumptions which may be violated in many prokaryote, fungal, animal or plant species. Recent theoretical developments of the so‐called multiple merger coalescent models are reviewed here (Λ‐coalescent, beta‐coalescent, Bolthausen‐Sznitman, Ξ‐coalescent). We explain how these new models take into account various pervasive ecological and biological characteristics, life history traits or life cycles which were not accounted for in previous theories, such as (i) the skew in offspring production typical of marine species, (ii) fast adapting microparasites (virus, bacteria and fungi) exhibiting large variation in population sizes during epidemics, (iii) the peculiar life cycles of fungi and bacteria alternating sexual and asexual cycles and (iv) the high rates of extinction‐recolonization in spatially structured populations. We finally discuss the relevance of multiple merger models for the detection of SNPs under selection in these species and for population genomics of very large sample sizes, and advocate potentially re‐examining the conclusions of previous population genetics studies.
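
To make the difference between the Kingman coalescent and a multiple merger model concrete, the following is a minimal sketch of the merger rates of the beta‐coalescent. It assumes the standard Beta(2 − α, α) parameterization with 0 < α < 2, which the abstract itself does not spell out; the function names are illustrative only. In this parameterization, α = 1 gives the Bolthausen‐Sznitman coalescent, and as α approaches 2 the rates of mergers involving three or more lineages vanish, recovering Kingman‐like behaviour.

```python
import math


def log_beta(a, b):
    """Natural log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one particular set of k out of b lineages merges.

    Assumes the Beta(2 - alpha, alpha) coalescent:
        lambda_{b,k} = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha)
    """
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0 < alpha < 2:
        raise ValueError("need 0 < alpha < 2")
    return math.exp(
        log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha)
    )


def total_merger_rate(b, k, alpha):
    """Total rate of k-mergers among b lineages (any set of k lineages)."""
    return math.comb(b, k) * beta_coalescent_rate(b, k, alpha)
```

A pair always merges at rate 1 when only two lineages remain, whatever the value of α, so α controls only how much of the total rate is carried by larger mergers.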

5.
A special stochastic process, called the coalescent, is of fundamental interest in population genetics. For a large class of population models this process is the appropriate tool to analyse the ancestral structure of a sample of n individuals or genes, if the total number of individuals in the population is sufficiently large. A corresponding convergence theorem was first proved by Kingman in 1982 for the Wright-Fisher model and the Moran model. Generalizations to a large class of exchangeable population models and to models with overlying mutation processes followed shortly afterwards. One speaks of the "robustness of the coalescent", as this process appears in many models as the total population size tends to infinity. This publication can be considered as an introduction to the theory of the coalescent as well as a review of the most important "convergence-to-the-coalescent" theorems. Convergence theorems are not only presented for the classical exchangeable haploid case but also for larger classes of population models, for example for diploid, two-sex or non-exchangeable models. A review-like summary of further examples and applications of convergence to the coalescent is given including the most important biological forces like mutation, recombination and selection. The general coalescent process allows for simultaneous multiple mergers of ancestral lines.
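
As an illustration of the limiting process described above, here is a minimal sketch of the Kingman coalescent for a sample of n lineages. It assumes the usual time scaling in which each pair of lineages coalesces at rate 1, so that k lineages wait an exponential time with rate k(k − 1)/2 before the next merger; the function names are hypothetical and not taken from the reviewed publication.

```python
import random


def simulate_tmrca(n, rng):
    """Draw one time to the most recent common ancestor for n lineages.

    While k lineages remain, the waiting time to the next pairwise merger
    is exponential with rate k * (k - 1) / 2 (coalescent time units).
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    time = 0.0
    for k in range(n, 1, -1):
        time += rng.expovariate(k * (k - 1) / 2.0)
    return time


def expected_tmrca(n):
    """Closed-form expectation: sum of 2 / (k (k - 1)) = 2 (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [simulate_tmrca(10, rng) for _ in range(5000)]
    print(sum(draws) / len(draws), expected_tmrca(10))
```

The simulated mean should lie close to 2(1 − 1/n), which stays below 2 however large the sample is; this is one reason a small sample already captures most of the depth of the genealogy.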

6.
We analyze patterns of genetic variability of populations in the presence of a large seedbank with the help of a new coalescent structure called the seedbank coalescent. This ancestral process appears naturally as a scaling limit of the genealogy of large populations that sustain seedbanks, if the seedbank size and individual dormancy times are of the same order as those of the active population. Mutations appear as Poisson processes on the active lineages and potentially at reduced rate also on the dormant lineages. The presence of “dormant” lineages leads to qualitatively altered times to the most recent common ancestor and nonclassical patterns of genetic diversity. To illustrate this we provide a Wright–Fisher model with a seedbank component and mutation, motivated by recent models of microbial dormancy, whose genealogy can be described by the seedbank coalescent. Based on our coalescent model, we derive recursions for the expectation and variance of the time to most recent common ancestor, number of segregating sites, pairwise differences, and singletons. Estimates (obtained by simulations) of the distributions of commonly employed distance statistics, in the presence and absence of a seedbank, are compared. The effect of a seedbank on the expected site-frequency spectrum is also investigated using simulations. Our results indicate that the presence of a large seedbank considerably alters the distribution of some distance statistics, as well as the site-frequency spectrum. Thus, one should be able to detect from genetic data the presence of a large seedbank in natural populations.

7.
A class of two-sex population models is considered with N females and equal number N of males constituting each generation. Reproduction is assumed to undergo three stages: 1) random mating, 2) exchangeable reproduction, 3) random sex assignment. Treating individuals as pairs of genes at a certain locus we introduce the diploid ancestral process (the past genealogical tree) for n such genes sampled in the current generation. Neither mutation nor selection is assumed. A convergence criterion for the diploid ancestral process is proved as N goes to infinity while n remains unchanged. Conditions are specified when the limiting process (coalescent) is the Kingman coalescent and situations are discussed when the coalescent allows for multiple mergers of ancestral lines. Work supported by the Bank of Sweden Tercentenary Foundation. Mathematics Subject Classification (2000): Primary 92F25, 60J70; Secondary 92D15, 60F17

8.
Coalescent theory is commonly used to perform population genetic inference at the nucleotide level. Here, we examine the procedure that fixes the number of segregating sites (henceforth the FS procedure). In this approach a fixed number of segregating sites (S) are placed on a coalescent tree (independently of the total and internode lengths of the tree). Thus, although widely used, the FS procedure does not strictly follow the assumptions of coalescent theory and must be considered an approximation of (i) the standard procedure that uses a fixed population mutation parameter theta, and (ii) procedures that condition on the number of segregating sites. We study the differences in the false positive rate for nine statistics by comparing the FS procedure with the procedures (i) and (ii), using several evolutionary models with single-locus and multilocus data. Our results indicate that for single-locus data the FS procedure is accurate for the equilibrium neutral model, but problems arise under the alternative models studied; furthermore, for multilocus data, the FS procedure becomes inaccurate even for the standard neutral model. Therefore, we recommend a procedure that fixes the theta value (or alternatively, procedures that condition on S and take into account the uncertainty of theta) for analysing evolutionary models with multilocus data. With single-locus data, the FS procedure should not be employed for models other than the standard neutral model.
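
The contrast between the fixed-theta procedure (i) and the FS procedure can be sketched as follows. This is an illustrative sketch under the standard neutral Kingman model, not the authors' implementation: it assumes time is scaled so that each pair of lineages coalesces at rate 1 and mutations fall on the tree at rate theta/2 per unit of branch length, and all function names are hypothetical. Under the fixed-theta procedure, S is random and depends on the total tree length; the FS procedure instead sets S to a constant regardless of the tree.

```python
import random


def watterson_constant(n):
    """a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n."""
    return sum(1.0 / i for i in range(1, n))


def total_tree_length(n, rng):
    """Total branch length of one Kingman coalescent tree for n lineages."""
    length = 0.0
    for k in range(n, 1, -1):
        # k lineages persist for an Exp(k (k - 1) / 2) waiting time.
        length += k * rng.expovariate(k * (k - 1) / 2.0)
    return length


def poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals before `mean`."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def segregating_sites_fixed_theta(n, theta, rng):
    """Fixed-theta procedure: S ~ Poisson(theta / 2 * total tree length)."""
    return poisson(theta / 2.0 * total_tree_length(n, rng), rng)


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [segregating_sites_fixed_theta(10, 5.0, rng) for _ in range(5000)]
    print(sum(draws) / len(draws), 5.0 * watterson_constant(10))
```

Because S varies from tree to tree under the fixed-theta procedure, fixing S discards the dependence between the number of mutations and the tree length, which is the approximation the abstract examines.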

9.
Matsen FA, Wakeley J. Genetics 2006, 172(1): 701-708
In this article we apply some graph-theoretic results to the study of coalescence in a structured population with migration. The graph is the pattern of migration among subpopulations, or demes, and we use the theory of random walks on graphs to characterize the ease with which ancestral lineages can traverse the habitat in a series of migration events. We identify conditions under which the coalescent process in populations with restricted migration, such that individuals cannot traverse the habitat freely in a single migration event, nonetheless becomes identical to the coalescent process in the island migration model in the limit as the number of demes tends to infinity. Specifically, we first note that a sequence of symmetric graphs with Diaconis-Stroock constant bounded above has an unstructured Kingman-type coalescent in the limit for a sample of size two from two different demes. We then show that circular and toroidal models with long-range but restricted migration have an upper bound on this constant and so have an unstructured-migration coalescent in the limit. We investigate the rate of convergence to this limit using simulations.

10.
11.
Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.

12.
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. 
In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.

13.
For many species, climate oscillations drove cycles of population contraction during cool glacial periods followed by expansion during interglacials. Some groups, however, show evidence of uniform and synchronous expansion, while others display differences in the timing and extent of demographic change. We compared demographic histories inferred from genetic data across marine turtle species to identify responses to postglacial warming shared across taxa and to examine drivers of past demographic change at the global scale. Using coalescent simulations and approximate Bayesian computation (ABC), we estimated demographic parameters, including the likelihood of past population expansion, from a mitochondrial data set encompassing 23 previously identified lineages from all seven marine turtle species. For lineages with a high posterior probability of expansion, we conducted a hierarchical ABC analysis to estimate the proportion of lineages expanding synchronously and the timing of synchronous expansion. We used Bayesian model averaging to identify variables associated with expansion and genetic diversity. Approximately 60% of extant marine turtle lineages showed evidence of expansion, with the rest mainly exhibiting patterns of genetic diversity most consistent with population stability. For lineages showing expansion, there was a strong signal of synchronous expansion after the Last Glacial Maximum. Expansion and genetic diversity were best explained by ocean basin and the degree of endemism for a given lineage. Geographic differences in sensitivity to climate change have implications for prioritizing conservation actions in marine turtles as well as for identifying areas of past demographic stability and potential resilience to future climate change for broadly distributed taxa.

14.
Current human sequencing projects observe an abundance of extremely rare genetic variation, suggesting recent acceleration of population growth. To better understand the impact of such accelerating growth on the quantity and nature of genetic variation, we present a new class of models capable of incorporating faster than exponential growth in a coalescent framework. Our work shows that such accelerated growth affects only the population size in the recent past and thus large samples are required to detect the models’ effects on patterns of variation. When we compare models with fixed initial growth rate, models with accelerating growth achieve very large current population sizes and large samples from these populations contain more variation than samples from populations with constant growth. This increase is driven almost entirely by an increase in singleton variation. Moreover, linkage disequilibrium decays faster in populations with accelerating growth. When we instead condition on current population size, models with accelerating growth result in less overall variation and slower linkage disequilibrium decay compared to models with exponential growth. We also find that pairwise linkage disequilibrium of very rare variants contains information about growth rates in the recent past. Finally, we demonstrate that models of accelerating growth may substantially change estimates of present-day effective population sizes and growth times.

15.
Many diploid organisms undergo facultative sexual reproduction. However, little is currently known concerning the distribution of neutral genetic variation among facultative sexual organisms except in very simple cases. Understanding this distribution is important when making inferences about rates of sexual reproduction, effective population size, and demographic history. Here we extend coalescent theory in diploids with facultative sex to consider gene conversion, selfing, population subdivision, and temporal and spatial heterogeneity in rates of sex. In addition to analytical results for two-sample coalescent times, we outline a coalescent algorithm that accommodates the complexities arising from partial sex; this algorithm can be used to generate multisample coalescent distributions. A key result is that when sex is rare, gene conversion becomes a significant force in reducing diversity within individuals. This can reduce genomic signatures of infrequent sex (i.e., elevated within-individual allelic sequence divergence) or entirely reverse the predicted patterns. These models offer improved methods for assessing null patterns of molecular variation in facultative sexual organisms.

16.
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.

17.
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only a few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.

18.
The allele frequency spectrum is a series of statistics that describe genetic polymorphism, and is commonly used for inferring population genetic parameters and detecting natural selection. Population genetic theory on the allele frequency spectrum for a single population has been well studied using both coalescent theory and diffusion equations. Recently, the theory was extended to the joint allele frequency spectrum (JAFS) for three populations using diffusion equations and was shown to be very useful in inferring human demographic history. In this paper, I show that the JAFS can be analytically derived with coalescent theory for a basic model of two isolated populations and then extended to multiple populations and various complex scenarios, such as those involving population growth and bottleneck, migration, and positive selection. A simulation study is used to demonstrate the accuracy and applicability of the theoretical model. The coalescent theory-based approach for the JAFS can characterize the demographic history with comprehensive statistical models as the diffusion approach does, and in addition gains several novel advantages: the computational complexity of calculating the JAFS with coalescent theory is reduced, and thus it is feasible to analytically obtain the JAFS for multiple populations; the hitchhiking effect can be efficiently modeled in coalescent theory, enabling the development of methodologies for detecting selection via multi-population polymorphism data. As an alternative to the diffusion approximation approach, the coalescent theory for the JAFS also provides a foundation for population genetic inference with the advent of large-scale genomic polymorphism data.

19.
Pleistocene climatic oscillations were a major force shaping genetic variability in many taxa. We analyse the relative effects of the ice ages across a latitudinal gradient in the Western Mediterranean region, testing two main predictions: (i) species with historical distributions in northern latitudes should have experienced greater loss of suitable habitat, resulting in higher extinction of historical lineages than species distributed in southern latitudes, where the effects of the ice ages were not as drastic. This would be reflected in the observation of lower diversity and number of differentiated lineages in northern areas. (ii) a signature of demographic expansion following the climate amelioration should be obvious in northern species, whereas in the south evidence of long-term effective population size stability should be observed. We used as models three species of wall lizards (Podarcis bocagei, Podarcis carbonelli and Podarcis vaucheri) that replace each other along the study area. We investigated the patterns of mitochondrial DNA diversity and subdivision and obtained demographic parameter estimates for each species. Our results suggest that P. bocagei, the northernmost species, bears low genetic diversity, a shallow coalescent history and marks of a demographic expansion. In contrast, P. vaucheri, the species with a southernmost distribution, shows deeper coalescence events, complex geographical substructure and no evidence for population growth. The species with an intermediate distribution, P. carbonelli, shows average levels of diversity, substructure and population growth. Taken together, these results conform to our main predictions and are explained by a differential influence of the ice ages on distinct latitudes.

20.
Since the early Holocene, fish population genetics in the Laurentian Great Lakes have been shaped by the dual influences of habitat structure and post‐glacial dispersal. Riverscape genetics theory predicts that longitudinal habitat corridors and unidirectional downstream water‐flow drive the downstream accumulation of genetic diversity, whereas post‐glacial dispersal theory predicts that fish genetic diversity should decrease with increasing distance from glacial refugia. This study examines populations of seven native fish species codistributed above and below the 58 m high Niagara Falls – a hypothesized barrier to gene flow in aquatic species. A better understanding of Niagara Falls’ role as a barrier to gene flow and dispersal is needed to identify drivers of Great Lakes genetic diversity and guide strategies to limit exotic species invasions. We used genome‐wide SNPs and coalescent models to test whether populations are: (a) genetically distinct, consistent with the Niagara Falls barrier hypothesis; (b) more genetically diverse upstream, consistent with post‐glacial expansion theory, or downstream, consistent with the riverscape habitat theory; and (c) have migrated either upstream or downstream past Niagara Falls. We found that genetic diversity is consistently greater below Niagara Falls and the falls are an effective barrier to migration, but two species have probably dispersed upstream past the falls after glacial retreat yet before opening of the Welland Canal. Models restricting migration to after opening of the Welland Canal were generally rejected. These results help explain how river habitat features affect aquatic species’ genetic diversity and highlight the need to better understand post‐glacial dispersal pathways.
