Similar literature
1.
The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation relies heavily on the assumption that the population size is large and the sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent compensates for the longer waiting time between successive coalescent events by having multiple coalescences at the same time. The most significant difference among the various summary statistics of a coalescent examined is the sum of lengths of external branches, which can be more than 10% larger for the exact coalescent than for the Kingman coalescent.
As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
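The waiting-time difference described in this abstract can be illustrated with a minimal sketch. It assumes a haploid WF population of N gene copies (the abstract does not state the paper's parametrization), and the function names are hypothetical. The exact probability that n lineages experience no coalescence in one generation is compared with the first-order Kingman approximation, which counts only pairwise mergers.

```python
from math import comb, prod


def p_no_coalescence_exact(n: int, N: int) -> float:
    """Exact WF: probability that n lineages choose n distinct parents
    among N gene copies in the previous generation (assumed haploid)."""
    return prod(1 - i / N for i in range(1, n))


def p_no_coalescence_kingman(n: int, N: int) -> float:
    """First-order Kingman approximation: 1 - C(n, 2) / N, clipped at zero."""
    return max(0.0, 1 - comb(n, 2) / N)


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, p_no_coalescence_exact(n, 1000), p_no_coalescence_kingman(n, 1000))
```

Under these assumptions the exact probability of no coalescence is never smaller than the approximate one, which is consistent with the longer waiting times the abstract reports for the exact coalescent.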

2.
The study of sequence diversity under phylogenetic models is now classic. Theoretical studies of diversity under the Kingman coalescent appeared shortly after the introduction of the coalescent. In this paper we revisit this topic under the multispecies coalescent, an extension of the single population model to multiple populations. We derive exact formulas for the sequence dissimilarity of two sequences drawn at random under a basic multispecies setup. The multispecies model uses three parameters: the species tree birth rate under the pure birth process (Yule), the species effective population size and the mutation rate. We also discuss the effects of relaxing some of the model assumptions.

3.
Among elephants, the phylogeographic patterns of mitochondrial (mt) and nuclear markers are often incongruent. One hypothesis attributes this to sex differences in dispersal and in the variance of reproductive success. We tested this hypothesis by examining the coalescent dates of genetic markers within elephantid lineages, predicting that lower dispersal and lower variance in reproductive success among females would have increased mtDNA relative to nuclear coalescent dates. We sequenced the mitochondrial genomes of two forest elephants, aligning them to mitogenomes of African savanna and Asian elephants, and of woolly mammoths, including the most divergent mitogenomes within each lineage. Using fossil calibrations, the divergence between African elephant F and S clade mitochondrial genomes (originating in forest and savanna elephant lineages, respectively) was estimated as 5.5 Ma. We estimated that the (African) ancestor of the mammoth and Asian elephant lineages diverged 6.0 Ma, indicating that four elephantid lineages had differentiated in Africa by the Miocene–Pliocene transition, concurrent with drier climates. The coalescent date for forest elephant mtDNAs was c. 2.4 Ma, suggesting that the decrease in tropical forest cover during the Pleistocene isolated distinct African forest elephant lineages. For all elephantid lineages, the ratio of mtDNA to nuclear coalescent dates was much greater than 0.25. This is consistent with the expectation that sex differences in dispersal and in variance of reproductive success would have increased the effective population size of mtDNA relative to nuclear markers in elephantids, contributing to the persistence of incongruent mtDNA phylogeographic patterns.

4.
A special stochastic process, called the coalescent, is of fundamental interest in population genetics. For a large class of population models this process is the appropriate tool to analyse the ancestral structure of a sample of n individuals or genes, if the total number of individuals in the population is sufficiently large. A corresponding convergence theorem was first proved by Kingman in 1982 for the Wright-Fisher model and the Moran model. Generalizations to a large class of exchangeable population models and to models with overlying mutation processes followed shortly afterwards. One speaks of the "robustness of the coalescent", as this process appears in many models as the total population size tends to infinity. This publication can be considered as an introduction to the theory of the coalescent as well as a review of the most important "convergence-to-the-coalescent" theorems. Convergence theorems are presented not only for the classical exchangeable haploid case but also for larger classes of population models, for example for diploid, two-sex or non-exchangeable models. A review-like summary of further examples and applications of convergence to the coalescent is given, including the most important biological forces like mutation, recombination and selection. The general coalescent process allows for simultaneous multiple mergers of ancestral lines.
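The limiting process referred to in this abstract can be sketched in a few lines. The following is a minimal illustration of the standard Kingman coalescent, not code from the reviewed publication: while k lineages remain, the waiting time to the next pairwise merger is exponential with rate C(k, 2), with time measured in coalescent units. The function names and the seed are hypothetical choices.

```python
import random
from math import comb


def expected_tmrca(n: int) -> float:
    """Closed form for the expected time to the MRCA: sum over k of 1 / C(k, 2)."""
    return 2.0 * (1.0 - 1.0 / n)


def simulate_tmrca(n: int, rng: random.Random) -> float:
    """Draw one time to the MRCA for a sample of n lineages."""
    t = 0.0
    for k in range(n, 1, -1):
        # With k lineages, each of the C(k, 2) pairs merges at rate 1.
        t += rng.expovariate(comb(k, 2))
    return t


def mean_tmrca(n: int, reps: int, seed: int = 0) -> float:
    """Monte Carlo average of the time to the MRCA over `reps` replicates."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    print(mean_tmrca(10, 20000), expected_tmrca(10))
```

The simulated mean should lie close to the closed-form value, which gives a quick sanity check on any implementation of the binary-merger case.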

5.
The genealogical structure of neutral populations in which reproductive success is highly skewed has been the subject of many recent studies. Here we derive a coalescent dual process for a related class of continuous-time Moran models with viability selection. In these models, individuals can give birth to multiple offspring whose survival depends on both the parental genotype and the brood size. This extends the dual process construction for a multi-type Moran model with genic selection described in Etheridge and Griffiths (2009). We show that in the limit of infinite population size the non-neutral Moran models converge to a Markov jump process which we call the Λ-Fleming-Viot process with viability selection, and we derive a coalescent dual for this process both directly from the generator and as a limit from the Moran models. The dual is a branching-coalescing process similar to the Ancestral Selection Graph which follows the typed ancestry of genes backwards in time with real and virtual lineages. As an application, the transition functions of the non-neutral Moran and Λ-coalescent models are expressed as mixtures of the transition functions of the dual process.

6.
Phylogeographic methods aim to infer migration trends and the history of sampled lineages from genetic data. Applications of phylogeography are broad, and in the context of pathogens include the reconstruction of transmission histories and the origin and emergence of outbreaks. Phylogeographic inference based on bottom-up population genetics models is computationally expensive, and as a result faster alternatives based on the evolution of discrete traits have become popular. In this paper, we show that inference of migration rates and root locations based on discrete trait models is extremely unreliable and sensitive to biased sampling. To address this problem, we introduce BASTA (BAyesian STructured coalescent Approximation), a new approach implemented in BEAST2 that combines the accuracy of methods based on the structured coalescent with the computational efficiency required to handle more than just a few populations. We illustrate the potentially severe implications of poor model choice for phylogeographic analyses by investigating the zoonotic transmission of Ebola virus. Whereas the structured coalescent analysis correctly infers that successive human Ebola outbreaks have been seeded by a large unsampled non-human reservoir population, the discrete trait analysis implausibly concludes that undetected human-to-human transmission has allowed the virus to persist over the past four decades. As genomics takes on an increasingly prominent role informing the control and prevention of infectious diseases, it will be vital that phylogeographic inference provides robust insights into transmission history.

7.
Cytotoxic T lymphocytes (CTLs) are immune system cells that are thought to play an important role in controlling HIV infection. We develop a stochastic ODE model of HIV-CTL interaction that extends current deterministic ODE models. Based on this stochastic model, we consider the effect of CTL attack on intrahost HIV lineages assuming that CTLs attack several epitopes with equal strength. In this setting, we introduce a limiting version of our stochastic ODE under which we show that the coalescence of HIV lineages can be described through Poisson-Dirichlet distributions. Through numerical experiments, we show that our results under the limiting stochastic ODE accurately reflect HIV lineages under CTL attack when the HIV population size is on the low end of its hypothesized range. Current techniques of HIV lineage construction depend on the Kingman coalescent. Our results give an explicit connection between CTL attack and HIV lineages.

8.
Estimation of epidemiological and population parameters from molecular sequence data has become central to the understanding of infectious disease dynamics. Various models have been proposed to infer details of the dynamics that describe epidemic progression. These include inference approaches derived from Kingman’s coalescent theory. Here, we use recently described coalescent theory for epidemic dynamics to develop stochastic and deterministic coalescent susceptible–infected–removed (SIR) tree priors. We implement these in a Bayesian phylogenetic inference framework to permit joint estimation of SIR epidemic parameters and the sample genealogy. We assess the performance of the two coalescent models and also juxtapose results obtained with a recently published birth–death-sampling model for epidemic inference. Comparisons are made by analyzing sets of genealogies simulated under precisely known epidemiological parameters. Additionally, we analyze influenza A (H1N1) sequence data sampled in the Canterbury region of New Zealand and HIV-1 sequence data obtained from known United Kingdom infection clusters. We show that both coalescent SIR models are effective at estimating epidemiological parameters from data with large fundamental reproductive number R0 and large population size S0. Furthermore, we find that the stochastic variant generally outperforms its deterministic counterpart in terms of error, bias, and highest posterior density coverage, particularly for smaller R0 and S0. However, each of these inference models is shown to have undesirable properties in certain circumstances, especially for epidemic outbreaks with R0 close to one or with small effective susceptible populations.

9.
We examine genetic statistics used in the study of structured populations. In a 1999 paper, Wakeley observed that the coalescent process associated with the finite island model can be decomposed into a scattering phase and a collecting phase. This decomposition becomes exact in the large population limit with the coalescent at the end of the scattering phase converging to the Ewens sampling formula and the coalescent during the collecting phase converging to the Kingman coalescent. In this paper we introduce a class of limiting models, which we refer to as G/KC models, that generalize Wakeley’s decomposition. G in G/KC represents a completely general limit for the scattering phase, while KC represents a Kingman coalescent limit for the collecting phase. We show that both the island and two-dimensional stepping stone models converge to G/KC models in the large population limit. We then derive the distribution of the statistic $F_{ST}$ for all G/KC models under a large sample limit for the cases of strong or weak mutation, thereby deriving the large population, large sample limiting distribution of $F_{ST}$ for the island and two-dimensional stepping stone models as a special case of a general formula. Our methods allow us to take the large population and large sample limits simultaneously. In the context of large population, large sample limits, we show that the variance of $F_{ST}$ in the presence of weak mutation collapses as $O(\frac{1}{\log d})$, where d is the number of demes sampled. Further, we show that this $O(\frac{1}{\log d})$ is caused by a heavy tail in the distribution of $F_{ST}$. Our analysis of $F_{ST}$ can be extended to an entire class of genetic statistics, and we use our approach to examine homozygosity measures. Our analysis uses coalescent based methods.

10.
Population genetics theory has laid the foundations for genomic analyses including the recent burst in genome scans for selection and statistical inference of past demographic events in many prokaryote, animal and plant species. Identifying SNPs under natural selection and underpinning species adaptation relies on disentangling the respective contribution of random processes (mutation, drift, migration) from that of selection on nucleotide variability. Most theory and statistical tests have been developed using the Kingman coalescent theory based on the Wright-Fisher population model. However, these theoretical models rely on biological and life history assumptions which may be violated in many prokaryote, fungal, animal or plant species. Recent theoretical developments of the so-called multiple merger coalescent models are reviewed here (Λ-coalescent, beta-coalescent, Bolthausen-Sznitman, Ξ-coalescent). We explain how these new models take into account various pervasive ecological and biological characteristics, life history traits or life cycles which were not accounted for in previous theories, such as (i) the skew in offspring production typical of marine species, (ii) fast-adapting microparasites (virus, bacteria and fungi) exhibiting large variation in population sizes during epidemics, (iii) the peculiar life cycles of fungi and bacteria alternating sexual and asexual cycles and (iv) the high rates of extinction-recolonization in spatially structured populations. We finally discuss the relevance of multiple merger models for the detection of SNPs under selection in these species and for population genomics of very large sample sizes, and advocate potentially re-examining the conclusions of previous population genetics studies.
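As an illustration of the multiple merger models named in this abstract, the following minimal sketch computes merger rates for the beta-coalescent. It assumes the common Beta(2 - alpha, alpha) parametrization with 0 < alpha < 2, which the abstract itself does not specify; the function names are hypothetical. Under that assumption, the rate at which a given group of k out of b lineages merges is B(k - alpha, b - k + alpha) / B(2 - alpha, alpha), where B is the beta function.

```python
from math import exp, lgamma


def beta_fn(x: float, y: float) -> float:
    """Euler beta function B(x, y) for x, y > 0, computed via log-gamma."""
    return exp(lgamma(x) + lgamma(y) - lgamma(x + y))


def beta_coalescent_rate(n_lineages: int, k: int, alpha: float) -> float:
    """Rate at which one specific group of k out of n_lineages lineages merges
    under a Beta(2 - alpha, alpha) coalescent (assumed parametrization)."""
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie strictly between 0 and 2")
    if not 2 <= k <= n_lineages:
        raise ValueError("k must satisfy 2 <= k <= n_lineages")
    return beta_fn(k - alpha, n_lineages - k + alpha) / beta_fn(2.0 - alpha, alpha)


if __name__ == "__main__":
    # alpha = 1 corresponds to the Bolthausen-Sznitman coalescent.
    print(beta_coalescent_rate(3, 2, 1.0), beta_coalescent_rate(3, 3, 1.0))
```

In this parametrization alpha = 1 gives the Bolthausen-Sznitman coalescent, and mergers of more than two lineages become rare as alpha approaches 2, where the process approaches the Kingman coalescent.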

11.
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.

12.
Quantifying epidemiological dynamics is crucial for understanding and forecasting the spread of an epidemic. The coalescent and the birth-death model are used interchangeably to infer epidemiological parameters from the genealogical relationships of the pathogen population under study, which in turn are inferred from the pathogen genetic sequencing data. To compare the performance of these widely applied models, we performed a simulation study. We simulated phylogenetic trees under the constant rate birth-death model and the coalescent model with a deterministic exponentially growing infected population. For each tree, we re-estimated the epidemiological parameters using both a birth-death and a coalescent based method, implemented as an MCMC procedure in BEAST v2.0. In our analyses that estimate the growth rate of an epidemic based on simulated birth-death trees, the point estimates such as the maximum a posteriori/maximum likelihood estimates are not very different. However, the estimates of uncertainty are very different. The birth-death model had a higher coverage than the coalescent model, i.e. contained the true value in the highest posterior density (HPD) interval more often (2–13% vs. 31–75% error). The coverage of the coalescent decreases with decreasing basic reproductive ratio and increasing sampling probability of infecteds. We hypothesize that the biases in the coalescent are due to the assumption of deterministic rather than stochastic population size changes. Both methods performed reasonably well when analyzing trees simulated under the coalescent. The methods can also identify other key epidemiological parameters as long as one of the parameters is fixed to its true value. 
In summary, when using genetic data to estimate epidemic dynamics, our results suggest that the birth-death method will be less sensitive to population fluctuations of early outbreaks than the coalescent method that assumes a deterministic exponentially growing infected population.

13.
The structured coalescent allows inferring migration patterns between viral subpopulations from genetic sequence data. However, these analyses typically assume that no genetic recombination process impacted the sequence evolution of pathogens. For segmented viruses, such as influenza, that can undergo reassortment this assumption is broken. Reassortment reshuffles the segments of different parent lineages upon a coinfection event, which means that the shared history of viruses has to be represented by a network instead of a tree. Therefore, full genome analyses of such viruses are complex or even impossible. Although this problem has been addressed for unstructured populations, it is still impossible to account for population structure, such as that induced by different host populations, while also accounting for reassortment. We address this by extending the structured coalescent to account for reassortment and present a framework for investigating possible ties between reassortment and migration (host jump) events. This method can accurately estimate subpopulation-dependent effective population sizes, reassortment, and migration rates from simulated data. Additionally, we apply the new model to avian influenza A/H5N1 sequences, sampled from two avian host types, Anseriformes and Galliformes. We contrast our results with a structured coalescent without reassortment inference, which assumes independently evolving segments. This reveals that taking into account segment reassortment and using sequencing data from several viral segments for joint phylodynamic inference leads to different estimates for effective population sizes, migration, and clock rates. This new model is implemented as the Structured Coalescent with Reassortment package for BEAST 2.5 and is available at https://github.com/jugne/SCORE.

14.
Estimates of speciation times are subject to a number of potential errors. One source of bias is that effective population size (Ne) has been shown to influence substitution rates. This issue is of particular interest for phylogeographic studies because population sizes can vary dramatically among genetically structured populations across species’ ranges. In this study, we used multilocus data to examine temporal phylogeographic patterns in a widespread North American songbird, the Northern Cardinal (Cardinalis cardinalis). Species tree estimation indicated that the phylogeographic structure of C. cardinalis was comprised of four well-supported mainland lineages with large population sizes (large Ne) and two island lineages comprised of much smaller populations (small Ne). We inferred speciation times from mtDNA and multilocus data and found there was discordance between events that represented island-mainland divergences, whereas both estimates were similar for divergences among mainland lineages. We performed coalescent simulations and found that the difference in speciation times could be attributed to stochasticity for a recently diverged island lineage. However, the magnitude of the change between speciation times estimated from mtDNA and multilocus data of an older island lineage was substantially greater than predicted by coalescent simulations. For this divergence, we found the discordance in time estimates was due to a substantial increase in the mtDNA substitution rate in the small island population. These findings indicate that in phylogeographic studies the relative tempo of evolution between mtDNA and nuclear DNA can become highly discordant in small populations.

15.
With increasing force, genetic divergence of mitochondrial DNA (mtDNA) is being argued as the primary tool for discovery of animal species. Two thresholds of single-gene divergence have been proposed: reciprocal monophyly, and 10 times greater genetic divergence between than within species (the "10x rule"). To explore quantitatively the utility of each approach, we couple neutral coalescent theory and the classical Bateson-Dobzhansky-Muller (BDM) model of speciation. The joint stochastic dynamics of these two processes demonstrate that both thresholds fail to "discover" many reproductively isolated lineages under a single incompatibility BDM model, especially when BDM loci have been subject to divergent selection. Only when populations have been isolated for > 4 million generations did these thresholds achieve error rates of < 10% under our model that incorporates variable population sizes. The high error rate evident in simulations is corroborated with six empirical data sets. These properties suggest that single-gene, high-throughput approaches to discovering new animal species will bias large-scale biodiversity surveys, particularly toward missing reproductively isolated lineages that have emerged by divergent selection or other mechanisms that accelerate reproductive isolation. Because single-gene thresholds for species discovery can result in substantial error at recent divergence times, they will misrepresent the correspondence between recently isolated populations and reproductively isolated lineages (= species).

16.
Conventional coalescent inferences of population history make the critical assumption that the population under examination is panmictic. However, most populations are structured. This complicates the prevailing coalescent analyses and sometimes leads to inaccurate estimates. To develop a coalescent method unhampered by population structure, we perform two analyses. First, we demonstrate that the coalescent probability of two randomly sampled alleles from the immediately preceding generation (one generation back) is independent of population structure. Second, motivated by this finding, we propose a new coalescent method: i-coalescent analysis. The i-coalescent analysis computes the instantaneous coalescent rate by using a phylogenetic tree of sampled alleles. Using simulated data, we broadly demonstrate the capability of i-coalescent analysis to accurately reconstruct population size dynamics of highly structured populations, although we find this method often requires larger sample sizes for structured populations than for panmictic populations. Overall, our results indicate i-coalescent analysis to be a useful tool, especially for the inference of population histories with intractable structure such as the developmental history of cell populations in the organs of complex organisms.

17.
Probabilities of monophyly, paraphyly, and polyphyly of two-species gene genealogies are computed for modest sample sizes and compared for two different Λ coalescent processes. Coalescent processes belonging to the Λ coalescent family admit asynchronous multiple mergers of active ancestral lineages. Assigning a timescale to the time of divergence becomes a central issue when different populations have different coalescent processes running on different timescales. Clade probabilities in single populations are also computed, which can be useful for testing for taxonomic distinctiveness of an observed set of monophyletic lineages. The coalescence rates of multiple merger coalescent processes are functions of coalescent parameters. The effect of coalescent parameters on the probabilities studied depends on the coalescent process, and on whether the population is ancestral or derived. The probability of reciprocal monophyly tends to be somewhat lower, when associated with a Λ coalescent, under the null hypothesis that two groups come from the same population. However, even for fairly recent divergence times, the probability of monophyly tends to be higher as a function of the number of generations for coalescent processes that admit multiple mergers, and is sensitive to the parameter of one of the example processes.

18.
Aceria tosichella (the wheat curl mite, WCM) is a global pest of wheat and other cereals, causing losses by direct damage, as well as the transmission of plant viruses. The mite is considered to have an unusually wide host range for an eriophyoid species. The present study tested the commonly held assumption that WCM is a single, highly polyphagous species by assessing the host range of genetically distinct lineages of WCM occurring in Poland on different host plants. Genotyping was performed by analyzing nucleotide sequence data from fragments of the mitochondrial cytochrome c oxidase subunit I (COI) and the nuclear D2 region of 28S rDNA. Mean between-lineage distance estimated using COI data was found to be one order of magnitude greater than the within-lineage distance and, in some cases, comparable to distances between WCM lineages and a congeneric outgroup species. Host acceptance was tested by quantifying population growth for different WCM mitochondrial (mt)DNA lineages when transferred from source host plants to test plants. These experiments revealed significant differences in host colonization ability between mtDNA lineages, ranging from highly polyphagous to more host-specific. The present study reveals that WCM is composed of several discrete genetic lineages with divergent host-acceptance and specificity traits. Genetic variation for host acceptance within A. tosichella s.l. may act as a reproductive barrier between these lineages, most of which had narrow host ranges. Two lineages appear to have high pest potential on cereals, whereas several others appear to specialize on wild grass species. We conclude that WCM is not a homogeneous species comprising polyphagous panmictic populations; rather, it is a complex of genetically distinct lineages with variable host ranges and therefore variable pest potential. © 2013 The Linnean Society of London, Biological Journal of the Linnean Society, 2013, 109, 165–180.

19.
Correa, J.A., Faugeron, S., Martínez, E., Nimptsch, J. & Paredes, A. Journal of Phycology, 2000, 36(S3): 15-16
The understanding of infectious diseases of algae has improved significantly in recent years, particularly in the area of recognition and signaling, key processes that determine the success or failure of host invasion by the pathogen. Ecological studies have also contributed to a better understanding of the role of diseases in wild stands of the affected hosts. An aspect that has received only limited attention is the effect of the infections on host fitness, and in this context, we report a first attempt to quantify the effects of Pleurocapsa sp. (Cyanophyta) on the reproductive potential of its host Mazzaella laminarioides (Rhodophyta). Infections by Pleurocapsa trigger the development of tumors that can result in major changes in frond morphology and texture. Two populations of the host were considered in the study. Our results indicate that infections do not cause a significant effect on the density or quality of the reproductive structures (i.e. cystocarps and tetrasporangia). However, the number of spores, settlement rates, germination success and offspring survival were all affected negatively by the endophytic infections. The reported information and field-collected data strengthen the notion that pathogens of algae may exert strong effects on their hosts at several levels, including reproduction. These effects can vary from host death during infections by highly pathogenic organisms to more subtle effects like those observed in the studied pathosystem. Infections by less aggressive pathogens, however, still may determine important effects at the population level by inducing differential mortality and reproductive success in infected individuals.

20.
During an infection, HIV experiences strong selection by immune system T cells. Recent experimental work has shown that MHC escape mutations form an important pathway for HIV to avoid such selection. In this paper, we study a model of MHC escape mutation. The model is a predator–prey model with two prey, composed of two HIV variants, and one predator, the immune system CD8 cells. We assume that one HIV variant is visible to CD8 cells and one is not. The model takes the form of a system of stochastic differential equations. Motivated by well-known results concerning the short life-cycle of HIV intrahost, we assume that HIV population dynamics occur on a faster time scale than CD8 population dynamics. This separation of time scales allows us to analyze our model using an asymptotic approach. Using this model we study the impact of an MHC escape mutation on the population dynamics and genetic evolution of the intrahost HIV population. From the perspective of population dynamics, we show that the competition between the visible and invisible HIV variants can reach steady states in which either a single variant exists or in which coexistence occurs, depending on the parameter regime. We show that in some parameter regimes the end state of the system is stochastic. From a genetics perspective, we study the impact of the population dynamics on the lineages of an HIV sample taken after an escape mutation occurs. We show that the lineages go through severe bottlenecks and that in certain parameter regimes the lineage distribution can be characterized by a Kingman coalescent. Our results depend on methods from diffusion theory and coalescent theory.
