Similar articles
Found 20 similar articles (search time: 515 ms)
1.
The Kingman coalescent, which has become the foundation for a wide range of theoretical as well as empirical studies, was derived as an approximation of the Wright-Fisher (WF) model. The approximation relies heavily on the assumption that the population size is large and the sample size is much smaller than the population size. Whether the sample size is too large compared to the population size is rarely questioned in practice when applying statistical methods based on the Kingman coalescent. Since the WF model is the most widely used population genetics model for reproduction, it is desirable to develop a coalescent framework for the WF model, which can be used whenever there are concerns about the accuracy of the Kingman coalescent as an approximation. This paper describes the exact coalescent theory for the WF model and develops a simulation algorithm, which is then used, together with an analytical approach, to study the properties of the exact coalescent as well as its differences from the Kingman coalescent. We show that the Kingman coalescent differs from the exact coalescent by: (1) shorter waiting time between successive coalescent events; (2) different probability of observing a topological relationship among sequences in a sample; and (3) slightly smaller tree length in the genealogy of a large sample. On the other hand, there is little difference in the age of the most recent common ancestor (MRCA) of the sample. The exact coalescent makes up for the longer waiting time between successive coalescent events by having multiple coalescences at the same time. The most significant difference among the various summary statistics of a coalescent examined is in the sum of lengths of external branches, which can be more than 10% larger for the exact coalescent than for the Kingman coalescent. As a whole, the Kingman coalescent is a remarkably accurate approximation to the exact coalescent for sample and population sizes falling considerably outside the region that was originally anticipated.
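To make the idea of an exact WF genealogy concrete, the following is a minimal sketch, not the simulation algorithm of the paper. It assumes a haploid population of constant size N and traces a sample of n lineages backwards one generation at a time; the function name and its return values are hypothetical choices made for illustration.

```python
import random


def wf_exact_genealogy(n, N, rng):
    """Trace n sampled lineages back through a haploid Wright-Fisher
    population of constant size N (an assumption of this sketch).

    Returns (generations back to the MRCA, total tree length in generations).
    """
    k = n               # number of distinct ancestral lineages
    generations = 0
    tree_length = 0     # sum of all branch lengths, in generations
    while k > 1:
        tree_length += k        # each of the k lineages extends by one generation
        generations += 1
        # Every lineage picks its parent uniformly at random among N individuals.
        # Lineages sharing a parent coalesce, so three or more lineages can merge
        # at once and several groups can merge in the same generation.
        parents = {rng.randrange(N) for _ in range(k)}
        k = len(parents)
    return generations, tree_length


if __name__ == "__main__":
    rng = random.Random(42)
    runs = [wf_exact_genealogy(50, 100, rng) for _ in range(2000)]
    mean_tmrca = sum(g for g, _ in runs) / len(runs)
    print("mean TMRCA (generations):", mean_tmrca)
```

Averages from such runs can be set against the Kingman expectation for the same n and N, which is the kind of comparison the abstract describes.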

2.
Phylogeography: retrospect and prospect   Cited by: 6 (self-citations: 1; citations by others: 5)
Phylogeography has grown explosively in the two decades since the word was coined and the discipline was outlined in 1987. Here I summarize the many achievements and novel perspectives that phylogeography has brought to population genetics, phylogenetic biology and biogeography. I also address future directions for the field. From the introduction of mitochondrial DNA assays in the late 1970s, to the key distinction between gene trees and species phylogenies, to the ongoing era of multi-locus coalescent theory, phylogeographic perspectives have consistently challenged conventional genetic and evolutionary paradigms, and they have forged empirical and conceptual bridges between the formerly separate disciplines of population genetics (microevolutionary analysis) and phylogenetic biology (in macroevolution).

3.

Background

Coalescent simulations have proven very useful in many population genetics studies. In order to arrive at meaningful conclusions, it is important that these simulations resemble the process of molecular evolution as much as possible. To date, no single coalescent program is able to simulate codon sequences sampled from populations with recombination, migration and growth.

Results

We introduce a new coalescent program, called Recodon, which is able to simulate samples of coding DNA sequences under complex scenarios in which several evolutionary forces can interact simultaneously (namely, recombination, migration and demography). The basic codon model implemented is an extension to the general time-reversible model of nucleotide substitution with a proportion of invariable sites and among-site rate variation. In addition, the program implements non-reversible processes and mixtures of different codon models.

Conclusion

Recodon is a flexible tool for the simulation of coding DNA sequences under realistic evolutionary models. These simulations can be used to build parameter distributions for testing evolutionary hypotheses using experimental data. Recodon is written in C, can run in parallel, and is freely available from http://darwin.uvigo.es/.

4.
Bayesian phylogenetics with BEAUti and the BEAST 1.7   Cited by: 7 (self-citations: 0; citations by others: 7)
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.

5.
Principal components analysis (PCA) is a statistical method commonly used in population genetics to identify structure in the distribution of genetic variation across geographical location and ethnic background. However, while the method is often used to inform about historical demographic processes, little is known about the relationship between fundamental demographic parameters and the projection of samples onto the primary axes. Here I show that for SNP data the projection of samples onto the principal components can be obtained directly from considering the average coalescent times between pairs of haploid genomes. The result provides a framework for interpreting PCA projections in terms of underlying processes, including migration, geographical isolation, and admixture. I also demonstrate a link between PCA and Wright's FST and show that SNP ascertainment has a largely simple and predictable effect on the projection of samples. Using examples from human genetics, I discuss the application of these results to empirical data and the implications for inference.

6.
Meligkotsidou L, Fearnhead P. Genetics, 2005, 171(4): 2073-2084
We develop a method for maximum-likelihood estimation of coalescence times in genealogical trees, based on population genetics data. For this purpose, a Viterbi-type algorithm is constructed to maximize the joint likelihood of the coalescence times. Marginal confidence intervals for the coalescence times based on the profile likelihoods are also computed. Our method of finding MLEs and calculating confidence intervals appears to be more accurate than alternative numerical maximization methods, and maximum-likelihood inference appears to be more accurate than other existing model-free approaches to estimating coalescent times. We demonstrate the method on two different data sets: human Y chromosome DNA data and fungus DNA data.

7.
8.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than what was previously believed. We investigate the application of our method in estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of the program Genetree in computing the likelihood for subdivided populations.

9.
A special stochastic process, called the coalescent, is of fundamental interest in population genetics. For a large class of population models this process is the appropriate tool to analyse the ancestral structure of a sample of n individuals or genes, if the total number of individuals in the population is sufficiently large. A corresponding convergence theorem was first proved by Kingman in 1982 for the Wright-Fisher model and the Moran model. Generalizations to a large class of exchangeable population models and to models with overlying mutation processes followed shortly afterwards. One speaks of the "robustness of the coalescent", as this process appears in many models as the total population size tends to infinity. This publication can be considered as an introduction to the theory of the coalescent as well as a review of the most important "convergence-to-the-coalescent" theorems. Convergence theorems are not only presented for the classical exchangeable haploid case but also for larger classes of population models, for example for diploid, two-sex or non-exchangeable models. A review-like summary of further examples and applications of convergence to the coalescent is given, including the most important biological forces like mutation, recombination and selection. The general coalescent process allows for simultaneous multiple mergers of ancestral lines.
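As an illustration of the limiting process, here is a minimal sketch of the standard Kingman coalescent for a sample of n lineages. It assumes time is measured in coalescent units (N generations for a haploid Wright-Fisher population), in which the waiting time while k lineages remain is exponential with rate k(k-1)/2. The function names are hypothetical and do not come from the reviewed publication.

```python
import random


def kingman_waiting_times(n, rng):
    """Draw the waiting times T_n, T_(n-1), ..., T_2 of the Kingman coalescent.

    While k lineages remain, each of the k*(k-1)/2 pairs merges at rate 1,
    so T_k is exponential with rate k*(k-1)/2 (coalescent time units).
    """
    return [rng.expovariate(k * (k - 1) / 2) for k in range(n, 1, -1)]


def expected_tmrca(n):
    """E[T_MRCA] = sum over k of 2/(k(k-1)) = 2(1 - 1/n)."""
    return 2 * (1 - 1 / n)


def expected_tree_length(n):
    """E[L] = sum over k of k * 2/(k(k-1)) = 2 * (1 + 1/2 + ... + 1/(n-1))."""
    return 2 * sum(1 / i for i in range(1, n))


if __name__ == "__main__":
    rng = random.Random(1)
    sims = [sum(kingman_waiting_times(20, rng)) for _ in range(5000)]
    print("simulated mean TMRCA:", sum(sims) / len(sims))
    print("expected TMRCA:      ", expected_tmrca(20))
```

Only binary mergers occur in this process; the multiple-merger generalizations mentioned at the end of the abstract replace the pairwise rate with rates for merging several lineages at once.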

10.
Population genetics theory has laid the foundations for genomic analyses including the recent burst in genome scans for selection and statistical inference of past demographic events in many prokaryote, animal and plant species. Identifying SNPs under natural selection and underpinning species adaptation relies on disentangling the respective contribution of random processes (mutation, drift, migration) from that of selection on nucleotide variability. Most theory and statistical tests have been developed using the Kingman coalescent theory based on the Wright‐Fisher population model. However, these theoretical models rely on biological and life history assumptions which may be violated in many prokaryote, fungal, animal or plant species. Recent theoretical developments of the so‐called multiple merger coalescent models are reviewed here (Λ‐coalescent, beta‐coalescent, Bolthausen‐Sznitman, Ξ‐coalescent). We explain how these new models take into account various pervasive ecological and biological characteristics, life history traits or life cycles which were not accounted for in previous theories, such as (i) the skew in offspring production typical of marine species, (ii) fast adapting microparasites (virus, bacteria and fungi) exhibiting large variation in population sizes during epidemics, (iii) the peculiar life cycles of fungi and bacteria alternating sexual and asexual cycles and (iv) the high rates of extinction‐recolonization in spatially structured populations. We finally discuss the relevance of multiple merger models for the detection of SNPs under selection in these species and for population genomics of very large sample sizes, and advocate potentially re-examining the conclusions of previous population genetics studies.
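For readers unfamiliar with multiple merger models, the sketch below computes merger rates for the beta-coalescent. It assumes the parametrization commonly used in the literature, in which the Λ measure is a Beta(2 - α, α) distribution with 0 < α < 2; the abstract itself does not state a parametrization, and the function names are hypothetical. Under that assumption, the rate at which a specific group of k out of b lineages merges is B(k - α, b - k + α) / B(2 - α, α), where B is the beta function.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b), for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(b, k, alpha):
    """Rate at which one specific group of k out of b lineages merges,
    assuming Lambda = Beta(2 - alpha, alpha) with 0 < alpha < 2."""
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    if not 0 < alpha < 2:
        raise ValueError("need 0 < alpha < 2")
    return exp(log_beta(k - alpha, b - k + alpha) - log_beta(2 - alpha, alpha))


def total_merger_rate(b, alpha):
    """Total rate of any merger among b lineages: sum over k of C(b, k) * rate."""
    return sum(comb(b, k) * beta_coalescent_rate(b, k, alpha)
               for k in range(2, b + 1))


if __name__ == "__main__":
    # alpha = 1 corresponds to the Bolthausen-Sznitman coalescent; as alpha
    # approaches 2, mergers of more than two lineages become increasingly rare.
    for alpha in (1.0, 1.5, 1.9):
        print(alpha, [round(beta_coalescent_rate(5, k, alpha), 4)
                      for k in range(2, 6)])
```

Working with log-gamma values rather than the beta function directly keeps the computation stable when b is large.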

11.
We use forward and coalescent models of population genetics to study chromosome fusions that reduce the recombination between two locally adapted loci. Under a continent–island model, a fusion spreads and reaches a polymorphic equilibrium when it causes recombination between locally adapted alleles to be less than their selective advantage. In contrast, fusions in a two‐deme model always spread; whether a fusion reaches a polymorphic equilibrium or becomes fixed depends on the relative recombination rates of fused homozygotes and heterozygotes. Neutral divergence around fusion polymorphisms is markedly increased, showing peaks at the point of fusion and at the locally adapted loci. Local adaptation could explain the evolution of many chromosome fusions, which are some of the most common chromosome rearrangements in nature.

12.
Given a species tree and a gene tree, a valid coalescent history is a list of the branches of the species tree on which coalescences in the gene tree take place. I develop a recursion for the number of valid coalescent histories that exist for an arbitrary gene tree/species tree pair, when one gene lineage is studied per species. The result is obtained by defining a concept of m-extended coalescent histories, enumerating and counting these histories, and taking the special case of m = 1. As a sum over valid coalescent histories appears in a formula for the probability that a random gene tree evolving along the branches of a fixed species tree has a specified labeled topology, the enumeration of valid coalescent histories can considerably reduce the effort required for evaluating this formula.

13.
Some of the effects of past climate dynamics on plant and animal diversity make‐up have been relatively well studied, but to a lesser extent in fungi. Pleistocene refugia are thought to harbour high biological diversity (i.e. phylogenetic lineages and genetic diversity), mainly as a product of increased reproductive isolation and allele conservation. In addition, high extinction rates and genetic erosion are expected in previously glaciated regions. Some of the consequences of past climate dynamics might involve changes in range and population size that can result in divergence and incipient or cryptic speciation. Many of these dynamic processes and patterns can be inferred through phylogenetic and coalescent methods. In this study, we first delimit species within a group of closely related edible ectomycorrhizal Amanita from North America (the American Caesar's mushrooms species complex) using multilocus coalescent‐based approaches; and then address questions related to effects of Pleistocene climate change on the diversity and genetics of the group. Our study includes extensive geographical sampling throughout the distribution range, and DNA sequences from three nuclear protein‐coding genes. Results reveal cryptic diversity and high speciation rates in refugia. Population sizes and expansions seem to be larger at midrange latitudes (Mexican highlands and SE USA). Range shifts are proportional to population size expansions, which were overall more common during the Pleistocene. This study documents responses to past climate change in fungi and also highlights the applicability of the multispecies coalescent in comparative phylogeographical analyses and diversity assessments that include ancestral species.

14.
We analyse sequential Markov coalescent algorithms for populations with demographic structure: for a bottleneck model, a population-divergence model, and for a two-island model with migration. The sequential Markov coalescent method is an approximation to the coalescent suggested by McVean and Cardin, and by Marjoram and Wall. Within this algorithm we compute, for two individuals randomly sampled from the population, the correlation between times to the most recent common ancestor and the linkage probability corresponding to two different loci with recombination rate R between them. These quantities characterise the linkage between the two loci in question. We find that the sequential Markov coalescent method approximates the coalescent well in general in models with demographic structure. An exception is the case where individuals are sampled from populations separated by reduced gene flow. In this situation, the correlations may be significantly underestimated. We explain why this is the case.

15.
This work studies the coalescent (ancestral pedigree, genealogy) of the entire population. The coalescent structure (topology) is robust, but selection changes the rate of coalescence (the time between branching events). The change in the rate of coalescence is not uniform; rather, the reduction in the time between branching events is greatest when the coalescent is small (immediately after the common ancestor is the only member of the coalescent), with little change when the coalescent is large (immediately preceding when that common ancestor becomes fixed and the size of the coalescent is N). This implies that the reduction in the coalescent time due to selection is much greater than the reduction in the cumulative size of the coalescent (total number of ancestors of the present population after and including the most recent common ancestor) due to selection. If Ns≫1, the coalescent and fixation times are approximately equal to , which is much less than the value N which would result from neutral drift (N rather than the canonical haploid neutral fixation time 2N is the appropriate comparison for the model considered here); in particular, it is 70% less for Ns=10 and 95% less for Ns=100. However, for those values of Ns, and N ranging between 10^3 and 10^6, the reduction in the cumulative size of the coalescent of the entire population compared to the neutral case ranges from 17% to 65% (depending on the values of N and s). The coalescent time for two individuals for Ns of 10 and 100 is reduced by approximately 70% and 94%, respectively, compared with the neutral case. Because heterozygosity is proportional to the coalescent time for two individuals and the number of segregating alleles is proportional to the cumulative size of the coalescent, selection reduces heterozygosity more than it reduces the number of segregating alleles.

16.

Background

Coalescent simulation is pivotal for understanding population evolutionary models and demographic histories, as well as for developing novel analytical methods for genetic association studies for DNA sequence data. A plethora of coalescent simulators have been developed, but selecting the most appropriate program remains challenging.

Results

We extensively compared the performance of five widely used coalescent simulators – Hudson’s ms, msHOT, MaCS, Simcoal2, and fastsimcoal – to provide a practical guide considering three crucial factors: 1) speed, 2) scalability and 3) recombination hotspot position and intensity accuracy. Although ms represents a popular standard coalescent simulator, it lacks the ability to simulate sequences with recombination hotspots. An extended program, msHOT, has compensated for the deficiency of ms by incorporating recombination hotspots and gene conversion events at arbitrarily chosen locations and intensities, but remains limited in simulating long stretches of DNA sequences. Simcoal2, based on a discrete generation-by-generation approach, can simulate more complex demographic scenarios, but runs comparatively slowly. MaCS and fastsimcoal, both built on fast, modified sequential Markov coalescent algorithms to approximate the standard coalescent, are much more efficient whilst keeping salient features of msHOT and Simcoal2, respectively. Our simulations demonstrate that they are more advantageous than the other programs for a spectrum of evolutionary models. To validate recombination hotspots, the LDhat 2.2 rhomap package, sequenceLDhot and Haploview were compared for hotspot detection, and sequenceLDhot exhibited the best performance based on both real and simulated data.

Conclusions

While ms remains an excellent choice for general coalescent simulations of DNA sequences, MaCS and fastsimcoal are much more scalable and flexible in simulating a variety of demographic events under different recombination hotspot models. Furthermore, sequenceLDhot appears to give the best performance in detecting and validating cross-over hotspots.

17.
Pervasive natural selection can strongly influence observed patterns of genetic variation, but these effects remain poorly understood when multiple selected variants segregate in nearby regions of the genome. Classical population genetics fails to account for interference between linked mutations, which grows increasingly severe as the density of selected polymorphisms increases. Here, we describe a simple limit that emerges when interference is common, in which the fitness effects of individual mutations play a relatively minor role. Instead, similar to models of quantitative genetics, molecular evolution is determined by the variance in fitness within the population, defined over an effectively asexual segment of the genome (a “linkage block”). We exploit this insensitivity in a new “coarse-grained” coalescent framework, which approximates the effects of many weakly selected mutations with a smaller number of strongly selected mutations that create the same variance in fitness. This approximation generates accurate and efficient predictions for silent site variability when interference is common. However, these results suggest that there is reduced power to resolve individual selection pressures when interference is sufficiently widespread, since a broad range of parameters possess nearly identical patterns of silent site variability.

18.
Davies JL, Simancík F, Lyngsø R, Mailund T, Hein J. Genetics, 2007, 177(4): 2151-2160
Coalescent theory deals with the dynamics of how sampled genetic material has spread through a population from a single ancestor over many generations and is ubiquitous in contemporary molecular population genetics. Inherent in most applications is a continuous-time approximation that is derived under the assumption that sample size is small relative to the actual population size. In effect, this precludes multiple and simultaneous coalescent events that take place in the history of large samples. If sequences do not recombine, the number of sequences ancestral to a large sample is reduced sufficiently after relatively few generations such that use of the continuous-time approximation is justified. However, in tracing the history of large chromosomal segments, a large recombination rate per generation will consistently maintain a large number of ancestors. This can create a major disparity between discrete-time and continuous-time models and we analyze its importance, illustrated with model parameters typical of the human genome. The presence of gene conversion exacerbates the disparity and could seriously undermine applications of coalescent theory to complete genomes. However, we show that multiple and simultaneous coalescent events influence global quantities, such as total number of ancestors, but have negligible effect on local quantities, such as linkage disequilibrium. Reassuringly, most applications of the coalescent model with recombination (including association mapping) focus on local quantities.

19.
Paul JS, Steinrücken M, Song YS. Genetics, 2011, 187(4): 1115-1128
The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.

20.
The Kingman coalescent and its developments are often considered among the most important advances in population genetics of the last decades. Demographic inference based on coalescent theory has been used to reconstruct the population dynamics and evolutionary history of several species, including Mycobacterium tuberculosis (MTB), an important human pathogen causing tuberculosis. One key assumption of the Kingman coalescent is that the number of descendants of different individuals does not vary strongly, and violating this assumption could lead to severe biases caused by model misspecification. Individual lineages of MTB are expected to vary strongly in reproductive success because 1) MTB is potentially under constant selection due to the pressure of the host immune system and of antibiotic treatment, 2) MTB undergoes repeated population bottlenecks when it transmits from one host to the next, and 3) some hosts show much higher transmission rates compared with the average (superspreaders). Here, we used an approximate Bayesian computation approach to test whether multiple-merger coalescents (MMC), a class of models that allow for large variation in reproductive success among lineages, are more appropriate models to study MTB populations. We considered 11 publicly available whole-genome sequence data sets sampled from local MTB populations and outbreaks and found that MMC had a better fit compared with the Kingman coalescent for 10 of the 11 data sets. These results indicate that the null model for analyzing MTB outbreaks should be reassessed and that past findings based on the Kingman coalescent need to be revisited.
