Similar literature
 20 similar articles found
1.
To model deviations from selectively neutral genetic variation caused by different forms of selection, it is necessary to first understand patterns of neutral variation. Best understood is neutral genetic variation at a single locus. But, as is well known, additional insights can be gained by investigating multiple loci. The resulting patterns reflect the degree of association (linkage) between loci and provide information about the underlying multilocus gene genealogies. The statistical properties of two-locus gene genealogies have been intensively studied for populations of constant size, as well as for simple demographic histories such as exponential population growth and single bottlenecks. By contrast, the combined effect of recombination and sustained demographic fluctuations is poorly understood. Addressing this issue, we study a two-locus Wright-Fisher model of a population subject to recurrent bottlenecks. We derive coalescent approximations for the covariance of the times to the most recent common ancestor at two loci in samples of two chromosomes. This covariance reflects the degree of association and thus linkage disequilibrium between these loci. We find, first, that an effective population-size approximation describes the numerically observed association between two loci provided that recombination occurs either much faster or much more slowly than the population-size fluctuations. Second, when recombination occurs frequently between but rarely within bottlenecks, we observe that the association of gene histories becomes independent of physical distance over a certain range of distances. Third, we show that in this case, a commonly used measure of linkage disequilibrium, σ_d^2 (closely related to r^2), fails to capture the long-range association between two loci. The reason is that constituent terms, each reflecting the long-range association, cancel. Fourth, we analyze a limiting case in which the long-range association can be described in terms of a Xi coalescent allowing for simultaneous multiple mergers of ancestral lines.
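
For orientation, the constant-size baseline mentioned in this abstract has a classical closed form. The sketch below assumes a sample of two chromosomes, time measured in units of 2N generations, and a scaled recombination rate rho = 4Nr; under those assumptions the covariance of the two times to the most recent common ancestor is (rho + 18) / (rho^2 + 13 rho + 18). The function name is illustrative, and the recurrent-bottleneck approximations derived in the paper are not reproduced here.

```python
def tmrca_covariance_constant_size(rho: float) -> float:
    """Covariance of the TMRCAs at two loci for a sample of two chromosomes.

    Assumptions (not taken from the abstract): constant population size,
    time in units of 2N generations, rho = 4Nr.
    """
    if rho < 0:
        raise ValueError("rho must be non-negative")
    return (rho + 18.0) / (rho * rho + 13.0 * rho + 18.0)


if __name__ == "__main__":
    # The association decays from 1 (complete linkage) toward 0 as rho grows.
    for rho in (0.0, 1.0, 10.0, 100.0):
        print(rho, tmrca_covariance_constant_size(rho))
```

With rho = 0 the two loci share one genealogy and the covariance equals the variance of a single coalescence time, which is 1 in these units.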

2.
Choi SC, Hey J. Genetics, 2011, 189(2): 561-577
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.
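
One common way to define a partition distance between two assignments is the minimum number of individuals that must be removed so that the two assignments induce the same grouping. The sketch below assumes that definition (the abstract does not spell out which one the authors use) and finds the best matching of populations by brute force, which is only practical for a handful of populations. The function names are illustrative.

```python
from itertools import permutations


def _clusters(labels):
    """Group individual indices by their population label."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return list(groups.values())


def partition_distance(labels_a, labels_b):
    """Minimum number of individuals to remove so both partitions agree.

    Brute force over population matchings: factorial cost, so this is
    meant for a small number of populations only.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both assignments must cover the same individuals")
    groups_a = _clusters(labels_a)
    groups_b = _clusters(labels_b)
    k = max(len(groups_a), len(groups_b))
    # Pad with empty populations so every population can be matched.
    groups_a += [set()] * (k - len(groups_a))
    groups_b += [set()] * (k - len(groups_b))
    best_overlap = max(
        sum(len(group & groups_b[j]) for group, j in zip(groups_a, perm))
        for perm in permutations(range(k))
    )
    return len(labels_a) - best_overlap


if __name__ == "__main__":
    print(partition_distance([0, 0, 1, 1], [1, 1, 0, 0]))  # relabelled only
    print(partition_distance([0, 0, 1, 1], [0, 0, 0, 1]))  # one individual differs
```

The distance ignores population labels, so two assignments that differ only by relabelling are at distance zero.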

3.
Detecting population expansion and decline using microsatellites (cited 15 times: 0 self-citations, 15 by others)
Beaumont MA. Genetics, 1999, 153(4): 2013-2029
This article considers a demographic model where a population varies in size either linearly or exponentially. The genealogical history of microsatellite data sampled from this population can be described using coalescent theory. A method is presented whereby the posterior probability distribution of the genealogical and demographic parameters can be estimated using Markov chain Monte Carlo simulations. The likelihood surface for the demographic parameters is complicated and its general features are described. The method is then applied to published microsatellite data from two populations. Data from the northern hairy-nosed wombat show strong evidence of decline. Data from European humans show weak evidence of expansion.
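
To illustrate how a changing population size enters the genealogy, the sketch below draws the coalescence time of a pair of gene copies under exponential size change by rescaling time. It assumes a haploid size N(t) = n0 * exp(-growth_rate * t) at t generations in the past and a pairwise coalescence rate of 1 / N(t); these conventions and all names are assumptions for illustration, and the microsatellite mutation model and MCMC scheme of the paper are not covered.

```python
import math
import random


def pairwise_coalescence_time(unit_exp_draw, n0, growth_rate):
    """Map an Exp(1) draw to a coalescence time (in generations).

    Solves Lambda(T) = unit_exp_draw, where
    Lambda(t) = (exp(growth_rate * t) - 1) / (growth_rate * n0)
    is the integrated coalescence rate for N(t) = n0 * exp(-growth_rate * t).
    """
    if growth_rate == 0:
        return n0 * unit_exp_draw  # constant size: T = n0 * E
    arg = growth_rate * n0 * unit_exp_draw
    if arg <= -1.0:
        # Declining population (larger in the past): the total coalescence
        # intensity is finite, so the pair may never coalesce.
        return math.inf
    return math.log1p(arg) / growth_rate


if __name__ == "__main__":
    rng = random.Random(1)
    draws = [rng.expovariate(1.0) for _ in range(5)]
    print([pairwise_coalescence_time(e, 1000, 0.01) for e in draws])
```

A positive growth rate (expansion forward in time) shortens coalescence times relative to a constant population of size n0, which is the kind of signal the demographic parameters are estimated from.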

4.
Inference of population structure under a Dirichlet process model (cited 1 time: 0 self-citations, 1 by others)
Huelsenbeck JP, Andolfatto P. Genetics, 2007, 175(4): 1787-1802
Inferring population structure from genetic data sampled from some number of individuals is a formidable statistical problem. One widely used approach considers the number of populations to be fixed and calculates the posterior probability of assigning individuals to each population. More recently, the assignment of individuals to populations and the number of populations have both been considered random variables that follow a Dirichlet process prior. We examined the statistical behavior of assignment of individuals to populations under a Dirichlet process prior. First, we examined a best-case scenario, in which all of the assumptions of the Dirichlet process prior were satisfied, by generating data under a Dirichlet process prior. Second, we examined the performance of the method when the genetic data were generated under a population genetics model with symmetric migration between populations. We examined the accuracy of population assignment using a distance on partitions. The method can be quite accurate with a moderate number of loci. As expected, inferences on the number of populations are more accurate when θ = 4N_e u is large and when the migration rate (4N_e m) is low. We also examined the sensitivity of inferences of population structure to choice of the parameter of the Dirichlet process model. Although inferences could be sensitive to the choice of the prior on the number of populations, this sensitivity occurred when the number of loci sampled was small; inferences are more robust to the prior on the number of populations when the number of sampled loci is large. Finally, we discuss several methods for summarizing the results of a Bayesian Markov chain Monte Carlo (MCMC) analysis of population structure. We develop the notion of the mean population partition, which is the partition of individuals to populations that minimizes the squared partition distance to the partitions sampled by the MCMC algorithm.
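
The sensitivity to the Dirichlet process parameter can be made concrete with the prior expected number of populations. The sketch below assumes the standard Chinese restaurant process form of the prior, in which the individual added after i others founds a new population with probability alpha / (alpha + i); the function name and the symbol alpha are assumptions, since the abstract does not name the parameter.

```python
def expected_number_of_populations(n_individuals, alpha):
    """Prior mean number of populations under a Dirichlet process.

    Sums the probability alpha / (alpha + i) that the individual added
    after i others starts a new population, for i = 0 .. n - 1.
    """
    if n_individuals < 0 or alpha <= 0:
        raise ValueError("need n_individuals >= 0 and alpha > 0")
    return sum(alpha / (alpha + i) for i in range(n_individuals))


if __name__ == "__main__":
    # Larger alpha puts more prior weight on many populations.
    for alpha in (0.1, 1.0, 10.0):
        print(alpha, expected_number_of_populations(50, alpha))
```

Comparing the output across values of alpha shows how strongly the prior alone can favor few or many populations before any loci are observed.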

5.
The coalescent with recombination process was initially formulated backwards in time, but simulation algorithms and inference procedures often apply along sequences. Therefore it is of major interest to approximate the coalescent with recombination process by a Markov chain along sequences. We consider the finite loci case and two or more sequences. We formulate a natural Markovian approximation for the tree building process along the sequences, and derive simple and analytically tractable formulae for the distribution of the tree at the next locus conditioned on the tree at the present locus. We compare our Markov approximation to other sequential Markov chains and discuss various applications.

6.
Coalescent theory is routinely used to estimate past population dynamics and demographic parameters from genealogies. While early work in coalescent theory only considered simple demographic models, advances in theory have allowed for increasingly complex demographic scenarios to be considered. The success of this approach has led to coalescent-based inference methods being applied to populations with rapidly changing population dynamics, including pathogens like RNA viruses. However, fitting epidemiological models to genealogies via coalescent models remains a challenging task, because pathogen populations often exhibit complex, nonlinear dynamics and are structured by multiple factors. Moreover, it often becomes necessary to consider stochastic variation in population dynamics when fitting such complex models to real data. Using recently developed structured coalescent models that accommodate complex population dynamics and population structure, we develop a statistical framework for fitting stochastic epidemiological models to genealogies. By combining particle filtering methods with Bayesian Markov chain Monte Carlo methods, we are able to fit a wide class of stochastic, nonlinear epidemiological models with different forms of population structure to genealogies. We demonstrate our framework using two structured epidemiological models: a model with disease progression between multiple stages of infection and a two-population model reflecting spatial structure. We apply the multi-stage model to HIV genealogies and show that the proposed method can be used to estimate the stage-specific transmission rates and prevalence of HIV. Finally, using the two-population model we explore how much information about population structure is contained in genealogies and what sample sizes are necessary to reliably infer parameters like migration rates.

7.
Conventional coalescent inferences of population history make the critical assumption that the population under examination is panmictic. However, most populations are structured. This complicates the prevailing coalescent analyses and sometimes leads to inaccurate estimates. To develop a coalescent method unhampered by population structure, we perform two analyses. First, we demonstrate that the coalescent probability of two randomly sampled alleles from the immediately preceding generation (one generation back) is independent of population structure. Second, motivated by this finding, we propose a new coalescent method: i-coalescent analysis. The i-coalescent analysis computes the instantaneous coalescent rate by using a phylogenetic tree of sampled alleles. Using simulated data, we broadly demonstrate the capability of i-coalescent analysis to accurately reconstruct population size dynamics of highly structured populations, although we find this method often requires larger sample sizes for structured populations than for panmictic populations. Overall, our results indicate i-coalescent analysis to be a useful tool, especially for the inference of population histories with intractable structure such as the developmental history of cell populations in the organs of complex organisms.

8.
Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only a few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.

9.
The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation, effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove).

10.
We present a novel and straightforward method for estimating recent migration rates between discrete populations using multilocus genotype data. The approach builds upon a two-step sampling design, where individual genotypes are sampled before and after dispersal. We develop a model that estimates all pairwise backwards migration rates (m_ij, the probability that an individual sampled in population i is a migrant from population j) between a set of populations. The method is validated with simulated data and compared with the methods of BayesAss and Structure. First, we use data for an island model and then we consider more realistic data simulations for a metapopulation of the greater white-toothed shrew (Crocidura russula). We show that the precision and bias of estimates primarily depend upon the proportion of individuals sampled in each population. Weak sampling designs may particularly affect the quality of the coverage provided by 95% highest posterior density intervals. We further show that it is relatively insensitive to the number of loci sampled and the overall strength of genetic structure. The method can easily be extended and makes fewer assumptions about the underlying demographic and genetic processes than currently available methods. It allows backwards migration rates to be estimated across a wide range of realistic conditions.

11.
Rapid range expansions can cause pervasive changes in the genetic diversity and structure of populations. The postglacial history of the Balsam Poplar, Populus balsamifera, involved the colonization of most of northern North America, an area largely covered by continental ice sheets during the last glacial maximum. To characterize how this expansion shaped genomic diversity within and among populations, we developed 412 SNP markers that we assayed for a range‐wide sample of 474 individuals sampled from 34 populations. We complemented the SNP data set with DNA sequence data from 11 nuclear loci from 94 individuals, and used coalescent analyses to estimate historical population size, demographic growth, and patterns of migration. Bayesian clustering identified three geographically separated demes found in the Northern, Central, and Eastern portions of the species’ range. These demes varied significantly in nucleotide diversity, the abundance of private polymorphisms, and population substructure. Most measures supported the Central deme as descended from the primary refuge of diversity. Both SNPs and sequence data suggested recent population growth, and coalescent analyses of historical migration suggested a massive expansion from the Centre to the North and East. Collectively, these data demonstrate the strong influence that range expansions exert on genomic diversity, both within local populations and across the range. Our results suggest that an in‐depth knowledge of nucleotide diversity following expansion requires sampling within multiple populations, and highlight the utility of combining insights from different data types in population genomic studies.

12.
The variance of sample heterozygosity, averaged over several loci, is studied in a variety of situations. The variance depends on the sampling implicit in the mating system as well as on that explicit in the loci scored and individuals sampled. There are also effects of allelic distributions over loci and of linkage or linkage disequilibrium between pairs of loci. Results are obtained for populations in drift and mutation balance, for infinite populations undergoing mixed self and random mating, and for finite monoecious populations with or without selfing. For unlinked loci in drift/mutation balance, variances appear to be lessened more by increasing the number of loci scored than by increasing the number of individuals sampled. For infinite populations under the mixed self and random mating system, however, the reverse is true. Methods for estimating the variance of sample heterozygosity are discussed, with attention being paid to unbalanced data where not all loci are scored in all individuals.

13.
Paul JS, Steinrücken M, Song YS. Genetics, 2011, 187(4): 1115-1128
The sequentially Markov coalescent is a simplified genealogical process that aims to capture the essential features of the full coalescent model with recombination, while being scalable in the number of loci. In this article, the sequentially Markov framework is applied to the conditional sampling distribution (CSD), which is at the core of many statistical tools for population genetic analyses. Briefly, the CSD describes the probability that an additionally sampled DNA sequence is of a certain type, given that a collection of sequences has already been observed. A hidden Markov model (HMM) formulation of the sequentially Markov CSD is developed here, yielding an algorithm with time complexity linear in both the number of loci and the number of haplotypes. This work provides a highly accurate, practical approximation to a recently introduced CSD derived from the diffusion process associated with the coalescent with recombination. It is empirically demonstrated that the improvement in accuracy of the new CSD over previously proposed HMM-based CSDs increases substantially with the number of loci. The framework presented here can be adopted in a wide range of applications in population genetics, including imputing missing sequence data, estimating recombination rates, and inferring human colonization history.

14.
Tsumura Y, Kado T, Takahashi T, Tani N, Ujino-Ihara T, Iwata H. Genetics, 2007, 176(4): 2393-2403
We investigated 29 natural populations of Cryptomeria japonica using 148 cleaved amplified polymorphic sequence markers to elucidate their genetic structure and identify candidate adaptive genes of this species. In accordance with the inferred evolutionary history of the species during and after the last glacial episode, the genetic diversity was higher in western populations than in northern populations. The results of phylogenetic and genetic structure analyses suggest that populations of the two main varieties of the species have clearly diverged from each other and that two of the examined loci are strongly associated with the differentiation between the two varieties. Using a coalescent simulation based on F_ST and H_e values, we detected five genes that had higher, and two that had lower, values than the respective 99% confidence intervals (CIs) expected theoretically under a neutral infinite-island model. We also detected 13 outlier loci using a coalescent simulation based on the assumption that the two varieties originated from the splitting of an ancestral population. Four of these loci were detected by both methods, two of which were detected in a genetic structure analysis as loci associated with differentiation between the two varieties of the species, and are strong candidates for genes that have been subject to selection.
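
As a reminder of what the per-locus statistic in such outlier scans measures, the sketch below computes F_ST for one biallelic locus as (H_T - H_S) / H_T from per-population allele frequencies. It assumes equally weighted populations and no sample-size correction, which is a simplification; the coalescent simulations used in the paper to obtain neutral confidence intervals are not reproduced, and the function name is illustrative.

```python
def fst_biallelic(allele_freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    allele_freqs: frequency of one allele in each population
    (populations weighted equally; no sample-size correction).
    """
    if not allele_freqs:
        raise ValueError("need at least one population")
    mean_freq = sum(allele_freqs) / len(allele_freqs)
    h_total = 2.0 * mean_freq * (1.0 - mean_freq)   # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0  # locus is monomorphic overall: no differentiation to measure
    h_within = sum(2.0 * p * (1.0 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_biallelic([0.5, 0.5]))   # identical populations
    print(fst_biallelic([0.1, 0.9]))   # strongly differentiated
    print(fst_biallelic([0.0, 1.0]))   # fixed differences
```

A locus whose observed value falls outside the interval simulated under neutrality for its heterozygosity would be flagged as an outlier.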

15.
Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user‐friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte data sets.
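
A covariance that decreases with both kinds of distance can be sketched with a powered-exponential kernel. The functional form and every parameter name below (alpha0, alpha_d, alpha_e, alpha2) are assumptions chosen for illustration; the abstract only states that the covariance decreases with geographic and ecological distance and does not give the parameterization.

```python
import math


def spatial_covariance(geo_dist, eco_dist,
                       alpha0=1.0, alpha_d=1.0, alpha_e=1.0, alpha2=1.0):
    """Illustrative covariance between two populations.

    cov = (1 / alpha0) * exp(-(alpha_d * geo_dist + alpha_e * eco_dist) ** alpha2)
    The ratio alpha_e / alpha_d expresses how much one unit of ecological
    distance counts relative to one unit of geographic distance.
    """
    if geo_dist < 0 or eco_dist < 0:
        raise ValueError("distances must be non-negative")
    combined = alpha_d * geo_dist + alpha_e * eco_dist
    return math.exp(-(combined ** alpha2)) / alpha0


if __name__ == "__main__":
    print(spatial_covariance(0.0, 0.0))   # same place, same ecology
    print(spatial_covariance(1.0, 0.0))   # geographic separation only
    print(spatial_covariance(1.0, 1.0))   # geographic and ecological separation
```

Under this form, estimating the two distance coefficients jointly is what lets the relative contributions of geography and ecology be compared.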

16.
Andolfatto P, Przeworski M. Genetics, 2000, 156(1): 257-268
We analyze nucleotide polymorphism data for a large number of loci in areas of normal to high recombination in Drosophila melanogaster and D. simulans (24 and 16 loci, respectively). We find a genome-wide, systematic departure from the neutral expectation for a panmictic population at equilibrium in natural populations of both species. The distribution of sequence-based estimates of 2Nc across loci is inconsistent with the assumptions of the standard neutral theory, given the observed levels of nucleotide diversity and accepted values for recombination and mutation rates. Under these assumptions, most estimates of 2Nc are severalfold too low; in other words, both species exhibit greater intralocus linkage disequilibrium than expected. Variation in recombination or mutation rates is not sufficient to account for the excess of linkage disequilibrium. While an equilibrium island model does not seem to account for the data, more complicated forms of population structure may. A proper test of alternative demographic models will require loci to be sampled in a more consistent fashion.

17.
Loci considered to be under selection are generally avoided in attempts to infer past demographic processes as they do not fit neutral model assumptions. However, opportunities to better reconstruct some aspects of past demography might thus be missed. Here we examined genetic differentiation between two sympatric European oak species with contrasting ecological dynamics (Quercus robur and Quercus petraea) with both outlier (i.e. loci possibly affected by divergent selection between species or by hitchhiking effects with genomic regions under selection) and nonoutlier loci. We sampled 855 individuals in six mixed forests in France and genotyped them with a set of 262 SNPs enriched with markers showing high interspecific differentiation, resulting in accurate species delimitation. We identified between 13 and 74 interspecific outlier loci, depending on the coalescent simulation models and parameters used. Greater genetic diversity was predicted in Q. petraea (a late‐successional species) than in Q. robur (an early successional species) as introgression should theoretically occur predominantly from the resident species to the invading species. Remarkably, this prediction was verified with outlier loci but not with nonoutlier loci. We suggest that the lower effective interspecific gene flow at loci showing high interspecific divergence has better preserved the signal of past asymmetric introgression towards Q. petraea caused by the species' contrasting dynamics. Using markers under selection to reconstruct past demographic processes could therefore have broader potential than generally recognized.

18.
We have developed a set of eight polymorphic nuclear microsatellite markers for the Mediterranean shrub Pistacia lentiscus by means of an enriched library method. Characterization for the eight loci was carried out on 42 individuals from two populations sampled in southern Spain. The overall number of alleles detected was 59, ranging from three to 13 per locus. Expected heterozygosity per locus and population ranged from 0.139 to 0.895. Two loci, albeit only in one population (Seville), departed significantly from Hardy-Weinberg equilibrium expectations, and no linkage disequilibrium between pairs of loci was detected. These markers will be used in studies of gene flow across a fragmented landscape.
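
Expected heterozygosity at a locus is one minus the sum of squared allele frequencies. The sketch below computes it from allele counts; the function name and the example counts are hypothetical, and no small-sample correction is applied, so the values need not match those reported in the abstract.

```python
def expected_heterozygosity(allele_counts):
    """H_e = 1 - sum(p_i ** 2), with p_i estimated from allele counts."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("need at least one observed allele copy")
    return 1.0 - sum((count / total) ** 2 for count in allele_counts)


if __name__ == "__main__":
    # Hypothetical counts for a three-allele microsatellite locus.
    print(expected_heterozygosity([40, 30, 14]))
```

Loci with many alleles at similar frequencies give values near the top of the reported range, while loci dominated by a single allele give values near the bottom.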

19.
In this study we characterized 10 polymorphic microsatellite markers for the land snail Cylindrus obtusus, a species endemic to the Austrian Alps that occurs in isolated populations above approximately 1,600 m. The microsatellite loci were analyzed in 44 individuals from two populations. Number of alleles per locus ranged between two and eight. Observed heterozygosity ranged between 0.00 and 1.00, and expected heterozygosity between 0.09 and 0.72. No significant linkage disequilibrium was found between pairs of loci. One of the sampled populations (Dachstein) showed no deviation from Hardy–Weinberg equilibrium and no presence of null alleles, whereas the other one (Schneeberg) did. These diverging results probably reflect differences in population structure rather than characteristics of the microsatellite loci and underline the usefulness of these markers for studying genetic diversity, population structure and differentiation in C. obtusus.

20.
Coalescent likelihood is the probability of observing the given population sequences under the coalescent model. Computation of coalescent likelihood under the infinite sites model is a classic problem in coalescent theory. Existing methods are based on either importance sampling or Markov chain Monte Carlo and are inexact. In this paper, we develop a simple method that can compute the exact coalescent likelihood for many data sets of moderate size, including real biological data whose likelihood was previously thought to be difficult to compute exactly. Our method works for both panmictic and subdivided populations. Simulations demonstrate that the practical range of exact coalescent likelihood computation for panmictic populations is significantly larger than what was previously believed. We investigate the application of our method in estimating mutation rates by maximum likelihood. A main application of the exact method is comparing the accuracy of approximate methods. To demonstrate the usefulness of the exact method, we evaluate the accuracy of the program Genetree in computing the likelihood for subdivided populations.
