Similar articles
20 similar articles found
1.
We propose a two‐step procedure for estimating multiple migration rates in an approximate Bayesian computation (ABC) framework, accounting for global nuisance parameters. The approach is not limited to migration, but is generally of interest for inference problems with multiple parameters and a modular structure (e.g. independent sets of demes or loci). We condition on a known, but complex demographic model of a spatially subdivided population, motivated by the reintroduction of Alpine ibex (Capra ibex) into Switzerland. In the first step, the global parameters (ancestral mutation rate and male mating skew) were estimated for the whole population in Aeschbacher et al. (Genetics 2012; 192: 1027). In the second step, carried out in this study, we estimate the migration rates independently for clusters of demes putatively connected by migration. For large clusters (many migration rates), ABC faces the problem of too many summary statistics. We therefore assess by simulation whether estimation per pair of demes is a valid alternative. We find that the trade‐off between reduced dimensionality for the pairwise estimation on the one hand and lower accuracy due to the assumption of pairwise independence on the other depends on the number of migration rates to be inferred: the accuracy of the pairwise approach increases with the number of parameters, relative to the joint estimation approach. To distinguish between low and zero migration, we perform ABC‐type model comparison between a model with migration and one without. Applying the approach to microsatellite data from Alpine ibex, we find no evidence for substantial gene flow via migration, except for one pair of demes in one direction.

2.
Approximate Bayesian computation (ABC) is widely used to infer demographic history of populations and species using DNA markers. Genomic markers can now be developed for nonmodel species using reduced representation library (RRL) sequencing methods that select a fraction of the genome using targeted sequence capture or restriction enzymes (genotyping‐by‐sequencing, GBS). We explored the influence of marker number and length, knowledge of gametic phase, and tradeoffs between sample size and sequencing depth on the quality of demographic inferences performed with ABC. We focused on two‐population models of recent spatial expansion with varying numbers of unknown parameters. Performing ABC on simulated data sets with known parameter values, we found that the timing of a recent spatial expansion event could be precisely estimated in a three‐parameter model. Taking into account uncertainty in parameters such as initial population size and migration rate collectively decreased the precision of inferences dramatically. Phasing haplotypes did not improve results, regardless of sequence length. Numerous short sequences were as valuable as fewer, longer sequences, and performed best when a large sample size was sequenced at low individual depth, even when sequencing errors were added. ABC results were similar to results obtained with an alternative method based on the site frequency spectrum (SFS) when performed with unphased GBS‐type markers. We conclude that unphased GBS‐type data sets can be sufficient to precisely infer simple demographic models, and discuss possible improvements for the use of ABC with genomic data.

3.
Beerli P. Molecular Ecology, 2004, 13(4): 827-836
Current estimators of gene flow fall into two classes: those that estimate parameters assuming that the populations investigated are a small random sample of a large number of populations, and those that assume that all populations were sampled. Maximum likelihood or Bayesian approaches that estimate the migration rates and population sizes directly using coalescent theory can easily accommodate datasets that contain a population that has no data, a so-called 'ghost' population. This manipulation allows us to explore the effects of missing populations on the estimation of population sizes and migration rates between two specific populations. The biases of the inferred population parameters depend on the magnitude of the migration rate from the unknown populations. The effects on the population sizes are larger than the effects on the migration rates. The more immigrants from the unknown populations arrive in the sampled populations, the larger the estimated population sizes. Taking into account a ghost population improves or at least does not harm the estimation of population sizes. Estimates of the scaled migration rate M (migration rate per generation divided by the mutation rate per generation) are fairly robust as long as migration rates from the unknown populations are not huge. The inclusion of a ghost population does not improve the estimation of the migration rate M; when the migration rates are estimated as the number of immigrants Nm, a ghost population improves the estimates because of its effect on population size estimation. It seems that for 'real world' analyses one should carefully choose which populations to sample, but there is no need to sample every population in the neighbourhood of a population of interest.

4.
J S Lopes, M Arenas, D Posada, M A Beaumont. Heredity, 2014, 112(3): 255-264
The estimation of parameters in molecular evolution may be biased when some processes are not considered. For example, the estimation of selection at the molecular level using codon-substitution models can have an upward bias when recombination is ignored. Here we address the joint estimation of recombination, molecular adaptation and substitution rates from coding sequences using approximate Bayesian computation (ABC). We describe the implementation of a regression-based strategy for choosing subsets of summary statistics for coding data, and show that this approach can accurately infer recombination allowing for intracodon recombination breakpoints, molecular adaptation and codon substitution rates. We demonstrate that our ABC approach can outperform other analytical methods under a variety of evolutionary scenarios. We also show that although the choice of the codon-substitution model is important, our inferences are robust to a moderate degree of model misspecification. In addition, we demonstrate that our approach can accurately choose the evolutionary model that best fits the data, providing an alternative for when the use of full-likelihood methods is impracticable. Finally, we applied our ABC method to co-estimate recombination, substitution and molecular adaptation rates from 24 published human immunodeficiency virus 1 coding data sets.

5.
Bayesian inference of recent migration rates using multilocus genotypes (cited 25 times: 0 self-citations, 25 citations by others)
Wilson GA, Rannala B. Genetics, 2003, 163(3): 1177-1191
A new Bayesian method that uses individual multilocus genotypes to estimate rates of recent immigration (over the last several generations) among populations is presented. The method also estimates the posterior probability distributions of individual immigrant ancestries, population allele frequencies, population inbreeding coefficients, and other parameters of potential interest. The method is implemented in a computer program that relies on Markov chain Monte Carlo techniques to carry out the estimation of posterior probabilities. The program can be used with allozyme, microsatellite, RFLP, SNP, and other kinds of genotype data. We relax several assumptions of early methods for detecting recent immigrants using genotype data; most significantly, we allow genotype frequencies to deviate from Hardy-Weinberg equilibrium proportions within populations. The program is demonstrated by applying it to two recently published microsatellite data sets for populations of the plant species Centaurea corymbosa and the gray wolf species Canis lupus. A computer simulation study suggests that the program can provide highly accurate estimates of migration rates and individual migrant ancestries, given sufficient genetic differentiation among populations and sufficient numbers of marker loci.

6.
Long‐distance migration is a common phenomenon across the animal kingdom, but the scale of annual migratory movements has made it difficult for researchers to estimate survival rates during these periods of the annual cycle. Estimating migration survival is particularly challenging for small‐bodied species that cannot carry satellite tags, a group that includes the vast majority of migratory species. When capture–recapture data are available for linked breeding and non‐breeding populations, estimation of overall migration survival is possible, but current methods do not allow separate estimation of spring and autumn survival rates. Recent development of a Bayesian integrated survival model has provided a method to separately estimate the latent spring and autumn survival rates using capture–recapture data, though the accuracy and precision of these estimates have not been formally tested. Here, I used simulated data to explore the estimability of migration survival rates using this model. Under a variety of biologically realistic scenarios, I demonstrate that spring and autumn migration survival can be estimated from the integrated survival model, though estimates are biased toward the overall migration survival probability. The direction and magnitude of this bias are influenced by the relative difference in spring and autumn survival rates as well as the degree of annual variation in these rates. The inclusion of covariates can improve the model's performance, especially when annual variation in migration survival rates is low. Migration survival rates can be estimated from relatively short time series (4–5 years), but bias and precision of estimates are improved when longer time series (10–12 years) are available. The ability to estimate seasonal survival rates of small, migratory organisms opens the door to advancing our understanding of the ecology and conservation of these species.
Application of this method will enable researchers to better understand when mortality occurs across the annual cycle and how the migratory periods contribute to population dynamics. Integrating summer and winter capture data requires knowledge of the migratory connectivity of sampled populations, and therefore efforts to simultaneously collect both survival and tracking data should be a high priority, especially for species of conservation concern.

7.
The inference of demographic parameters from genetic data has become an integral part of conservation studies. A group of Bayesian methods developed originally in population genetics, known as approximate Bayesian computation (ABC), has been shown to be particularly useful for the estimation of such parameters. These methods do not need to evaluate likelihood functions analytically and can therefore be used even when assuming complex models. In this paper we describe the ABC approach and identify specific parts of its algorithm that are the subject of intensive study aimed at further expanding its usability. Furthermore, we discuss applications of this Bayesian algorithm in conservation studies, providing insights into the potential of these tools. Finally, we present a case study in which we use a simple Isolation-Migration model to estimate a number of demographic parameters of two populations of yellow-eyed penguins (Megadyptes antipodes) in New Zealand. The resulting estimates confirm our current understanding of M. antipodes dynamics and demographic history, and provide new insights into the expansion this species has undergone during the last centuries.
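
The likelihood-free idea described in this abstract can be illustrated with a minimal rejection-ABC sketch. It is not the procedure used in the paper: the toy model (a normal mean with known standard deviation), the uniform prior, the sample mean as the single summary statistic, the tolerance `epsilon`, and the function name `abc_rejection` are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, epsilon, n_draws, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)               # draw a parameter from the prior
        s_sim = summary(simulator(theta, rng))   # simulate data and summarise it
        if abs(s_sim - s_obs) <= epsilon:        # no likelihood is ever evaluated
            accepted.append(theta)
    return accepted


# Toy data: 50 draws from a normal distribution with mean 2 (hypothetical).
rng = random.Random(1)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = abc_rejection(
    observed,
    prior_sampler=lambda r: r.uniform(-5.0, 5.0),
    simulator=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    epsilon=0.1,
    n_draws=20000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior of the mean; a smaller `epsilon` tightens the approximation at the cost of fewer accepted draws, which is the trade-off that more elaborate ABC schemes try to manage.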

8.
Bayesian methods have become extremely popular in molecular ecology studies because they allow us to estimate demographic parameters of complex demographic scenarios using genetic data. Articles presenting new methods generally include sensitivity studies that evaluate their performance, but they tend to be limited and need to be followed by a more thorough evaluation. Here we evaluate the performance of a recent method, BayesAss, which allows the estimation of recent migration rates among populations, as well as the inbreeding coefficient of each local population. We expand the simulation study of the original publication by considering multi-allelic markers and scenarios with varying numbers of populations. We also investigate the effect of varying migration rates and F_ST more thoroughly in order to identify the region of parameter space where the method is and is not able to provide accurate estimates of migration rate. Results indicate that if the demographic history of the species being studied fits the assumptions of the inference model, and if genetic differentiation is not too low (F_ST ≥ 0.05), then the method can give fairly accurate estimates of migration rates even when they are fairly high (about 0.1). However, when the assumptions of the inference model are violated, accurate estimates are obtained only if migration rates are very low (m = 0.01) and genetic differentiation is high (F_ST ≥ 0.10). Our results also show that using posterior assignment probabilities as an indication of how much confidence we can place in the assignments is problematic, since the posterior probability of assignment can be very high even when the individual assignments are very inaccurate.

9.
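The EM algorithm is a general method for obtaining maximum-likelihood parameter estimates from incomplete data. This paper derives EM algorithms for estimating the genetic recombination fraction between two loci under three combinations of marker types (codominant-codominant, codominant-dominant and dominant-dominant), together with a bootstrap method for obtaining the sampling variance of the recombination fraction, and extends them to the case in which some individuals have missing marker genotypes (no electrophoretic band detected). Extensive Monte Carlo simulations show that: (1) under tight linkage, sample size has little effect on the estimate of the recombination fraction, whereas under loose linkage a larger sample is needed to detect linkage and to estimate the recombination fraction with reasonable precision; (2) estimating the recombination fraction from all individuals, including those with missing markers, is more accurate than using only the individuals without missing markers, and significantly increases the statistical power of linkage detection.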
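
As a rough illustration of the codominant-codominant case, the sketch below runs EM for the recombination fraction r. The abstract does not state the mating design, so an F2 population derived from a coupling-phase F1 (AB/ab) is assumed here. Under that assumption the nine genotype classes collapse into four groups: those carrying 0, 1 or 2 recombinant gametes with certainty, and the double heterozygotes, whose phase (0 or 2 recombinant gametes) is the missing information. The function name and the counts are hypothetical; this is not the authors' derivation.

```python
def em_recombination(n0, n1, n2, n_dh, r_init=0.25, tol=1e-10, max_iter=1000):
    """EM estimate of the recombination fraction in an assumed F2 design.

    n0, n1, n2: individuals known to carry 0, 1 or 2 recombinant gametes.
    n_dh: double heterozygotes (AaBb), carrying either 0 or 2 recombinant gametes.
    """
    total = n0 + n1 + n2 + n_dh
    r = r_init
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is Ab/aB (2 recombinant gametes)
        w = r * r / (r * r + (1 - r) ** 2)
        # M-step: expected recombinant gametes divided by the total number of gametes
        r_new = (n1 + 2 * n2 + 2 * n_dh * w) / (2 * total)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r


# Hypothetical counts equal to the expected class sizes for 1000 F2 individuals at r = 0.2.
r_hat = em_recombination(n0=320, n1=320, n2=20, n_dh=340)
print(round(r_hat, 4))
```

With dominant markers more classes become ambiguous, so the E-step would have to spread expected recombinant counts over additional genotype groups; the M-step keeps the same form.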

10.
Treatment selection markers are generally sought when the benefit of an innovative treatment in comparison with a reference treatment is considered, and this benefit is suspected to vary according to the characteristics of the patients. Classically, such quantitative markers are detected through testing a marker-by-treatment interaction in a parametric regression model. Most alternative methods rely on modeling the risk of event occurrence in each treatment arm or the benefit of the innovative treatment over the marker values, but with assumptions that may be difficult to verify. Herein, a simple non-parametric approach is proposed to detect and assess the general capacity of a quantitative marker for treatment selection when no overall difference in efficacy could be demonstrated between two treatments in a clinical trial. This graphical method relies on the area between treatment-arm-specific receiver operating characteristic curves (ABC), which reflects the treatment selection capacity of the marker. A simulation study assessed the inference properties of the ABC estimator and compared them with other parametric and non-parametric indicators. The simulations showed that the estimate of the ABC had low bias, power comparable to parametric indicators, and that its confidence interval had a good coverage probability (better than the other non-parametric indicator in some cases). Thus, the ABC is a good alternative to parametric indicators. The ABC method was applied to data of the PETACC-8 trial that investigated FOLFOX4 versus FOLFOX4 + cetuximab in stage III colon adenocarcinoma. It enabled the detection of a treatment selection marker: the DDR2 gene.

11.
Accounting for historical demographic features, such as the strength and timing of gene flow and divergence times between closely related lineages, is vital for many inferences in evolutionary biology. Approximate Bayesian computation (ABC) is one method commonly used to estimate demographic parameters. However, the DNA sequences used as input for this method, often microsatellites or RADseq loci, usually represent a small fraction of the genome. Whole genome sequencing (WGS) data, on the other hand, have been used less often with ABC, and questions remain about the potential benefit of, and how to best implement, this type of data; we used pseudo‐observed data sets to explore such questions. Specifically, we addressed the potential improvements in parameter estimation accuracy that could be associated with WGS data in multiple contexts; namely, we quantified the effects of (a) more data, (b) haplotype‐based summary statistics, and (c) locus length. Compared with a hypothetical RADseq data set with 2.5 Mbp of data, using a 1 Gbp data set consisting of 100 Kbp sequences led to substantial gains in the accuracy of parameter estimates, which was mostly due to haplotype statistics and increased data. We also quantified the effects of including (a) locus‐specific recombination rates, and (b) background selection information in ABC analyses. Importantly, assuming uniform recombination or ignoring background selection had a negative effect on accuracy in many cases. Software and results from this method validation study should be useful for future demographic history analyses.

12.
Summary: We estimate the parameters of a stochastic process model for a macroparasite population within a host using approximate Bayesian computation (ABC). The immunity of the host is an unobserved model variable and only mature macroparasites at sacrifice of the host are counted. With very limited data, process rates are inferred reasonably precisely. Modeling involves a three variable Markov process for which the observed data likelihood is computationally intractable. ABC methods are particularly useful when the likelihood is analytically or computationally intractable. The ABC algorithm we present is based on sequential Monte Carlo, is adaptive in nature, and overcomes some drawbacks of previous approaches to ABC. The algorithm is validated on a test example involving simulated data from an autologistic model before being used to infer parameters of the Markov process model for experimental data. The fitted model explains the observed extra‐binomial variation in terms of a zero‐one immunity variable, which has a short‐lived presence in the host.

13.
Variation in substitution rates among evolutionary lineages (among-lineage rate variation or ALRV) has been reported to negatively affect the estimation of phylogenies. When the substitution processes underlying ALRV are modeled inadequately, non-sister taxa with similar substitution rates are estimated incorrectly as sister species due to long-branch attraction. Recent advances in modeling site-specific rate variation (heterotachy) have reduced the impacts of ALRV on phylogeny estimation in several empirical and simulated datasets. However, the addition of parameters to the substitution model reduces power to estimate each parameter correctly, which can also lead to incorrect phylogeny estimation. A potential solution to this problem is to identify the levels of ALRV that negatively impact phylogeny estimation such that molecular markers with non-deleterious levels of ALRV can be identified. To this end, we used analyses of empirical and simulated gene datasets to evaluate whether levels of ALRV identified in a mitochondrial genomic dataset for salamanders negatively impacted phylogeny estimation. We simulated data with and without ALRV, holding all other evolutionary parameters constant, and compared the phylogenetic performance of both simulated and empirical datasets. Overall, we found limited, positive effects of ALRV on phylogeny estimation in this dataset, the majority of which resulted from an increase in substitution rate on short branches. We conclude that ALRV does not always negatively impact phylogeny estimation. Therefore, ALRV can likely be disregarded as a criterion for marker selection in comparable phylogenetic studies.

14.
In this paper a theory is developed that provides the sampling distribution of alleles at a diallelic marker locus closely linked to a low-frequency allele that arose as a single mutant. The sampling distribution provides a basis for maximum-likelihood estimation of either the recombination rate, the mutation rate, or the age of the allele, provided that the two other parameters are known. This theory is applied to (1) the data of Hästbacka et al., to estimate the recombination rate between a locus associated with diastrophic dysplasia and a linked RFLP marker; (2) the data of Risch et al., to estimate the age of a presumptive allele causing idiopathic torsion dystonia in Ashkenazi Jews; and (3) the data of Tishkoff et al., to estimate the date at which, at the CD4 locus, non-African lineages diverged from African lineages. We conclude that the extent of linkage disequilibrium can lead to relatively accurate estimates of recombination and mutation rates and that those estimates are not very sensitive to parameters, such as the population age, whose values are not known with certainty. In contrast, we also conclude that, in many cases, linkage disequilibrium may not lead to useful estimates of allele age, because of the relatively large degree of uncertainty in those estimates.

15.
Identifying population structure is one of the most common and important objectives of spatial analyses using population genetic data. Population structure is detected either by rejecting the null hypothesis of a homogeneous distribution of genetic variation, or by estimating low migration rates. Issues arise with most current population genetic inference methods when the genetic divergence is low among putative populations. Low levels of genetic divergence may result either from high ongoing migration or from historically high migration with no current migration. We direct attention to recent developments in the use of the tempo-spatial distribution of closely related individuals to detect population structure or estimate current migration rates. These 'kinship-based' approaches complement more traditional population-based genetic inference methods by providing a means to detect population structure and estimate current migration rates when genetic divergence is low. However, for kinship-based methods to become widely adopted, formal estimation procedures applicable to a range of species life histories are needed.

16.
Most modern population genetics inference methods are based on the coalescence framework. Methods that allow estimating parameters of structured populations commonly insert migration events into the genealogies. For these methods the calculation of the coalescence probability density of a genealogy requires a product over all time periods between events. Data sets that contain populations with high rates of gene flow among them require an enormous number of calculations. A new method, transition probability-structured coalescence (TPSC), replaces the discrete migration events with probability statements. Because the speed of calculation is independent of the amount of gene flow, this method allows calculating the coalescence densities efficiently. The current implementation of TPSC uses an approximation simplifying the interaction among lineages. Simulations and coverage comparisons of TPSC vs. MIGRATE show that TPSC allows estimation of high migration rates more precisely, but because of the approximation the estimation of low migration rates is biased. The implementation of TPSC into programs that calculate quantities on phylogenetic tree structures is straightforward, so the TPSC approach will facilitate more general inferences in many computer programs.

17.
Meuwissen TH, Goddard ME. Genetics, 2007, 176(4): 2551-2560
A novel multipoint method, based on an approximate coalescence approach, to analyze multiple linked markers is presented. Unlike other approximate coalescence methods, it considers all markers simultaneously but only two haplotypes at a time. We demonstrate the use of this method for linkage disequilibrium (LD) mapping of QTL and estimation of effective population size. The method estimates identity-by-descent (IBD) probabilities between pairs of marker haplotypes. Both LD and combined linkage and LD mapping rely on such IBD probabilities. The method is approximate in that it considers only the information on a pair of haplotypes, whereas a full modeling of the coalescence process would simultaneously consider all haplotypes. However, full coalescence modeling is computationally feasible only for a few linked markers. Using simulations of the coalescence process, the method is shown to give almost unbiased estimates of the effective population size. Compared to direct marker and haplotype association analyses, IBD-based QTL mapping clearly showed higher power to detect a QTL and a more realistic confidence interval for its position. The modeling of LD could be extended to estimate other LD-related parameters such as recombination rates.

18.
Understanding the processes and conditions under which populations diverge to give rise to distinct species is a central question in evolutionary biology. Since recently diverged populations have high levels of shared polymorphisms, it is challenging to distinguish between recent divergence with no (or very low) inter-population gene flow and older splitting events with subsequent gene flow. Recently published methods to infer speciation parameters under the isolation-migration framework are based on summarizing polymorphism data at multiple loci in two species using the joint site-frequency spectrum (JSFS). We have developed two improvements of these methods based on a more extensive use of the JSFS classes of polymorphisms for species with high intra-locus recombination rates. First, using a likelihood based method, we demonstrate that taking into account low-frequency polymorphisms shared between species significantly improves the joint estimation of the divergence time and gene flow between species. Second, we introduce a local linear regression algorithm that considerably reduces the computational time and allows for the estimation of unequal rates of gene flow between species. We also investigate which summary statistics from the JSFS allow the greatest estimation accuracy for divergence time and migration rates for low (around 10) and high (around 100) numbers of loci. Focusing on cases with low numbers of loci and high intra-locus recombination rates we show that our methods for the estimation of divergence time and migration rates are more precise than existing approaches.

19.
We developed a spatially explicit model of a bioinvasion and used an approximate Bayesian computation (ABC) framework to make various inferences from a combination of genetic (microsatellite genotypes), historical (first observation dates) and geographical (spatial coordinates of introduction and sampled sites) information. Our method aims to discriminate between alternative introduction scenarios and to estimate posterior densities of demographically relevant parameters of the invasive process. The performance of our landscape-ABC method is assessed using simulated data sets differing in their information content (genetic and/or historical data). We apply our methodology to the recent introduction and spatial expansion of the cane toad, Bufo marinus, in northern Australia. We find that, at least in the context of cane toad invasion, historical data are more informative than genetic data for discriminating between introduction scenarios. However, the combination of historical and genetic data provides the most accurate estimates of demographic parameters. For the cane toad, we find some evidence for a strong bottleneck prior to introduction, a small initial number of founder individuals (about 15), a large population growth rate (about 400% per generation), a standard deviation of dispersal distance of 19 km per generation and a high invasion speed at equilibrium (50 km per year). Our approach strengthens the application of the ABC method to the field of bioinvasion by allowing statistical inferences to be made on the introduction and the spatial expansion dynamics of invasive species using a combination of various relevant sources of information.

20.
Wang J, Whitlock MC. Genetics, 2003, 163(1): 429-446
In the past, moment and likelihood methods have been developed to estimate the effective population size (N_e) on the basis of the observed changes of marker allele frequencies over time, and these have been applied to a large variety of species and populations. Such methods invariably make the critical assumption of a single isolated population receiving no immigrants over the study interval. For most populations in the real world, however, migration is not negligible and can substantially bias estimates of N_e if it is not accounted for. Here we extend previous moment and maximum-likelihood methods to allow the joint estimation of N_e and migration rate (m) using genetic samples over space and time. It is shown that, compared to genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of N_e, respectively, if it is ignored. Extensive simulations are run to evaluate the newly developed moment and likelihood methods, which yield generally satisfactory estimates of both N_e and m for populations with widely different effective sizes and migration rates and patterns, given a reasonably large sample size and number of markers.
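
For orientation, the earlier single-population moment estimator that this abstract sets out to extend can be sketched as follows. The sketch assumes the commonly cited form: a standardized variance in allele frequency change, F_c, averaged over alleles, corrected for sampling noise from S0 and St sampled individuals, giving N_e ≈ t / (2 (F_c − 1/(2 S0) − 1/(2 St))). It ignores migration entirely, so it is the baseline and not the joint N_e and m method of the paper; function names and the example frequencies are hypothetical.

```python
def standardized_variance_fc(x, y):
    """Mean over alleles of (x_i - y_i)^2 / ((x_i + y_i) / 2 - x_i * y_i)."""
    terms = [(xi - yi) ** 2 / ((xi + yi) / 2 - xi * yi) for xi, yi in zip(x, y)]
    return sum(terms) / len(terms)


def temporal_ne(fc, generations, s0, st):
    """Moment estimate of N_e for an isolated population (no migration assumed).

    s0, st: numbers of diploid individuals sampled at the two time points.
    """
    drift = fc - 1 / (2 * s0) - 1 / (2 * st)  # remove the sampling contribution
    if drift <= 0:
        return float("inf")  # frequency change no larger than sampling noise alone
    return generations / (2 * drift)


# Hypothetical allele frequencies at one biallelic locus, sampled 3 generations apart.
fc = standardized_variance_fc([0.5, 0.5], [0.6, 0.4])
ne = temporal_ne(fc, generations=3, s0=100, st=100)
print(fc, ne)
```

Because this estimator attributes every change in allele frequency to drift, immigration distorts it in the directions the abstract describes, which is the motivation for estimating N_e and m jointly.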
