Found 20 similar articles; search took 9 ms
1.
Mikael Sunnåker Alberto Giovanni Busetto Elina Numminen Jukka Corander Matthieu Foll Christophe Dessimoz 《PLoS computational biology》2013,9(1)
Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics. In all model-based statistical inference, the likelihood function is of central importance, since it expresses the probability of the observed data under a particular statistical model, and thus quantifies the support data lend to particular values of parameters and to choices among different models. For simple models, an analytical formula for the likelihood function can typically be derived. However, for more complex models, an analytical formula might be elusive or the likelihood function might be computationally very costly to evaluate. ABC methods bypass the evaluation of the likelihood function. In this way, ABC methods widen the realm of models for which statistical inference can be considered. ABC methods are mathematically well-founded, but they inevitably make assumptions and approximations whose impact needs to be carefully assessed. Furthermore, the wider application domain of ABC exacerbates the challenges of parameter estimation and model selection. ABC has rapidly gained popularity over the last years and in particular for the analysis of complex problems arising in biological sciences (e.g., in population genetics, ecology, epidemiology, and systems biology).
This is a “Topic Page” article for PLOS Computational Biology.
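The likelihood-free idea described in the abstract above can be illustrated with a minimal rejection-sampling sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy simulator (normal data with unknown mean), the uniform prior on [-5, 5], the sample mean as summary statistic, and the tolerance `eps`.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumed): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, eps, rng):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    s_obs = statistics.fmean(observed)            # observed summary statistic
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)            # draw from the assumed uniform prior
        s_sim = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:             # compare summaries; no likelihood is evaluated
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)                 # pseudo-observed data, true theta = 2
posterior = abc_rejection(observed, n_draws=20000, eps=0.1, rng=rng)
print(len(posterior), statistics.fmean(posterior))
```

The accepted values approximate the posterior only up to the error introduced by the summary statistic and the tolerance; a smaller `eps` reduces that error but lowers the acceptance rate.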
2.
Krishna R. Veeramah August E. Woerner Laurel Johnstone Ivo Gut Marta Gut Tomas Marques-Bonet Lucia Carbone Jeff D. Wall Michael F. Hammer 《Genetics》2015,200(1):295-308
Gibbons are believed to have diverged from the larger great apes ∼16.8 MYA and today reside in the rainforests of Southeast Asia. Based on their diploid chromosome number, the family Hylobatidae is divided into four genera, Nomascus, Symphalangus, Hoolock, and Hylobates. Genetic studies attempting to elucidate the phylogenetic relationships among gibbons using karyotypes, mitochondrial DNA (mtDNA), the Y chromosome, and short autosomal sequences have been inconclusive. To examine the relationships among gibbon genera in more depth, we performed second-generation whole genome sequencing (WGS) to a mean of ∼15× coverage in two individuals from each genus. We developed a coalescent-based approximate Bayesian computation (ABC) method incorporating a model of sequencing error generated by high coverage exome validation to infer the branching order, divergence times, and effective population sizes of gibbon taxa. Although Hoolock and Symphalangus are likely sister taxa, we could not confidently resolve a single bifurcating tree despite the large amount of data analyzed. Instead, our results support the hypothesis that all four gibbon genera diverged at approximately the same time. Assuming an autosomal mutation rate of 1 × 10⁻⁹/site/year, this speciation process occurred ∼5 MYA during a period in the Early Pliocene characterized by climatic shifts and fragmentation of the Sunda shelf forests. Whole genome sequencing of additional individuals will be vital for inferring the extent of gene flow among species after the separation of the gibbon genera.
3.
An automated calibration method is proposed and applied to the complex hydro-ecological model Delft3D-BLOOM which is calibrated from monitoring data of the lake Champs-sur-Marne, a small shallow urban lake in the Paris region (France). This method (ABC-RF-SA) combines Approximate Bayesian Computation (ABC) with the machine learning algorithm Random Forest (RF) and a Sensitivity Analysis (SA) of the model parameters. Three target variables are used (total chlorophyll, cyanobacteria and dissolved oxygen concentration) to calibrate 133 parameters. ABC-RF-SA is first applied on a set of simulated observations to validate the methodology. It is then applied on a real set of high-frequency observations recorded during about two weeks on the lake Champs-sur-Marne. The methodology is also compared to standard ABC and ABC-RF formulations. Only ABC-RF-SA allowed the model to reproduce the observed biogeochemical dynamics. The coupling of ABC with RF and SA thus appears crucial for its application to complex hydro-ecological models.
4.
5.
Xavier Rubio-Campillo 《PloS one》2016,11(1)
Formal Models and History
Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation.
Case Study
This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer parameter values and to perform model selection via Bayes Factors.
Impact
Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence.
6.
The choice of summary statistics is a crucial step in approximate Bayesian computation (ABC). Since statistics are often not sufficient, this choice involves a trade-off between loss of information and reduction of dimensionality. The latter may increase the efficiency of ABC. Here, we propose an approach for choosing summary statistics based on boosting, a technique from the machine-learning literature. We consider different types of boosting and compare them to partial least-squares regression as an alternative. To mitigate the lack of sufficiency, we also propose an approach for choosing summary statistics locally, in the putative neighborhood of the true parameter value. We study a demographic model motivated by the reintroduction of Alpine ibex (Capra ibex) into the Swiss Alps. The parameters of interest are the mean and standard deviation across microsatellites of the scaled ancestral mutation rate (θ_anc = 4N_e u) and the proportion of males obtaining access to matings per breeding season (ω). By simulation, we assess the properties of the posterior distribution obtained with the various methods. According to our criteria, ABC with summary statistics chosen locally via boosting with the L2-loss performs best. Applying that method to the ibex data, we estimate and find that most of the variation across loci of the ancestral mutation rate u is between 7.7 × 10⁻⁴ and 3.5 × 10⁻³ per locus per generation. The proportion of males with access to matings is estimated as , which is in good agreement with recent independent estimates.
7.
The juvenile life stage is a crucial determinant of forest dynamics and a first indicator of changes to species' ranges under climate change. However, paucity of detailed re-measurement data of seedlings, saplings and small trees means that their demography is not well understood at large scales, and rarely represented in forest models in detail. In this study we quantify the effects of climate and density dependence on recruitment and juvenile growth and mortality rates of thirteen species measured in the Spanish Forest Inventory. Single-census sapling count data is used to constrain demographic parameters of a simple forest juvenile dynamics model based on the perfect plasticity approximation model (PPA) within a likelihood-free parameterisation method, Approximate Bayesian Computation. Our results highlight marked differences between species, and the important role of climate and stand structure, in controlling juvenile dynamics. Recruitment had a hump-shaped relationship with conspecific density, and for most species conspecific competition had a stronger negative effect than heterospecific competition. Mediterranean species showed on average higher mortality and lower growth rates than temperate species, and in low density stands recruitment and mortality rates were positively correlated. Under climate change our model predicted declines in recruitment rates for almost all species. Reliable predictive models of forest dynamics should include realistic representation of critical early life-stage processes and our approach demonstrates that existing coarse count data can be used to parameterise such models. Approximate Bayesian Computation may have wide application in many fields of ecology to unlock information about past processes from single survey observations.
8.
Approximate Bayesian Computation Without Summary Statistics: The Case of Admixture
In recent years approximate Bayesian computation (ABC) methods have become popular in population genetics as an alternative to full-likelihood methods to make inferences under complex demographic models. Most ABC methods rely on the choice of a set of summary statistics to extract information from the data. In this article we tested the use of the full allelic distribution directly in an ABC framework. Although the ABC techniques are becoming more widely used, there is still uncertainty over how they perform in comparison with full-likelihood methods. We thus conducted a simulation study and provide a detailed examination of ABC in comparison with full likelihood in the case of a model of admixture. This model assumes that two parental populations mixed at a certain time in the past, creating a hybrid population, and that the three populations then evolve under pure drift. Several aspects of ABC methodology were investigated, such as the effect of the distance metric chosen to measure the similarity between simulated and observed data sets. Results show that in general ABC provides good approximations to the posterior distributions obtained with the full-likelihood method. This suggests that it is possible to apply ABC using allele frequencies to make inferences in cases where it is difficult to select a set of suitable summary statistics and when the complexity of the model or the size of the data set makes it computationally prohibitive to use full-likelihood methods.
9.
Summary: We estimate the parameters of a stochastic process model for a macroparasite population within a host using approximate Bayesian computation (ABC). The immunity of the host is an unobserved model variable and only mature macroparasites at sacrifice of the host are counted. With very limited data, process rates are inferred reasonably precisely. Modeling involves a three variable Markov process for which the observed data likelihood is computationally intractable. ABC methods are particularly useful when the likelihood is analytically or computationally intractable. The ABC algorithm we present is based on sequential Monte Carlo, is adaptive in nature, and overcomes some drawbacks of previous approaches to ABC. The algorithm is validated on a test example involving simulated data from an autologistic model before being used to infer parameters of the Markov process model for experimental data. The fitted model explains the observed extra‐binomial variation in terms of a zero‐one immunity variable, which has a short‐lived presence in the host.
10.
Exploring Approximate Bayesian Computation for inferring recent demographic history with genomic markers in nonmodel species
Approximate Bayesian computation (ABC) is widely used to infer demographic history of populations and species using DNA markers. Genomic markers can now be developed for nonmodel species using reduced representation library (RRL) sequencing methods that select a fraction of the genome using targeted sequence capture or restriction enzymes (genotyping‐by‐sequencing, GBS). We explored the influence of marker number and length, knowledge of gametic phase, and tradeoffs between sample size and sequencing depth on the quality of demographic inferences performed with ABC. We focused on two‐population models of recent spatial expansion with varying numbers of unknown parameters. Performing ABC on simulated data sets with known parameter values, we found that the timing of a recent spatial expansion event could be precisely estimated in a three‐parameter model. Taking into account uncertainty in parameters such as initial population size and migration rate collectively decreased the precision of inferences dramatically. Phasing haplotypes did not improve results, regardless of sequence length. Numerous short sequences were as valuable as fewer, longer sequences, and performed best when a large sample size was sequenced at low individual depth, even when sequencing errors were added. ABC results were similar to results obtained with an alternative method based on the site frequency spectrum (SFS) when performed with unphased GBS‐type markers. We conclude that unphased GBS‐type data sets can be sufficient to precisely infer simple demographic models, and discuss possible improvements for the use of ABC with genomic data.
11.
Oliver Ratmann Gé Donker Adam Meijer Christophe Fraser Katia Koelle 《PLoS computational biology》2012,8(12)
A key priority in infectious disease research is to understand the ecological and evolutionary drivers of viral diseases from data on disease incidence as well as viral genetic and antigenic variation. We propose using a simulation-based, Bayesian method known as Approximate Bayesian Computation (ABC) to fit and assess phylodynamic models that simulate pathogen evolution and ecology against summaries of these data. We illustrate the versatility of the method by analyzing two spatial models describing the phylodynamics of interpandemic human influenza virus subtype A(H3N2). The first model captures antigenic drift phenomenologically with continuously waning immunity, and the second epochal evolution model describes the replacement of major, relatively long-lived antigenic clusters. Combining features of long-term surveillance data from the Netherlands with features of influenza A (H3N2) hemagglutinin gene sequences sampled in northern Europe, key phylodynamic parameters can be estimated with ABC. Goodness-of-fit analyses reveal that the irregularity in interannual incidence and H3N2's ladder-like hemagglutinin phylogeny are quantitatively only reproduced under the epochal evolution model within a spatial context. However, the concomitant incidence dynamics result in a very large reproductive number and are not consistent with empirical estimates of H3N2's population level attack rate. These results demonstrate that the interactions between the evolutionary and ecological processes impose multiple quantitative constraints on the phylodynamic trajectories of influenza A(H3N2), so that sequence and surveillance data can be used synergistically. ABC, one of several data synthesis approaches, can easily interface a broad class of phylodynamic models with various types of data but requires careful calibration of the summaries and tolerance parameters.
12.
Hansen’s disease (leprosy) elimination has proven difficult in several countries, including Brazil, and there is a need for a mathematical model that can predict control program efficacy. This study applied the Approximate Bayesian Computation algorithm to fit 6 different proposed models to each of the 5 regions of Brazil, then fitted hierarchical models based on the best-fit regional models to the entire country. The best model proposed for most regions was a simple model. Posterior checks found that the model results were more similar to the observed incidence after fitting than before, and that parameters varied slightly by region. Current control programs were predicted to require additional measures to eliminate Hansen’s Disease as a public health problem in Brazil.
13.
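When phylogenetic trees are constructed, their topology can be discordant across different genomic regions. To address this problem, Bayesian concordance analysis (BCA) can carry out phylogenetic analysis at the whole-genome scale and quantify the discordance. We applied this method to the 129S1 mouse strain, which was produced by backcrossing C3H/Hu mice (Mus musculus) with 129Sv mice over multiple generations. The corresponding set of sequence files was processed (repeat masking, sequence alignment, and so on) with several bioinformatics tools (e.g., VCFtools, RepeatMasker, PAUP*4.0, MrModeltest, MrBayes), supplemented by Perl scripts, to obtain phylogenetic discordance information for different segments across the whole genome. Among all 99 loci on mouse chromosome 10, the topology in which the 129S1 and 129Sv strains are sister taxa accounted for 84.7% (the highest posterior probability), which indicates that C3H/Hu mice contributed little to the 129S1 genome. The results show that Bayesian concordance analysis is useful for studying the evolutionary history of different genomic regions.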
14.
Estoup A Lombaert E Marin JM Guillemaud T Pudlo P Robert CP Cornuet JM 《Molecular ecology resources》2012,12(5):846-855
Comparison of demo‐genetic models using Approximate Bayesian Computation (ABC) is an active research field. Although large numbers of populations and models (i.e. scenarios) can be analysed with ABC using molecular data obtained from various marker types, methodological and computational issues arise when these numbers become too large. Moreover, Robert et al. (Proceedings of the National Academy of Sciences of the United States of America, 2011, 108, 15112) have shown that the conclusions drawn on ABC model comparison cannot be trusted per se and require additional simulation analyses. Monte Carlo inferential techniques to empirically evaluate confidence in scenario choice are very time‐consuming, however, when the numbers of summary statistics (Ss) and scenarios are large. We here describe a methodological innovation to process efficient ABC scenario probability computation using linear discriminant analysis (LDA) on Ss before computing logistic regression. We used simulated pseudo‐observed data sets (pods) to assess the main features of the method (precision and computation time) in comparison with traditional probability estimation using raw (i.e. not LDA transformed) Ss. We also illustrate the method on real microsatellite data sets produced to make inferences about the invasion routes of the coccinellid Harmonia axyridis. We found that scenario probabilities computed from LDA‐transformed and raw Ss were strongly correlated. Type I and II errors were similar for both methods. The faster probability computation that we observed (speed gain around a factor of 100 for LDA‐transformed Ss) substantially increases the ability of ABC practitioners to analyse large numbers of pods and hence provides a manageable way to empirically evaluate the power available to discriminate among a large set of complex scenarios.
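For orientation, scenario probabilities in ABC can be estimated in the simplest case by counting accepted simulations per scenario. The sketch below shows only that direct rejection-based estimate, not the LDA plus logistic regression procedure described in the abstract above. The two scenarios, the equal prior over them, the sample standard deviation as summary statistic, and the tolerance `eps` are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate(model, n, rng):
    """Two hypothetical scenarios: model 0 has sd 1; model 1 draws its sd from U(1.5, 3)."""
    sd = 1.0 if model == 0 else rng.uniform(1.5, 3.0)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def abc_model_probs(observed, n_draws, eps, rng):
    """Estimate posterior scenario probabilities as shares of accepted simulations."""
    s_obs = statistics.stdev(observed)            # observed summary statistic
    counts = [0, 0]
    for _ in range(n_draws):
        model = rng.randrange(2)                  # equal prior probability for each scenario
        s_sim = statistics.stdev(simulate(model, len(observed), rng))
        if abs(s_sim - s_obs) <= eps:             # accept if summaries are close
            counts[model] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else None


rng = random.Random(1)
observed = [rng.gauss(0.0, 2.0) for _ in range(40)]   # pseudo-observed data with sd = 2
probs = abc_model_probs(observed, n_draws=5000, eps=0.1, rng=rng)
print(probs)
```

Repeating this on many pseudo-observed data sets simulated under known scenarios is one way to gauge empirically how often the correct scenario is chosen.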
15.
Reconstructing the demographic history of orang‐utans using Approximate Bayesian Computation
Alexander Nater Maja P. Greminger Natasha Arora Carel P. van Schaik Benoit Goossens Ian Singleton Ernst J. Verschoor Kristin S. Warren Michael Krützen 《Molecular ecology》2015,24(2):310-327
Investigating how different evolutionary forces have shaped patterns of DNA variation within and among species requires detailed knowledge of their demographic history. Orang‐utans, whose distribution is currently restricted to the South‐East Asian islands of Borneo (Pongo pygmaeus) and Sumatra (Pongo abelii), have likely experienced a complex demographic history, influenced by recurrent changes in climate and sea levels, volcanic activities and anthropogenic pressures. Using the most extensive sample set of wild orang‐utans to date, we employed an Approximate Bayesian Computation (ABC) approach to test the fit of 12 different demographic scenarios to the observed patterns of variation in autosomal, X‐chromosomal, mitochondrial and Y‐chromosomal markers. In the best‐fitting model, Sumatran orang‐utans exhibit a deep split of populations north and south of Lake Toba, probably caused by multiple eruptions of the Toba volcano. In addition, we found signals for a strong decline in all Sumatran populations ~24 ka, probably associated with hunting by human colonizers. In contrast, Bornean orang‐utans experienced a severe bottleneck ~135 ka, followed by a population expansion and substructuring starting ~82 ka, which we link to an expansion from a glacial refugium. We showed that orang‐utans went through drastic changes in population size and connectedness, caused by recurrent contraction and expansion of rainforest habitat during Pleistocene glaciations and probably hunting by early humans. Our findings emphasize the fact that important aspects of the evolutionary past of species with complex demographic histories might remain obscured when applying overly simplified models.
16.
Contemporary gene flow, when resumed after a period of isolation, can have crucial consequences for endangered species, as it can both increase the supply of adaptive alleles and erode local adaptation. Determining the history of gene flow and thus the importance of contemporary hybridization, however, is notoriously difficult. Here, we focus on two endangered plant species, Arabis nemorensis and A. sagittata, which hybridize naturally in a sympatric population located on the banks of the Rhine. Using reduced genome sequencing, we determined the phylogeography of the two taxa but report only a unique sympatric population. Molecular variation in chloroplast DNA indicated that A. sagittata is the principal receiver of gene flow. Applying classical D-statistics and its derivatives to whole-genome data of 35 accessions, we detect gene flow not only in the sympatric population but also among allopatric populations. Using an Approximate Bayesian computation approach, we identify the model that best describes the history of gene flow between these taxa. This model shows that low levels of gene flow have persisted long after speciation. Around 10 000 years ago, gene flow stopped and a period of complete isolation began. Eventually, a hotspot of contemporary hybridization was formed in the unique sympatric population. Occasional sympatry may have helped protect these lineages from extinction in spite of their extremely low diversity.
17.
The principles by which networks of neurons compute, and how spike-timing dependent plasticity (STDP) of synaptic weights generates and maintains their computational function, are unknown. Preceding work has shown that soft winner-take-all (WTA) circuits, where pyramidal neurons inhibit each other via interneurons, are a common motif of cortical microcircuits. We show through theoretical analysis and computer simulations that Bayesian computation is induced in these network motifs through STDP in combination with activity-dependent changes in the excitability of neurons. The fundamental components of this emergent Bayesian computation are priors that result from adaptation of neuronal excitability and implicit generative models for hidden causes that are created in the synaptic weights through STDP. In fact, a surprising result is that STDP is able to approximate a powerful principle for fitting such implicit generative models to high-dimensional spike inputs: Expectation Maximization. Our results suggest that the experimentally observed spontaneous activity and trial-to-trial variability of cortical neurons are essential features of their information processing capability, since their functional role is to represent probability distributions rather than static neural codes. Furthermore, they suggest networks of Bayesian computation modules as a new model for distributed information processing in the cortex.
18.
Elja Arjas Liping Liu Niko Maglaperidze 《Biometrical journal. Biometrische Zeitschrift》1997,39(6):741-759
A nonparametric hierarchical growth curve model is proposed. Different levels in the model hierarchy are intended to correspond to different sources of variation in an individual's growth. The nonparametric character of the model offers considerable flexibility in fitting the growth curves to empirical data. Here the emphasis is on prediction, and for this purpose the adopted Bayesian inferential approach seems particularly natural and efficient. A Markov chain Monte Carlo method is used to perform the numerical computations. As an illustration of the techniques, we consider the growth of children during their first two years.
19.
Insect pest phylogeography might be shaped both by biogeographic events and by human influence. Here, we conducted an approximate Bayesian computation (ABC) analysis to investigate the phylogeography of the New World screwworm fly, Cochliomyia hominivorax, with the aim of understanding its population history and its order and time of divergence. Our ABC analysis supports that populations spread from North to South in the Americas, in at least two different moments. The first split occurred between the North/Central American and South American populations at the end of the Last Glacial Maximum (15,300-19,000 YBP). The second split occurred between the North and South Amazonian populations in the transition between the Pleistocene and the Holocene eras (9,100-11,000 YBP). The species also experienced population expansion. Phylogenetic analysis likewise suggests this north to south colonization and Maxent models suggest an increase in the number of suitable areas in South America from the past to present. We found that the phylogeographic patterns observed in C. hominivorax cannot be explained only by climatic oscillations and can be connected to host population histories. Interestingly, we found that these patterns closely coincide with general patterns of ancient human movements in the Americas, suggesting that humans might have played a crucial role in shaping the distribution and population structure of this insect pest. This work presents the first hypothesis test regarding the processes that shaped the current phylogeographic structure of C. hominivorax and represents an alternate perspective on investigating the problem of insect pests.