Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
We introduce a flexible and robust simulation-based framework to infer demographic parameters from the site frequency spectrum (SFS) computed on large genomic datasets. We show that our composite-likelihood approach allows one to study evolutionary models of arbitrary complexity, which cannot be tackled by other current likelihood-based methods. For simple scenarios, our approach compares favorably in terms of accuracy and speed with ∂a∂i, the current reference in the field, while showing better convergence properties for complex models. We first apply our methodology to non-coding genomic SNP data from four human populations. To infer their demographic history, we compare neutral evolutionary models of increasing complexity, including unsampled populations. We further show the versatility of our framework by extending it to the inference of demographic parameters from SNP chips with known ascertainment, such as that recently released by Affymetrix to study human origins. Whereas previous ways of handling ascertained SNPs were either restricted to a single population or only allowed the inference of divergence time between a pair of populations, our framework can correctly infer parameters of more complex models including the divergence of several populations, bottlenecks and migration. We apply this approach to the reconstruction of African demography using two distinct ascertained human SNP panels studied under two evolutionary models. The two SNP panels lead to globally very similar estimates and confidence intervals, and suggest an ancient divergence (>110 Ky) between Yoruba and San populations. Our methodology appears well suited to the study of complex scenarios from large genomic data sets.
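The abstract above builds on two ingredients: the site frequency spectrum and a composite likelihood computed from it. As a minimal sketch only, and not the authors' implementation, the following shows how an unfolded SFS could be tallied from 0/1 haplotypes and scored against an expected spectrum with a multinomial composite log-likelihood. The function names, the toy data, and the neutral 1/i expectation are assumptions made for illustration.

```python
import math
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS from equal-length 0/1 haplotypes (1 = derived allele).

    Entry i-1 of the result is the number of sites carrying exactly i derived
    copies, for i = 1 .. n-1. Monomorphic sites are not counted.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site (column sums), tallied by value.
    counts = Counter(sum(h[j] for h in haplotypes) for j in range(n_sites))
    return [counts.get(i, 0) for i in range(1, n)]


def composite_log_likelihood(observed, expected):
    """Multinomial composite log-likelihood of an observed SFS.

    `expected` holds unnormalised expected entries (assumed positive wherever
    the observed count is positive), e.g. averages over simulations.
    """
    total = sum(expected)
    return sum(o * math.log(e / total) for o, e in zip(observed, expected) if o > 0)


# Toy data: four haplotypes, five sites (hypothetical).
haps = [
    [0, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
sfs = site_frequency_spectrum(haps)
# Under a constant-size neutral model the expected SFS is proportional to 1/i.
neutral = [1.0 / i for i in range(1, len(haps))]
loglik = composite_log_likelihood(sfs, neutral)
```

In a simulation-based setting the `expected` vector would be estimated from simulated genealogies under a candidate demographic model, and the parameters would be tuned to maximise this score.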

2.
Simulation of biomolecular networks is now indispensable for studying biological systems, from small reaction networks to large ensembles of cells. Here we present a novel approach for stochastic simulation of networks embedded in the dynamic environment of the cell and its surroundings. We thus sample trajectories of the stochastic process described by the chemical master equation with time-varying propensities. A comparative analysis shows that existing approaches can either fail dramatically, or else can impose impractical computational burdens due to numerical integration of reaction propensities, especially when cell ensembles are studied. Here we introduce the Extrande method which, given a simulated time course of dynamic network inputs, provides a conditionally exact and several orders-of-magnitude faster simulation solution. The new approach makes it feasible to demonstrate, using decision-making by a large population of quorum sensing bacteria, that robustness to fluctuations from upstream signaling places strong constraints on the design of networks determining cell fate. Our approach has the potential to significantly advance both understanding of molecular systems biology and design of synthetic circuits.
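To make the idea of sampling trajectories with time-varying propensities concrete, here is a hedged sketch of a thinning scheme with an extra "null" reaction channel, which is the general idea that avoids integrating propensities numerically. It is not the published Extrande code: the birth-death network, the sinusoidal input, the parameter values, and the function name `simulate_birth_death` are all assumptions chosen so that a simple upper bound on the total propensity is available.

```python
import math
import random


def simulate_birth_death(t_end, k0=5.0, gamma=0.1, x0=0, seed=1):
    """Birth-death process whose birth propensity varies in time.

    Candidate event times are drawn at a rate `bound` that is never smaller
    than the true total propensity; each candidate is then assigned to a real
    reaction or to a null channel that leaves the state unchanged.
    """
    rng = random.Random(seed)

    def birth_rate(t):
        # Hypothetical dynamic input; always between 0 and 2 * k0.
        return k0 * (1.0 + math.sin(t))

    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        # Valid until the next real reaction, because x is constant until then.
        bound = 2.0 * k0 + gamma * x
        t += rng.expovariate(bound)
        if t >= t_end:
            break
        a_birth = birth_rate(t)
        a_death = gamma * x
        u = rng.random() * bound
        if u < a_birth:
            x += 1
        elif u < a_birth + a_death:
            x -= 1
        else:
            continue  # null channel fired: time advances, state does not
        path.append((t, x))
    return path


trajectory = simulate_birth_death(20.0)
```

The tighter the bound, the fewer null events are wasted; a bound that is ever smaller than the true total propensity would make the sampler incorrect.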

3.
Reconstructing the dynamics of populations is complicated by the different types of stochasticity experienced by populations, in particular if some forms of stochasticity introduce bias in parameter estimation in addition to error. Identification of systematic biases is critical when determining whether the intrinsic dynamics of populations are stable or unstable and whether or not populations exhibit an Allee effect, i.e., a minimum size below which deterministic extinction should follow. Using a simulation model that allows for Allee effects and a range of intrinsic dynamics, we investigated how three types of stochasticity (demographic, environmental, and random catastrophes) affect our ability to reconstruct the intrinsic dynamics of populations. Demographic stochasticity aside, which is only problematic in small populations, we find that environmental stochasticity (positive and negative environmental fluctuations) caused increased error in parameter estimation, but bias was rarely problematic, except at the highest levels of noise. Random catastrophes, events causing large-scale mortality and likely to be more common than usually recognized, caused immediate bias in parameter estimates, in particular when Allee effects were large. In the latter case, population stability was predicted when endogenous dynamics were actually unstable and the minimum viable population size was overestimated in populations with small or non-existent Allee effects. Catastrophes also generally increased extinction risk, in particular when endogenous Allee effects were large. We propose a method for identifying data points likely resulting from catastrophic events when such events have not been recorded. Using social spider colonies (Anelosimus spp.) as models for populations, we show that after known or suspected catastrophes are accounted for, reconstructed growth parameters are consistent with intrinsic dynamical instability and substantial Allee effects.
Our results are applicable to metapopulation or time series data and are relevant for predicting extinction in conservation applications or the management of invasive species.

4.
When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters (including drift times and admixture rates) for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called ‘Demographic Inference with Contamination and Error’ (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%.

5.
Elucidating the genetic basis of complex traits and diseases in non-European populations is particularly challenging because US minority populations have been under-represented in genetic association studies. We developed an empirical Bayes approach named XPEB (cross-population empirical Bayes), designed to improve the power for mapping complex-trait-associated loci in a minority population by exploiting information from genome-wide association studies (GWASs) from another ethnic population. Taking as input summary statistics from two GWASs, a target GWAS from an ethnic minority population of primary interest and an auxiliary base GWAS (such as a larger GWAS in Europeans), our XPEB approach reprioritizes SNPs in the target population to compute local false-discovery rates. We demonstrated, through simulations, that whenever the base GWAS harbors relevant information, XPEB gains efficiency. Moreover, XPEB has the ability to discard irrelevant auxiliary information, providing a safeguard against inflated false-discovery rates due to genetic heterogeneity between populations. Applied to a blood-lipids study in African Americans, XPEB more than quadrupled the discoveries from the conventional approach, which used a target GWAS alone, bringing the number of significant loci from 14 to 65. Thus, XPEB offers a flexible framework for mapping complex traits in minority populations.

6.
simuPOP: a forward-time population genetics simulation environment   (cited 2 times in total: 0 self-citations, 2 citations by others)

7.
BioMove simulates plant species' geographic range shifts in response to climate, habitat structure and disturbance, at annual time steps. This spatially explicit approach integrates species' bioclimatic suitability and population-level demographic rates with simulation of landscape-level processes (dispersal, disturbance, species' response to dynamic dominant vegetation structure). Species population dynamics are simulated through matrix modelling that includes scaling demographic rates by climatic suitability. Dispersal functions simulate population spread. User-specified plant functional types (PFTs) provide vegetation structure that determines resource competition and disturbance. PFTs respond annually through dispersal, inter-PFT competition and demographic shifts. BioMove provides a rich framework for dynamic range simulations.
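The matrix modelling step mentioned above can be illustrated with a small sketch. It assumes a three-stage population and a single suitability value between 0 and 1 that multiplies every demographic rate in a given year; the matrix entries, stage names, and the function `project_one_year` are hypothetical and are not taken from BioMove itself.

```python
def project_one_year(stage_counts, transition_matrix, suitability):
    """Advance a stage-structured population by one annual time step.

    Every demographic rate is multiplied by `suitability` (0..1). Scaling all
    rates equally is a simplifying assumption; a real model may scale only
    some of them.
    """
    return [
        suitability * sum(rate * n for rate, n in zip(row, stage_counts))
        for row in transition_matrix
    ]


# Hypothetical three-stage plant: seedling, juvenile, adult.
MATRIX = [
    [0.0, 0.0, 4.0],  # adults produce seedlings
    [0.3, 0.5, 0.0],  # seedlings grow into juveniles; juveniles persist
    [0.0, 0.4, 0.9],  # juveniles mature; adults survive
]

population = [100.0, 50.0, 20.0]
good_year = project_one_year(population, MATRIX, suitability=1.0)
poor_year = project_one_year(population, MATRIX, suitability=0.5)
```

In a spatially explicit setting the same projection would be applied per grid cell, with each cell supplying its own suitability for that year.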

8.
Kevin R. Thornton. Genetics, 2014, 198(1): 157-166
fwdpp is a C++ library of routines intended to facilitate the development of forward-time simulations under arbitrary mutation and fitness models. The library design provides a combination of speed, low memory overhead, and modeling flexibility not currently available from other forward simulation tools. The library is particularly useful when the simulation of large populations is required, as programs implemented using the library are much more efficient than other available forward simulation programs.

9.
Understanding the drivers of spatial patterns of genomic diversity has emerged as a major goal of evolutionary genetics. The flexibility of forward-time simulation makes it especially valuable for these efforts, allowing for the simulation of arbitrarily complex scenarios in a way that mimics how real populations evolve. Here, we present Geonomics, a Python package for performing complex, spatially explicit, landscape genomic simulations with full spatial pedigrees that dramatically reduces user workload yet remains customizable and extensible because it is embedded within a popular, general-purpose language. We show that Geonomics results are consistent with expectations for a variety of validation tests based on classic models in population genetics and then demonstrate its utility and flexibility with a trio of more complex simulation scenarios that feature polygenic selection, selection on multiple traits, simulation on complex landscapes, and nonstationary environmental change. We then discuss runtime, which is primarily sensitive to landscape raster size, memory usage, which is primarily sensitive to maximum population size and recombination rate, and other caveats related to the model’s methods for approximating recombination and movement. Taken together, our tests and demonstrations show that Geonomics provides an efficient and robust platform for population genomic simulations that capture complex spatial and evolutionary dynamics.

10.
The detection of epistatic interactive effects of multiple genetic variants on the susceptibility to complex human diseases is a great challenge in genome-wide association studies (GWAS). Although methods have been proposed to identify such interactions, the lack of an explicit definition of epistatic effects, together with computational difficulties, makes the development of new methods indispensable. In this paper, we introduce epistatic modules to describe epistatic interactive effects of multiple loci on diseases. On the basis of this notion, we put forward a Bayesian marker partition model to explain observed case-control data, and we develop a Gibbs sampling strategy to facilitate the detection of epistatic modules. Comparisons of the proposed approach with three existing methods on seven simulated disease models demonstrate the superior performance of our approach. When applied to a genome-wide case-control data set for Age-related Macular Degeneration (AMD), the proposed approach successfully identifies two known susceptibility loci and suggests that a combination of two other loci (one in the gene SGCD and the other in SCAPER) is associated with the disease. Further functional analysis supports the speculation that the interaction of these two genetic variants may be responsible for the susceptibility to AMD. When applied to a genome-wide case-control data set for Parkinson's disease, the proposed method identifies seven suspicious loci that may contribute independently to the disease.

11.
Nemo is an individual-based, genetically explicit and stochastic population computer program for the simulation of population genetics and life-history trait evolution in a metapopulation context. It comes as both a C++ programming framework and an executable program file. Its object-oriented programming design gives it the flexibility and extensibility needed to implement a large variety of forward-time evolutionary models. It provides developers with abstract models allowing them to implement their own life-history traits and life-cycle events. Nemo offers a large panel of population models, from the Island model to lattice models with demographic or environmental stochasticity and a variety of already implemented traits (deleterious mutations, neutral markers and more), life-cycle events (mating, dispersal, aging, selection, etc.) and output operators for saving data and statistics. It runs on all major computer platforms including parallel computing environments. AVAILABILITY: The source code, binaries and documentation are available under the GNU General Public License at http://nemo2.sourceforge.net.

12.
To Malthus, rapid human population growth, so evident in 18th Century Europe, was obviously unsustainable. In his Essay on the Principle of Population, Malthus cogently argued that environmental and socioeconomic constraints on population rise were inevitable. Yet, he penned his essay on the eve of the global census size reaching one billion, as nearly two centuries of super-exponential increase were taking off. Introducing a novel extension of J. E. Cohen's hallmark coupled difference equation model of human population dynamics and carrying capacity, this article examines just how elastic population growth limits may be in response to demographic change. The revised model involves a simple formalization of how consumption costs influence carrying capacity elasticity over time. Recognizing that complex social resource-extraction networks support ongoing consumption-based investment in family formation and intergenerational resource transfers, it is important to consider how consumption has impacted the human environment and demography, especially as global population has become very large. Sensitivity analysis of the consumption-cost model's fit to historical population estimates, modern census data, and 21st Century demographic projections supports a critical conclusion. The recent population explosion was systemically determined by long-term, distinctly pre-industrial cultural evolution. It is suggested that modern globalizing transitions in technology, susceptibility to infectious disease, information flows and accumulation, and economic complexity were endogenous products of much earlier biocultural evolution of family formation's embeddedness in larger, hierarchically self-organizing cultural systems, which could potentially support high population elasticity of carrying capacity.
Modern super-exponential population growth cannot be considered separately from long-term change in the multi-scalar political economy that connects family formation and intergenerational resource transfers to wider institutions and social networks.

13.
A class of discrete-time models of infectious disease spread, referred to as individual-level models (ILMs), are typically fitted in a Bayesian Markov chain Monte Carlo (MCMC) framework. These models quantify probabilistic outcomes regarding the risk of infection of susceptible individuals due to various susceptibility and transmissibility factors, including their spatial distance from infectious individuals. The infectious pressure from infected individuals exerted on susceptible individuals is intrinsic to these ILMs. Unfortunately, quantifying this infectious pressure for data sets containing many individuals can be computationally burdensome, leading to a time-consuming likelihood calculation and, thus, computationally prohibitive MCMC-based analysis. This problem worsens when using data augmentation to allow for uncertainty in infection times. In this paper, we develop sampling methods that can be used to calculate a fast, approximate likelihood when fitting such disease models. A simple random sampling approach is initially considered followed by various spatially-stratified schemes. We test and compare the performance of our methods with both simulated data and data from the 2001 foot-and-mouth disease (FMD) epidemic in the U.K. Our results indicate that substantial computation savings can be obtained (albeit, of course, with some information loss), suggesting that such techniques may be of use in the analysis of very large epidemic data sets.
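The cost of the infectious pressure term, and the simple random sampling shortcut described above, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's method: the infection probability is taken to be 1 - exp(-alpha * pressure), the pressure is a sum of distance^(-beta) kernels over infectious individuals, and the parameter values and the function name `infection_probability` are hypothetical.

```python
import math
import random


def infection_probability(susceptible, infectious, alpha=0.5, beta=2.0,
                          sample_size=None, rng=None):
    """Probability that one susceptible individual is infected in a time step.

    Locations are (x, y) tuples and are assumed distinct, so no distance is
    zero. With `sample_size` set, the pressure is estimated from a simple
    random sample of the infectious individuals and rescaled by the sampling
    fraction, trading some accuracy for speed.
    """
    if sample_size is not None and sample_size < len(infectious):
        rng = rng or random.Random(0)
        sample = rng.sample(infectious, sample_size)
        scale = len(infectious) / sample_size
    else:
        sample, scale = infectious, 1.0
    pressure = scale * sum(math.dist(susceptible, j) ** -beta for j in sample)
    return 1.0 - math.exp(-alpha * pressure)


infectious_sites = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
exact = infection_probability((0.0, 0.0), infectious_sites)
approx = infection_probability((0.0, 0.0), infectious_sites, sample_size=2)
```

The exact version costs one kernel evaluation per infectious individual for every susceptible individual at every time step, which is the burden the sampling schemes aim to reduce. A spatially stratified scheme would replace `rng.sample` with draws taken separately within distance bands.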

14.
It is widely acknowledged that genome-wide association studies (GWAS) of complex human disease fail to explain a large portion of heritability, primarily due to lack of statistical power, a problem that is exacerbated when seeking detection of interactions of multiple genomic loci. An untapped source of information that is already widely available, and that is expected to grow in coming years, is population samples. Such samples contain genetic marker data for additional individuals, but not their relevant phenotypes. In this article we develop a highly efficient testing framework based on a constrained maximum-likelihood estimate in a case–control–population setting. We leverage the available population data and optional modeling assumptions, such as Hardy–Weinberg equilibrium (HWE) in the population and linkage equilibrium (LE) between distal loci, to substantially improve power of association and interaction tests. We demonstrate, via simulation and application to actual GWAS data sets, that our approach is substantially more powerful and robust than standard testing approaches that ignore or make naive use of the population sample. We report several novel and credible pairwise interactions, in bipolar disorder, coronary artery disease, Crohn’s disease, and rheumatoid arthritis.

15.
We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.

16.

Background  

Forward-time simulations have unique advantages in power and flexibility for the simulation of genetic samples of complex human diseases because they can closely mimic the evolution of human populations carrying these diseases. However, a number of methodological and computational constraints have prevented the power of this simulation method from being fully explored in existing forward-time simulation methods.

17.
Population genetics simulation models are useful tools to study the effects of demography and environmental factors on genetic variation and genetic differentiation. They allow for studying species and populations with complex life histories, spatial distribution and many other complicating factors that make analytical treatment impracticable. Most simulation models are individual-based: this poses a limitation to simulation of very large populations because of the limits in computer memory and long computation times. To overcome these limitations, we propose an intermediate approach that allows modelling of very complex demographic scenarios, which would be intractable with analytical models, and removes the limitations imposed by large population size, which affect individual-based simulation models. We implement this approach in a software package for the R environment, MetaPopGen. The innovative concept of this approach with respect to the other population genetic simulators is that it focuses on genotype numbers rather than on individuals. Genotype numbers are iterated through time by using random number generators for appropriate probabilistic distributions to reproduce the stochasticity inherent to Mendelian segregation, survival, dispersal and reproduction. Features included in the model are age structure, monoecious and dioecious (or separate sexes) life cycles, mutation, dispersal and selection. The model simulates only one locus at a time. All demographic parameters can be genotype-, sex-, age-, deme- and time-dependent. MetaPopGen is therefore indicated to study large populations and very complex demographic scenarios. We illustrate the capabilities of MetaPopGen by applying it to the case of a marine fish metapopulation in the Mediterranean Sea.
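The idea of iterating genotype numbers rather than individuals can be sketched for the simplest case. The example assumes one deme, one biallelic locus, a monoecious random-mating population, non-overlapping generations, and no selection, mutation or dispersal; it is written in Python for illustration (MetaPopGen itself is an R package) and the function name `next_generation` is hypothetical.

```python
import random
from collections import Counter


def next_generation(counts, n_offspring, rng):
    """Draw offspring genotype counts from parental counts (n_AA, n_Aa, n_aa).

    The state is just three integers. Offspring genotypes follow a
    multinomial distribution with Hardy-Weinberg proportions computed from
    the parental allele frequency.
    """
    n_AA, n_Aa, n_aa = counts
    total = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * total)  # frequency of allele A among parents
    weights = [p * p, 2 * p * (1 - p), (1 - p) ** 2]
    # The standard library has no multinomial sampler, so the draw is emulated
    # with n_offspring categorical draws tallied by genotype.
    draws = Counter(rng.choices(("AA", "Aa", "aa"), weights=weights, k=n_offspring))
    return (draws["AA"], draws["Aa"], draws["aa"])


rng = random.Random(42)
state = (250, 500, 250)
history = [state]
for _ in range(10):
    state = next_generation(state, 1000, rng)
    history.append(state)
```

With a true multinomial sampler the cost per generation depends on the number of genotypes rather than on the population size, which is what makes the genotype-count representation attractive for very large populations; the emulation used here does not have that property.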

18.
Choi SC, Hey J. Genetics, 2011, 189(2): 561-577
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.

19.
Australia, home to the iconic dingo, is currently free from canine rabies. However, northern Australia, including Indigenous communities with large free-roaming domestic dog populations, is at increased risk of rabies incursion from nearby Indonesia. We developed a novel agent-based stochastic spatial rabies spread model to evaluate the potential spread of rabies within the dingo population of the Northern Peninsula Area (NPA) region of northern Australia. The model incorporated spatio-temporal features specific to this host-environment system, including landscape heterogeneity, demographic fluctuations, dispersal movements and dingo ecological parameters (such as home range size and density) derived from NPA field studies. Rabies spread between dingo packs in nearly 60% of simulations. In such situations rabies would affect a median of 22 dingoes (approximately 14% of the population; 2.5–97.5 percentiles: 2–101 dingoes) within the study area which covered 1,131 km², and spread 0.52 km/week for 191 days. Larger outbreaks occurred in scenarios in which an incursion was introduced during the dry season (vs. wet season), and close to communities (vs. areas with high risk of interaction between dingoes and hunting community dogs). Sensitivity analyses revealed that home range size and duration of infectious clinical period contributed most to the variance of outputs. Although conditions in the NPA would most likely not support a sustained propagation of the disease in the dingo population, due to the predicted number of infected dingoes following a rabies incursion and the proximity of Indigenous communities to dingo habitat, we conclude that the risk for human transmission could be substantial.

20.
Risk maps estimating the spatial distribution of infectious diseases are required to guide public health policy from local to global scales. The advent of model-based geostatistics (MBG) has allowed these maps to be generated in a formal statistical framework, providing robust metrics of map uncertainty that enhance their utility for decision-makers. In many settings, decision-makers require spatially aggregated measures over large regions such as the mean prevalence within a country or administrative region, or national populations living under different levels of risk. Existing MBG mapping approaches provide suitable metrics of local uncertainty (the fidelity of predictions at each mapped pixel) but have not been adapted for measuring uncertainty over large areas, due largely to a series of fundamental computational constraints. Here the authors present a new efficient approximating algorithm that can generate for the first time the necessary joint simulation of prevalence values across the very large prediction spaces needed for global scale mapping. This new approach is implemented in conjunction with an established model for P. falciparum allowing robust estimates of mean prevalence at any specified level of spatial aggregation. The model is used to provide estimates of national populations at risk under three policy-relevant prevalence thresholds, along with accompanying model-based measures of uncertainty. By overcoming previously unchallenged computational barriers, this study illustrates how MBG approaches, already at the forefront of infectious disease mapping, can be extended to provide large-scale aggregate measures appropriate for decision-makers.
