Similar articles
 20 similar articles found
1.
Approximate Bayesian computation in population genetics (total citations: 23; self-citations: 0; citations by others: 23)
Beaumont MA, Zhang W, Balding DJ. Genetics 2002, 162(4): 2025-2035
We propose a new method for approximate Bayesian statistical inference on the basis of summary statistics. The method is suited to complex problems that arise in population genetics, extending ideas developed in this setting by earlier authors. Properties of the posterior distribution of a parameter, such as its mean or density curve, are approximated without explicit likelihood calculations. This is achieved by fitting a local-linear regression of simulated parameter values on simulated summary statistics, and then substituting the observed summary statistics into the regression equation. The method combines many of the advantages of Bayesian statistical inference with the computational efficiency of methods based on summary statistics. A key advantage of the method is that the nuisance parameters are automatically integrated out in the simulation step, so that the large numbers of nuisance parameters that arise in population genetics problems can be handled without difficulty. Simulation results indicate computational and statistical efficiency that compares favorably with those of alternative methods previously proposed in the literature. We also compare the relative efficiency of inferences obtained using methods based on summary statistics with those obtained directly from the data using MCMC.
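
The regression-adjustment idea described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy simulator (`simulate_summary`, a sample mean of normal draws), the flat prior on [-5, 5], the tolerance `delta`, and the use of a single summary statistic with an Epanechnikov weight.

```python
import random


def simulate_summary(theta, n, rng):
    """Toy simulator (assumed): sample mean of n Normal(theta, 1) draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_local_linear(s_obs, n_obs, n_sims=20000, delta=0.5, seed=1):
    """Rejection step plus local-linear regression adjustment (one summary statistic)."""
    rng = random.Random(seed)
    thetas, diffs, weights = [], [], []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)              # flat prior (assumed)
        s = simulate_summary(theta, n_obs, rng)
        d = s - s_obs
        if abs(d) < delta:                          # keep simulations close to the observation
            thetas.append(theta)
            diffs.append(d)
            weights.append(1.0 - (d / delta) ** 2)  # Epanechnikov kernel weight
    w_sum = sum(weights)
    d_bar = sum(w * d for w, d in zip(weights, diffs)) / w_sum
    t_bar = sum(w * t for w, t in zip(weights, thetas)) / w_sum
    # Weighted least-squares slope of theta on (s - s_obs).
    num = sum(w * (d - d_bar) * (t - t_bar) for w, d, t in zip(weights, diffs, thetas))
    den = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, diffs))
    beta = num / den
    # Substitute the observed statistic: shift each accepted theta along the fitted line.
    adjusted = [t - beta * d for t, d in zip(thetas, diffs)]
    post_mean = sum(w * a for w, a in zip(weights, adjusted)) / w_sum
    return post_mean, adjusted, weights


post_mean, adjusted, weights = abc_local_linear(s_obs=1.3, n_obs=20)
print(round(post_mean, 2), len(adjusted))
```

The weighted mean of the adjusted values equals the intercept of the fitted line at the observed statistic, which is why it serves as the posterior-mean estimate in this sketch. No likelihood is evaluated at any point.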

2.
We describe a novel method for jointly estimating crossing-over and gene-conversion rates from population genetic data using summary statistics. The performance of our method was tested on simulated data sets and compared with the composite-likelihood method of R. R. Hudson. For several realistic parameter values, the new method performed similarly to the composite-likelihood approach for estimating crossing-over rates and better when estimating gene-conversion rates. We used our method to analyze a human data set recently genotyped by Perlegen Sciences.

3.
BACKGROUND: Haplotype sharing statistics have been introduced in an ad hoc way, often relying heavily on permutation testing. As a result, applying these approaches to whole genome association studies or evaluating their properties in extensive simulation experiments is problematic. Further, permutation testing may be inappropriate in the presence of phase ambiguity and population stratification. AIMS: To present a simple framework for a class of haplotype sharing statistics useful for association mapping in case-parent trio data. This framework allows derivation of novel haplotype sharing tests as well as simple variance estimators and asymptotic distributions for haplotype sharing tests. RESULTS AND CONCLUSIONS: We validated that our approach is appropriately sized using simulated data, and illustrate the methodology by analyzing a Crohn's disease dataset. We find that haplotype-based analyses are much more powerful than single-locus analyses for these data.

4.
Vasco DA. Genetics 2008, 179(2): 951-963
The estimation of ancestral and current effective population sizes in expanding populations is a fundamental problem in population genetics. Recently it has become possible to scan entire genomes of several individuals within a population. These genomic data sets can be used to estimate basic population parameters such as the effective population size and population growth rate. Full-data-likelihood methods potentially offer a powerful statistical framework for inferring population genetic parameters. However, for large data sets, computationally intensive methods based upon full-likelihood estimates may encounter difficulties. First, the computational method may be prohibitively slow or difficult to implement for large data. Second, estimation bias may markedly affect the accuracy and reliability of parameter estimates, as suggested from past work on coalescent methods. To address these problems, a fast and computationally efficient least-squares method for estimating population parameters from genomic data is presented here. Instead of modeling genomic data using a full likelihood, this new approach uses an analogous function, in which the full data are replaced with a vector of summary statistics. Furthermore, these least-squares estimators may show significantly less estimation bias for growth rate and genetic diversity than a corresponding maximum-likelihood estimator for the same coalescent process. The least-squares statistics also scale up to genome-sized data sets with many nucleotides and loci. These results demonstrate that least-squares statistics will likely prove useful for nonlinear parameter estimation when the underlying population genomic processes have complex evolutionary dynamics involving interactions between mutation, selection, demography, and recombination.

5.
This paper describes methods for using categorical temporal data to detect differences in behavior between a treated group and a control group. The first-level output from the data is typically a set of many different correlated test statistics comparing the two groups. In previous work, a decision was made by counting the number of significant individual tests and calibrating with bootstrap simulation. This article goes further, suggesting two possible alternative statistics: the sum of the squared individual test statistics and a Wald-like combination of the individual test statistics. All three overall comparison statistics are defined and a method for computing critical values from simulated distributions using a bootstrap method is given. The use of all three methods is then demonstrated on each of three data sets. Finally, a simulated power study reveals that the Wald-like statistic is much better than the other two, leading to the suggestion of its use in place of the other two statistics.

6.
Many studies of synaptic transmission have assumed a parametric model to estimate the mean quantal content and size or the effect upon them of manipulations such as the induction of long-term potentiation. Classical tests of fit usually assume that model parameters have been selected independently of the data. Therefore, their use is problematic after parameters have been estimated. We hypothesized that Monte Carlo (MC) simulations of a quantal model could provide a table of parameter-independent critical values with which to test the fit after parameter estimation, emulating Lilliefors's tests. However, when we tested this hypothesis within a conventional quantal model, the empirical distributions of two conventional goodness-of-fit statistics were affected by the values of the quantal parameters, falsifying the hypothesis. Notably, the tests' critical values increased when the combined variances of the noise and quantal-size distributions were reduced, increasing the distinctness of quantal peaks. Our results support two conclusions. First, tests that use a predetermined critical value to assess the fit of a quantal model after parameter estimation may operate at a differing unknown level of significance for each experiment. Second, an MC test enables a valid assessment of the fit of a quantal model after parameter estimation.
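
The general shape of a Monte Carlo test of fit after parameter estimation can be sketched as follows. This is an illustration under stated assumptions, not the quantal model of the abstract: a normal distribution stands in for the fitted model, the Kolmogorov-Smirnov distance stands in for the goodness-of-fit statistic, and the function names are hypothetical. The key step is that the parameters are re-estimated on every simulated data set, so the reference distribution reflects the same fitting procedure applied to the real data.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_stat_fitted_normal(data):
    """KS distance between the data and a normal fitted to the same data."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(data)):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d, mu, sigma


def monte_carlo_fit_test(data, n_sims=200, seed=0):
    """Return the observed statistic and a Monte Carlo p-value."""
    rng = random.Random(seed)
    d_obs, mu, sigma = ks_stat_fitted_normal(data)
    exceed = 0
    for _ in range(n_sims):
        fake = [rng.gauss(mu, sigma) for _ in data]   # simulate from the fitted model
        d_sim = ks_stat_fitted_normal(fake)[0]        # re-estimate parameters each time
        if d_sim >= d_obs:
            exceed += 1
    return d_obs, (exceed + 1) / (n_sims + 1)


rng = random.Random(42)
skewed = [rng.expovariate(1.0) for _ in range(200)]
print(monte_carlo_fit_test(skewed))
```

Because the simulations are drawn at the estimated parameter values, the test does not rely on a predetermined table of critical values, which is the point the abstract makes about parameter-dependent null distributions.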

7.
MOTIVATION: Diffusible and non-diffusible gene products play a major role in body plan formation. A quantitative understanding of the spatio-temporal patterns formed in body plan formation, gained by using simulation models, is an important addition to experimental observation. The inverse modelling approach consists of describing the body plan formation by a rule-based model, and fitting the model parameters to real observed data. In body plan formation, the data are usually obtained from fluorescent immunohistochemistry or in situ hybridizations. Inferring model parameters by comparing such data to those from simulation is a major computational bottleneck. An important aspect in this process is the choice of method used for parameter estimation. When no information on parameters is available, parameter estimation is mostly done by means of heuristic algorithms. RESULTS: We show that parameter estimation for pattern formation models can be efficiently performed using an evolution strategy (ES). As a case study we use a quantitative spatio-temporal model of the regulatory network for early development in Drosophila melanogaster. In order to estimate the parameters, the simulated results are compared to a time series of gene products involved in the network obtained with immunohistochemistry. We demonstrate that a (mu,lambda)-ES can be used to find good quality solutions in the parameter estimation. We also show that an ES with multiple populations is 5-140 times as fast as parallel simulated annealing for this case study, and that combining ES with a local search results in an efficient parameter estimation method.
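
For readers unfamiliar with the notation, a (mu,lambda)-ES keeps mu parents, generates lambda offspring by mutation, and selects the next parents from the offspring only. The sketch below is a generic textbook variant with a self-adapted step size, applied to a toy linear model; the model, the `misfit` function, the population sizes and the learning rate `tau` are all assumptions for illustration and do not reproduce the Drosophila case study.

```python
import math
import random


def misfit(params, times, observed):
    """Sum of squared differences between a toy linear model and the data."""
    a, b = params
    return sum((a + b * t - y) ** 2 for t, y in zip(times, observed))


def mu_comma_lambda_es(loss, dim, mu=5, lam=35, generations=300, seed=0):
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(2.0 * dim)      # learning rate for step-size self-adaptation
    # Each individual is (parameter vector, mutation step size).
    parents = [([rng.uniform(-5.0, 5.0) for _ in range(dim)], 1.0) for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            x, sigma = rng.choice(parents)
            new_sigma = sigma * math.exp(tau * rng.gauss(0.0, 1.0))
            child = [xi + new_sigma * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((loss(child), child, new_sigma))
        # Comma selection: the next parents come from the offspring only.
        offspring.sort(key=lambda item: item[0])
        parents = [(child, s) for _, child, s in offspring[:mu]]
    best = parents[0][0]
    return best, loss(best)


times = [i / 10 for i in range(11)]
observed = [2.0 - 1.5 * t for t in times]          # noise-free data from a = 2.0, b = -1.5
best, err = mu_comma_lambda_es(lambda p: misfit(p, times, observed), dim=2)
print(best, err)
```

Discarding the parents each generation lets the step size shrink as the search closes in on the optimum, at the cost of occasionally losing the best solution found so far.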

8.
Kuhner MK, Smith LP. Genetics 2007, 175(1): 155-165
We have developed a Bayesian version of our likelihood-based Markov chain Monte Carlo genealogy sampler LAMARC and compared the two versions for estimation of theta = 4N(e)mu, exponential growth rate, and recombination rate. We used simulated DNA data to assess accuracy of means and support or credibility intervals. In all cases the two methods had very similar results. Some parameter combinations led to overly narrow support or credibility intervals, excluding the truth more often than the desired percentage, for both methods. However, the Bayesian approach rejected the generative parameter values significantly less often than the likelihood approach, both in cases where the level of rejection was normal and in cases where it was too high.

9.
Determination of material parameters for soft tissue frequently involves regression of material parameters for nonlinear, anisotropic constitutive models against experimental data from heterogeneous tests. Here, parameter estimation based on membrane inflation is considered. A four parameter nonlinear, anisotropic hyperelastic strain energy function was used to model the material, in which the parameters are cast in terms of key response features. The experiment was simulated using finite element (FE) analysis in order to predict the experimental measurements of pressure versus profile strain. Material parameter regression was automated using inverse FE analysis; parameter values were updated by use of both local and global techniques, and the ability of these techniques to efficiently converge to a best case was examined. This approach provides a framework in which additional experimental data, including surface strain measurements or local structural information, may be incorporated in order to quantify heterogeneous nonlinear material properties.

10.
We present a Bayesian method for functional response parameter estimation starting from time series of field data on predator–prey dynamics. Population dynamics is described by a system of stochastic differential equations in which behavioral stochasticities are represented by noise terms affecting each population as well as their interaction. We focus on the estimation of a behavioral parameter appearing in the functional response of predator to prey abundance when a small number of observations is available. To deal with small sample sizes, latent data are introduced between each pair of field observations and are considered as missing data. The method is applied to both simulated and observational data. The results obtained using different numbers of latent data are compared with those achieved following a frequentist approach. As a case study, we consider an acarine predator–prey system relevant to biological control problems.

11.
Matrix population models are among the most common mathematical models in ecology; they describe the dynamics of stage-structured populations and provide many population statistics. One of these statistics, the elasticity of population growth rate, is frequently used and represents the relative impact of life history parameters on the population growth rate. Due to the utility of elasticities for cross-taxonomic comparisons, Silvertown and his coauthors have published multiple papers and reported the relationship between elasticities and life forms (or life history) in multiple plant species, using a triangle map (called “ternary plot”). To understand why their elasticities are located in specific regions of the ternary plot, we constructed four archetypes of population matrices, from which we simulated 24,000 randomly generated population matrices and obtained the consequent elasticities. We found a large discrepancy when comparing our results to those in Silvertown et al.'s study (Conserv Biol 10:591–597, 1996): for our simulated matrices where rapid transitions were not allowed (e.g., trees), the elasticity distribution resulted in a line across the ternary plot. We provided the mathematical proof for this result, and found that its slope depends on matrix dimension. We also used 1230 matrices from the COMPADRE Plant Matrix Database and calculated the elasticities. Our simulated results were validated with field data from COMPADRE: two straight lines appeared in the ternary plot. Furthermore, we answered several previously raised questions, such as, “Is there any special elasticity distribution in matrices with high population growth rates?” and “Why are the elasticities of natural populations concentrated in the upper half of the ternary plot?”.
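
As a concrete reminder of what an elasticity is, the standard definition for a projection matrix A with dominant eigenvalue λ, right eigenvector w (stable stage distribution) and left eigenvector v (reproductive values) is e_ij = (a_ij / λ) · v_i w_j / ⟨v, w⟩, and the elasticities sum to 1. The sketch below computes them by power iteration; the 3-stage matrix is a made-up example, not one from the study or from COMPADRE.

```python
def dominant_eigen(matrix, iterations=2000):
    """Power iteration; returns (eigenvalue, eigenvector scaled to sum to 1)."""
    n = len(matrix)
    vec = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        lam = sum(nxt)          # vec sums to 1, so this converges to the dominant eigenvalue
        vec = [x / lam for x in nxt]
    return lam, vec


def elasticities(matrix):
    """Return the growth rate and the elasticity matrix e_ij."""
    n = len(matrix)
    lam, w = dominant_eigen(matrix)                       # stable stage distribution
    transpose = [[matrix[j][i] for j in range(n)] for i in range(n)]
    _, v = dominant_eigen(transpose)                      # reproductive values
    vw = sum(vi * wi for vi, wi in zip(v, w))
    elas = [[matrix[i][j] * v[i] * w[j] / (lam * vw) for j in range(n)] for i in range(n)]
    return lam, elas


# Hypothetical 3-stage matrix: top row is fecundity, diagonal is stasis, sub-diagonal is growth.
A = [[0.0, 1.5, 3.0],
     [0.5, 0.2, 0.0],
     [0.0, 0.3, 0.8]]
lam, elas = elasticities(A)
print(round(lam, 3), round(sum(sum(row) for row in elas), 6))
```

Because the elasticities sum to 1, grouping them into three classes yields a point that can be placed on a ternary plot.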

12.
A comparison has been made between the estimates obtained from maximum likelihood estimation of gamma, inverse normal, and normal distribution models for stage-frequency data. Results have been compared for six sets of test data, and from many sets of simulated data. It is concluded that (1) some estimates may differ substantially between the models, (2) estimates from the correct model have little bias, and estimated standard errors are generally close to theoretical values, (3) there are problems in determining degrees of freedom for chi-squared goodness of fit tests, so that it is best to compare test statistics with simulated distributions, and (4) goodness of fit tests may not discriminate well between the three models.

13.
In this article, we develop an admixture F model (AFM) for the estimation of population-level coancestry coefficients from neutral molecular markers. In contrast to the previously published F model, the AFM enables disentangling small population size and lack of migration as causes of genetic differentiation behind a given level of FST. We develop a Bayesian estimation scheme for fitting the AFM to multiallelic data acquired from a number of local populations. We demonstrate the performance of the AFM, using simulated data sets and real data on ninespine sticklebacks (Pungitius pungitius) and common shrews (Sorex araneus). The results show that the parameterization of the AFM conveys more information about the evolutionary history than a simple summary parameter such as FST. The methods are implemented in the R package RAFM.

14.
Choi SC, Hey J. Genetics 2011, 189(2): 561-577
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy-Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets.

15.
Problems of establishing equivalence or noninferiority between two medical diagnostic procedures involve comparisons of the response rates between correlated proportions. When the sample size is small, the asymptotic tests may not be reliable. This article proposes an unconditional exact test procedure to assess equivalence or noninferiority. Two statistics, a sample-based test statistic and a restricted maximum likelihood estimation (RMLE)-based test statistic, to define the rejection region of the exact test are considered. We show the p-value of the proposed unconditional exact tests can be attained at the boundary point of the null hypothesis. Assessment of equivalence is often based on a comparison of the confidence limits with the equivalence limits. We also derive the unconditional exact confidence intervals on the difference of the two proportion means for the two test statistics. A typical data set comparing two diagnostic procedures is analyzed using the proposed unconditional exact and asymptotic methods. The p-value from the unconditional exact tests is generally larger than the p-value from the asymptotic tests. In other words, an exact confidence interval is generally wider than the confidence interval obtained from an asymptotic test.

16.
Pybus OG, Rambaut A, Harvey PH. Genetics 2000, 155(3): 1429-1437
We describe a unified set of methods for the inference of demographic history using genealogies reconstructed from gene sequence data. We introduce the skyline plot, a graphical, nonparametric estimate of demographic history. We discuss both maximum-likelihood parameter estimation and demographic hypothesis testing. Simulations are carried out to investigate the statistical properties of maximum-likelihood estimates of demographic parameters. The simulations reveal that (i) the performance of exponential growth model estimates is determined by a simple function of the true parameter values and (ii) under some conditions, estimates from reconstructed trees perform as well as estimates from perfect trees. We apply our methods to HIV-1 sequence data and find strong evidence that subtypes A and B have different demographic histories. We also provide the first (albeit tentative) genetic evidence for a recent decrease in the growth rate of subtype B.
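
The abstract does not give the formula, but the classic skyline estimate is usually stated as follows: during an internode interval of length γ in which k lineages exist, the scaled population size is estimated as γ · k(k − 1)/2. A minimal sketch under that assumption, with a hypothetical function name and made-up interval lengths, is:

```python
def classic_skyline(interval_lengths):
    """Piecewise-constant population size estimates from coalescent intervals.

    interval_lengths[0] is the time during which all n lineages exist,
    the next entry the time with n - 1 lineages, and so on down to 2.
    """
    n = len(interval_lengths) + 1
    estimates = []
    for idx, gamma in enumerate(interval_lengths):
        k = n - idx                                   # lineages present in this interval
        estimates.append((k, gamma * k * (k - 1) / 2.0))
    return estimates


# Made-up intervals for a genealogy with four tips.
for k, size in classic_skyline([0.1, 0.2, 0.5]):
    print(k, round(size, 3))
```

The estimates are in the same time units as the branch lengths, so with branch lengths in substitutions per site they are population sizes scaled by the mutation rate rather than absolute numbers.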

17.
We have investigated simulation-based techniques for parameter estimation in chaotic intercellular networks. The proposed methodology combines a synchronization-based framework for parameter estimation in coupled chaotic systems with some state-of-the-art computational inference methods borrowed from the field of computational statistics. The first method is a stochastic optimization algorithm, known as accelerated random search method, and the other two techniques are based on approximate Bayesian computation. The latter is a general methodology for non-parametric inference that can be applied to practically any system of interest. The first method based on approximate Bayesian computation is a Markov Chain Monte Carlo scheme that generates a series of random parameter realizations for which a low synchronization error is guaranteed. We show that accurate parameter estimates can be obtained by averaging over these realizations. The second ABC-based technique is a Sequential Monte Carlo scheme. The algorithm generates a sequence of “populations”, i.e., sets of randomly generated parameter values, where the members of a certain population attain a synchronization error that is lower than the error attained by members of the previous population. Again, we show that accurate estimates can be obtained by averaging over the parameter values in the last population of the sequence. We have analysed how effective these methods are from a computational perspective. For the numerical simulations we have considered a network that consists of two modified repressilators with identical parameters, coupled by the fast diffusion of the autoinducer across the cell membranes.

18.
19.
Basic summary statistics that quantify the population genetic structure of influenza virus are important for understanding and inferring the evolutionary and epidemiological processes. However, the sampling dates of global virus sequences in the last several decades are scattered nonuniformly throughout the calendar. Such temporal structure of samples and the small effective size of viral population hamper the use of conventional methods to calculate summary statistics. Here, we define statistics that overcome this problem by correcting for the sampling-time difference in quantifying a pairwise sequence difference. A simple linear regression method jointly estimates the mutation rate and the level of sequence polymorphism, thus providing an estimate of the effective population size. It also leads to the definition of Wright’s FST for arbitrary time-series data. Furthermore, as an alternative to Tajima’s D statistic or the site-frequency spectrum, a mismatch distribution corrected for sampling-time differences can be obtained and compared between actual and simulated data. Application of these methods to seasonal influenza A/H3N2 viruses sampled between 1980 and 2017 and sequences simulated under the model of recurrent positive selection with metapopulation dynamics allowed us to estimate the synonymous mutation rate and find parameter values for selection and demographic structure that fit the observation. We found that the mutation rates of HA and PB1 segments before 2007 were particularly high and that including recurrent positive selection in our model was essential for the genealogical structure of the HA segment. Methods developed here can be generally applied to population genetic inferences using serially sampled genetic data.

20.
Population variability and uncertainty are important features of biological systems that must be considered when developing mathematical models for these systems. In this paper we present probability-based parameter estimation methods that account for such variability and uncertainty. Theoretical results that establish well-posedness and stability for these methods are discussed. A probabilistic parameter estimation technique is then applied to a toxicokinetic model for trichloroethylene using several types of simulated data. Comparison with results obtained using a standard, deterministic parameter estimation method suggests that the probabilistic methods are better able to capture population variability and uncertainty in model parameters.
