Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
How best to summarize large and complex datasets is a problem that arises in many areas of science. We approach it from the point of view of seeking data summaries that minimize the average squared error of the posterior distribution for a parameter of interest under approximate Bayesian computation (ABC). In ABC, simulation under the model replaces computation of the likelihood, which is convenient for many complex models. Simulated and observed datasets are usually compared using summary statistics, which in practice are typically chosen on the basis of the investigator's intuition and established practice in the field. We propose two algorithms for automated choice of efficient data summaries. Firstly, we motivate minimization of the estimated entropy of the posterior approximation as a heuristic for the selection of summary statistics. Secondly, we propose a two-stage procedure: the minimum-entropy algorithm is used to identify simulated datasets close to that observed, and these are each successively regarded as observed datasets for which the mean root integrated squared error of the ABC posterior approximation is minimized over sets of summary statistics. In a simulation study, we both singly and jointly inferred the scaled mutation and recombination parameters from a population sample of DNA sequences. The computationally fast minimum-entropy algorithm showed a modest improvement over existing methods, while our two-stage procedure showed substantial and highly significant further improvement for both univariate and bivariate inferences. We found that the optimal set of summary statistics was highly dataset specific, suggesting that more generally there may be no globally optimal choice, which argues for a new selection for each dataset even if the model and target of inference are unchanged.
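
The comparison of simulated and observed summary statistics described above can be illustrated with a plain ABC rejection sampler. This is only a minimal sketch, not the minimum-entropy or two-stage algorithm of the abstract: the toy model (normal data with unknown mean), the uniform prior, the single summary statistic, and all function names are assumptions made for illustration.

```python
import random
import statistics

def abc_rejection(observed, simulate, summarize, prior_sample, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summaries lie within `tolerance` of the observed ones."""
    s_obs = summarize(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                 # draw a parameter from the prior
        s_sim = summarize(simulate(theta, rng))   # simulate data, reduce to summaries
        # Euclidean distance between simulated and observed summary vectors
        distance = sum((a - b) ** 2 for a, b in zip(s_sim, s_obs)) ** 0.5
        if distance <= tolerance:
            accepted.append(theta)
    return accepted

# Hypothetical toy model: data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5).
def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]

def summarize(data):
    # A single summary statistic (the sample mean), returned as a 1-tuple.
    return (statistics.fmean(data),)

def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)

rng = random.Random(0)
observed = simulate(2.0, rng)
posterior = abc_rejection(observed, simulate, summarize, prior_sample, 5000, 0.1, rng)
```

The accepted values in `posterior` approximate the posterior of `theta` given the chosen summary. Swapping `summarize` for a different set of statistics changes that approximation, which is the choice the abstract proposes to automate.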

2.
With the availability of whole-genome sequence data, biologists are able to test hypotheses regarding the demography of populations. Furthermore, advances in Approximate Bayesian Computation (ABC) methodology allow demographic inference to be performed in a simple framework using summary statistics. We present here msABC, a coalescent-based program that facilitates the simulation of multi-locus data suitable for an ABC analysis. msABC is based on Hudson's ms algorithm, which is used extensively for simulating neutral demographic histories of populations. The flexibility of the original algorithm has been extended so that sample size may vary among loci, missing data can be incorporated in simulations and calculations, and a multitude of summary statistics for single or multiple populations is generated. The source code of msABC is available at http://bio.lmu.de/~pavlidis/msabc or upon request from the authors.

3.
We have investigated simulation-based techniques for parameter estimation in chaotic intercellular networks. The proposed methodology combines a synchronization-based framework for parameter estimation in coupled chaotic systems with some state-of-the-art computational inference methods borrowed from the field of computational statistics. The first method is a stochastic optimization algorithm, known as the accelerated random search method, and the other two techniques are based on approximate Bayesian computation (ABC). The latter is a general methodology for non-parametric inference that can be applied to practically any system of interest. The first ABC-based method is a Markov chain Monte Carlo scheme that generates a series of random parameter realizations for which a low synchronization error is guaranteed. We show that accurate parameter estimates can be obtained by averaging over these realizations. The second ABC-based technique is a sequential Monte Carlo scheme. The algorithm generates a sequence of "populations", i.e., sets of randomly generated parameter values, where the members of a given population attain a synchronization error that is lower than the error attained by members of the previous population. Again, we show that accurate estimates can be obtained by averaging over the parameter values in the last population of the sequence. We have analysed how effective these methods are from a computational perspective. For the numerical simulations we have considered a network that consists of two modified repressilators with identical parameters, coupled by the fast diffusion of the autoinducer across the cell membranes.

4.
Wu CH, Drummond AJ. Genetics, 2011, 188(1): 151-164
We provide a framework for Bayesian coalescent inference from microsatellite data that enables inference of population history parameters averaged over microsatellite mutation models. To achieve this, we first implemented a rich family of microsatellite mutation models and related components in the software package BEAST. BEAST is a powerful tool that performs Bayesian MCMC analysis on molecular data to make coalescent and evolutionary inferences. Our implementation permits the application of existing nonparametric methods to microsatellite data. The implemented microsatellite models are based on the replication slippage mechanism and focus on three properties of microsatellite mutation: length dependency of mutation rate, mutational bias toward expansion or contraction, and number of repeat units changed in a single mutation event. We develop a new model that facilitates microsatellite model averaging and Bayesian model selection by transdimensional MCMC. With Bayesian model averaging, the posterior distributions of population history parameters are integrated across a set of microsatellite models and thus account for model uncertainty. Simulated data are used to evaluate our method in terms of accuracy and precision of estimation and also identification of the true mutation model. Finally, we apply our method to a red colobus monkey data set as an example.

5.
We develop a statistical method to infer the parameters of Hubbell's neutral model of biodiversity using data on local species abundances and their phylogenetic relatedness. This method uses the approximate Bayesian computation (ABC) approach, where the data are summarized into a small number of informative summary statistics. We used three statistics: the number of species in the sample, Shannon's H index of evenness, and Shao and Sokal's B1 index of phylogenetic tree imbalance. Our approach was found to outperform previous methods, illustrating the potential of ABC methods in ecology. When the method was applied to four large tropical forest tree data sets, the best-fit immigration rates m were found to be two orders of magnitude smaller, and the regional diversities θ larger, than previously reported for the same data. This implies that neutral-compatible regional pools of tropical trees should extend over continental scales, and that m measures, in this context, mostly the frequency of long-distance dispersal events.
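
Two of the three summary statistics named above are simple to compute from a list of sampled individuals. The sketch below assumes the standard Shannon index H = -Σ p_i ln p_i with natural logarithms; the abstract does not state which variant or logarithm base the authors used, and the function name is hypothetical. The B1 tree-imbalance index is omitted because it requires a phylogeny.

```python
import math
from collections import Counter

def species_richness_and_shannon(sample):
    """Return (number of species, Shannon index H) for a list of species labels."""
    counts = Counter(sample)
    total = sum(counts.values())
    # H = -sum(p_i * ln p_i) over the relative abundances p_i
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())
    return len(counts), shannon

# Four equally abundant species: H reaches its maximum, ln(4).
richness, shannon = species_richness_and_shannon(["a", "b", "c", "d"] * 10)
```

In an ABC analysis these values would be computed for both the observed sample and each simulated community, then compared.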

6.
Currently used techniques for the analysis of single-molecule trajectories only exploit a small part of the available information stored in the data. Here, we apply a Bayesian inference scheme to trajectories of confined receptors that are targeted by pore-forming toxins to extract the two-dimensional confining potential that restricts the motion of the receptor. The receptor motion is modeled by the overdamped Langevin equation of motion. The method uses most of the information stored in the trajectory and converges quickly onto inferred values, while providing the uncertainty on the determined values. The inference is performed on the polynomial development of the potential and on the diffusivities that have been discretized on a mesh. Numerical simulations are used to test the scheme and quantify the convergence toward the input values for forces, potential, and diffusivity. Furthermore, we show that the technique outperforms the classical mean-square-displacement technique when forces act on confined molecules because the typical mean-square-displacement analysis does not account for them. We also show that the inferred potential better represents input potentials than the potential extracted from the position distribution based on Boltzmann statistics that assumes statistical equilibrium.

7.
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

8.
Improving the realism of spatially explicit demographic models is important for better inferring the history of past populations and for understanding the genetic bases of adaptation and speciation. One particular type of demographic event to take into account is long-distance dispersal (LDD). The goals of this study are to explore the impact of various levels of LDD on genetic diversity and to show to what extent LDD levels can be correctly inferred from multilocus data sets using an approximate Bayesian computation approach. We therefore incorporated LDD into a 2D stepping stone forward simulation framework coupled to a coalescent backward simulation step to generate genetic diversity at 100 microsatellite markers under various demographic conditions relevant to recent human evolution. Our results confirm that LDD considerably increases genetic diversity within demes and decreases levels of diversity between demes. By controlling the spatial occurrence of LDD, it appears that LDD events occurring during a phase of range expansion into new territories are more important in maintaining genetic diversity than those occurring in the wake of the expansion or when colonization is over. We also show that it is possible to infer whether LDD has occurred during a range expansion, but our results suggest that one can only approximately estimate the extent of LDD based on genetic summary statistics.

9.
As the field of phylogeography has continued to move in the model-based direction, researchers continue to struggle to construct useful models for inference. These models must be simple enough to be tractable yet contain enough of the complexity of the natural world to support meaningful inference. Beyond constructing such models for inference, researchers explore model space and test competing models with the data on hand, with the goal of improving the understanding of the natural world and the processes underlying natural biological communities. Approximate Bayesian computation (ABC) has recently grown in popularity as a tool for evaluating alternative historical demographic models given population genetic samples. As a thorough demonstration, Pelletier & Carstens (2014) use ABC to test 143 phylogeographic submodels given geographically widespread genetic samples from the salamander species Plethodon idahoensis (Carstens et al. 2004) and, in so doing, demonstrate how the results of the ABC model choice procedure are dependent on the model set one chooses to evaluate.

10.
Comparison of demo-genetic models using Approximate Bayesian Computation (ABC) is an active research field. Although large numbers of populations and models (i.e. scenarios) can be analysed with ABC using molecular data obtained from various marker types, methodological and computational issues arise when these numbers become too large. Moreover, Robert et al. (Proceedings of the National Academy of Sciences of the United States of America, 2011, 108, 15112) have shown that the conclusions drawn from ABC model comparison cannot be trusted per se and require additional simulation analyses. Monte Carlo inferential techniques to empirically evaluate confidence in scenario choice are very time-consuming, however, when the numbers of summary statistics (Ss) and scenarios are large. We here describe a methodological innovation for efficient computation of ABC scenario probabilities, using linear discriminant analysis (LDA) on Ss before computing logistic regression. We used simulated pseudo-observed data sets (pods) to assess the main features of the method (precision and computation time) in comparison with traditional probability estimation using raw (i.e. not LDA-transformed) Ss. We also illustrate the method on real microsatellite data sets produced to make inferences about the invasion routes of the coccinellid Harmonia axyridis. We found that scenario probabilities computed from LDA-transformed and raw Ss were strongly correlated. Type I and II errors were similar for both methods. The faster probability computation that we observed (speed gain around a factor of 100 for LDA-transformed Ss) substantially increases the ability of ABC practitioners to analyse large numbers of pods and hence provides a manageable way to empirically evaluate the power available to discriminate among a large set of complex scenarios.

11.
Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. In particular, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that it can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with the unidentifiability of representative parameter values, we proposed running the simulations with a parameter ensemble sampled from the posterior distribution, named the "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm for generating the posterior parameter ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference, but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and to conduct model selection based on the Bayes factor.

12.
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidence for all 88 counties in the state of Ohio over an observation period of 21 years.

13.
Exact inference for growth curves with intraclass correlation structure   Total citations: 2 (self-citations: 0, citations by others: 2)
Weerahandi S, Berger VW. Biometrics, 1999, 55(3): 921-924
We consider repeated observations taken over time for each of several subjects. For example, one might consider the growth curve of a cohort of babies over time. We assume a simple linear growth curve model. Exact results based on sufficient statistics (exact tests of the null hypothesis that a coefficient is zero, or exact confidence intervals for coefficients) are not available to make inference on regression coefficients when an intraclass correlation structure is assumed. This paper will demonstrate that such exact inference is possible using generalized inference.

14.
In recent years approximate Bayesian computation (ABC) methods have become popular in population genetics as an alternative to full-likelihood methods to make inferences under complex demographic models. Most ABC methods rely on the choice of a set of summary statistics to extract information from the data. In this article we tested the use of the full allelic distribution directly in an ABC framework. Although the ABC techniques are becoming more widely used, there is still uncertainty over how they perform in comparison with full-likelihood methods. We thus conducted a simulation study and provide a detailed examination of ABC in comparison with full likelihood in the case of a model of admixture. This model assumes that two parental populations mixed at a certain time in the past, creating a hybrid population, and that the three populations then evolve under pure drift. Several aspects of ABC methodology were investigated, such as the effect of the distance metric chosen to measure the similarity between simulated and observed data sets. Results show that in general ABC provides good approximations to the posterior distributions obtained with the full-likelihood method. This suggests that it is possible to apply ABC using allele frequencies to make inferences in cases where it is difficult to select a set of suitable summary statistics and when the complexity of the model or the size of the data set makes it computationally prohibitive to use full-likelihood methods.

15.
A variety of inverse kinematics (IK) algorithms exist for estimating postures and displacements from a set of noisy marker positions, typically aiming to minimize IK errors by distributing errors amongst all markers in a least-squares (LS) sense. This paper describes how Bayesian inference can instead be used to maximize the probability that a given stochastic kinematic model would produce the observed marker positions. We developed Bayesian IK for two planar IK applications: (1) kinematic chain posture estimates using an explicit forward kinematics model, and (2) rigid body rotation estimates using implicit kinematic modeling through marker displacements. We then tested and compared Bayesian IK results to LS results in Monte Carlo simulations in which random marker error was introduced using Gaussian noise amplitudes ranging uniformly between 0.2 mm and 2.0 mm. Results showed that Bayesian IK was more accurate than LS-IK in over 92% of simulations, with the exception of one center-of-rotation coordinate in planar rotation, for which Bayesian IK was more accurate in only 68% of simulations. Moreover, while LS errors increased with marker noise, Bayesian errors were comparatively unaffected by noise amplitude. Nevertheless, whereas the LS solutions required average computational durations of less than 0.5 s, average Bayesian IK durations ranged from 11.6 s for planar rotation to over 2000 s for kinematic chain postures. These results suggest that Bayesian IK can yield order-of-magnitude IK improvements for simple planar IK, but also that its computational demands may make it impractical for some applications.

16.

Background  

The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations.

17.
Incomplete lineage sorting has been documented across a diverse set of taxa ranging from song birds to conifers. Such patterns are expected theoretically for species characterized by certain life history characteristics (e.g. long generation times) and those influenced by certain historical demographic events (e.g. recent divergences). A number of methods to estimate the underlying species phylogeny from a set of gene trees have been proposed and shown to be effective when incomplete lineage sorting has occurred. The further effects of gene flow on those methods, however, remain to be investigated. Here, we focus on the performance of three methods of species tree inference, ESP-COAL, minimizing deep coalescence (MDC), and concatenation, when incomplete lineage sorting and gene flow jointly confound the relationship between gene and species trees. Performance was investigated using Monte Carlo coalescent simulations under four models (n-island, stepping stone, parapatric, and allopatric) and three magnitudes of gene flow (N_e m = 0.01, 0.10, 1.00). Although results varied by the model and magnitude of gene flow, methods incorporating aspects of the coalescent process (ESP-COAL and MDC) performed well, with probabilities of identifying the correct species tree topology typically increasing to greater than 0.75 when five or more loci are sampled. The only exceptions to that pattern included gene flow at moderate to high magnitudes under the n-island and stepping stone models. Concatenation performs poorly relative to the other methods. We extend these results to a discussion of the importance of species and population phylogenies to the fields of molecular systematics and phylogeography using an empirical example from Rhododendron.

18.
The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
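
For orientation, the discrete CoD is commonly written as CoD = (ε0 - ε) / ε0, where ε0 is the error of the best predictor that ignores the predictors and ε is the error of the best predictor that uses them. The sketch below computes a simple plug-in (resubstitution) estimate from paired samples. It illustrates the classical nonparametric approach mentioned in the abstract, not the Bayesian MMSE or OBP estimators the paper derives; the function name and the convention of returning 0.0 when ε0 = 0 are assumptions.

```python
from collections import Counter, defaultdict

def resubstitution_cod(xs, ys):
    """Plug-in estimate of the discrete CoD from predictor values xs and targets ys."""
    n = len(ys)
    # eps0: error of always predicting the most frequent target value
    eps0 = 1.0 - max(Counter(ys).values()) / n
    if eps0 == 0.0:
        return 0.0  # assumed convention: a constant target leaves nothing to explain
    # eps: error of predicting, for each predictor value, its most frequent target
    targets_by_x = defaultdict(Counter)
    for x, y in zip(xs, ys):
        targets_by_x[x][y] += 1
    misclassified = sum(sum(c.values()) - max(c.values()) for c in targets_by_x.values())
    eps = misclassified / n
    return (eps0 - eps) / eps0

# XOR target: neither predictor alone helps, but the pair determines the target.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
cod_xor = resubstitution_cod(pairs, [a ^ b for a, b in pairs])
```

The XOR case shows the nonlinear interaction the CoD is meant to capture: the pair of predictors gives a CoD of 1 even though each predictor on its own is uninformative.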

19.
Estimating pairwise correlation from replicated genome-scale (a.k.a. OMICS) data is fundamental to cluster functionally relevant biomolecules to a cellular pathway. The popular Pearson correlation coefficient estimates bivariate correlation by averaging over replicates. It is not completely satisfactory since it introduces strong bias while reducing variance. We propose a new multivariate correlation estimator that models all replicates as independent and identically distributed (i.i.d.) samples from the multivariate normal distribution. We derive the estimator by maximizing the likelihood function. For small sample data, we provide a resampling-based statistical inference procedure, and for moderate to large sample data, we provide an asymptotic statistical inference procedure based on the Likelihood Ratio Test (LRT). We demonstrate advantages of the new multivariate correlation estimator over Pearson bivariate correlation estimator using simulations and real-world data analysis examples. AVAILABILITY: The estimator and statistical inference procedures have been implemented in an R package 'CORREP' that is available from CRAN [http://cran.r-project.org] and Bioconductor [http://www.bioconductor.org/]. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
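
The baseline that the abstract criticizes, Pearson correlation computed after averaging over replicates, can be sketched as follows. This shows only that baseline, not the maximum-likelihood estimator implemented in CORREP; the data layout (one list of replicate measurements per condition) and the function names are assumptions.

```python
import math

def pearson(u, v):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def replicate_averaged_pearson(x_reps, y_reps):
    """Average the replicates within each condition, then correlate the averages."""
    x_mean = [sum(r) / len(r) for r in x_reps]
    y_mean = [sum(r) / len(r) for r in y_reps]
    return pearson(x_mean, y_mean)

# Hypothetical data: three conditions, two replicates each.
x_reps = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]       # condition means 2, 3, 6
y_reps = [[4.0, 4.0], [6.0, 6.0], [12.0, 12.0]]     # condition means 4, 6, 12
r = replicate_averaged_pearson(x_reps, y_reps)
```

Averaging discards the within-condition spread of the replicates (here, the ±1 scatter in `x_reps`), which is the information a replicate-aware estimator would retain.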

20.
Complex self-motion stimulations in the dark can be powerfully disorienting and can create illusory motion percepts. In the absence of visual cues, the brain has to use angular and linear acceleration information provided by the vestibular canals and the otoliths, respectively. However, these sensors are inaccurate and ambiguous. We propose that the brain processes these signals in a statistically optimal fashion, reproducing the rules of Bayesian inference. We also suggest that this processing is related to the statistics of natural head movements. This would create a perceptual bias in favour of low velocity and acceleration. We have constructed a Bayesian model of self-motion perception based on these assumptions. Using this model, we have simulated perceptual responses to centrifugation and off-vertical axis rotation and obtained close agreement with experimental findings. This demonstrates how Bayesian inference makes it possible to draw a quantitative link between sensor noise and ambiguities, statistics of head movement, and the perception of self-motion.
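
The perceptual bias toward low velocity described above has a simple one-dimensional analogue: combining a noisy sensor reading with a zero-mean Gaussian prior pulls the estimate toward zero. The sketch below is a textbook conjugate-Gaussian update, offered as an illustration of the principle rather than the authors' self-motion model; the function and parameter names are hypothetical.

```python
def gaussian_posterior(measurement, noise_sd, prior_sd, prior_mean=0.0):
    """Posterior mean and sd for a Gaussian prior combined with one Gaussian measurement."""
    prior_precision = 1.0 / prior_sd ** 2
    sensor_precision = 1.0 / noise_sd ** 2
    posterior_variance = 1.0 / (prior_precision + sensor_precision)
    # Precision-weighted average of the prior mean and the measurement
    posterior_mean = posterior_variance * (
        prior_precision * prior_mean + sensor_precision * measurement
    )
    return posterior_mean, posterior_variance ** 0.5

# A sensor reports 10 units of angular velocity; prior and sensor are equally uncertain.
estimate, uncertainty = gaussian_posterior(10.0, noise_sd=1.0, prior_sd=1.0)
```

With equal prior and sensor uncertainty the estimate lands halfway between the prior mean and the reading. A noisier sensor (larger `noise_sd`) shifts it further toward zero, mirroring the bias in favour of low velocity.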
