20 similar articles found (search time: 15 ms)
1.
A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method proceeds in three steps. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess in case it is not accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in E. coli, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection. 相似文献
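To make the state-augmentation idea concrete, here is a minimal sketch (not the authors' implementation) in which an unknown decay rate k is appended to the state of a toy model dx/dt = -k*x and estimated with an extended Kalman filter from noisy time-course data; the model, noise levels, and tuning constants are illustrative assumptions.

```python
import numpy as np

def f(z, dt):
    """Euler step of dx/dt = -k*x with the parameter k carried as part of the state."""
    x, k = z
    return np.array([x - k * x * dt, k])

def F_jac(z, dt):
    """Jacobian of f with respect to the augmented state [x, k]."""
    x, k = z
    return np.array([[1.0 - k * dt, -x * dt],
                     [0.0,          1.0]])

def ekf_parameter_estimate(y, dt, z0, P0, Q, R):
    """Run an EKF over noisy observations y of x; returns the filtered states."""
    H = np.array([[1.0, 0.0]])          # we observe x only
    z, P = z0.copy(), P0.copy()
    history = []
    for yk in y:
        # predict
        Fk = F_jac(z, dt)
        z = f(z, dt)
        P = Fk @ P @ Fk.T + Q
        # update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        z = z + K @ (np.array([yk]) - H @ z)
        P = (np.eye(2) - K @ H) @ P
        history.append(z.copy())
    return np.array(history)

# synthetic data: true k = 0.5, observation noise sd = 0.05 (illustrative values)
rng = np.random.default_rng(0)
dt, n = 0.1, 100
x_true = 1.0 * np.exp(-0.5 * np.arange(n) * dt)
y = x_true + 0.05 * rng.standard_normal(n)

est = ekf_parameter_estimate(
    y, dt,
    z0=np.array([1.0, 0.1]),             # initial guess: k = 0.1
    P0=np.diag([0.1, 1.0]),
    Q=np.diag([1e-6, 1e-6]),              # small process noise keeps k adaptable
    R=np.array([[0.05 ** 2]]))
print("final estimate of k:", est[-1, 1])
```

In this toy setting, inspecting the parameter block of the filtered covariance P would play roughly the role of the a posteriori identifiability check described above.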
2.
3.
Daniel Silk Paul D. W. Kirk Chris P. Barnes Tina Toni Michael P. H. Stumpf 《PLoS computational biology》2014,10(6)
Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear: we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design need to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis. 相似文献
4.
Richard P. Mann Andrea Perna Daniel Strömbom Roman Garnett James E. Herbert-Read David J. T. Sumpter Ashley J. W. Ward 《PLoS computational biology》2013,9(3)
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical ‘phase transition’, whereby an increase in density leads to the onset of collective motion in one direction. We fit a range of models to these data, from a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have ‘memory’ of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns’ movements are influenced not just by the current directions of nearby conspecifics, but also by those encountered in the recent past. Given the simplicity of prawns as a study system, our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects. 相似文献
5.
Keegan E. Hines 《Biophysical journal》2015,108(9):2103-2113
Bayesian inference is a powerful statistical paradigm that has gained popularity in many fields of science, but adoption has been somewhat slower in biophysics. Here, I provide an accessible tutorial on the use of Bayesian methods by focusing on example applications that will be familiar to biophysicists. I first discuss the goals of Bayesian inference and show simple examples of posterior inference using conjugate priors. I then describe Markov chain Monte Carlo sampling and, in particular, discuss Gibbs sampling and Metropolis random walk algorithms with reference to detailed examples. These Bayesian methods (with the aid of Markov chain Monte Carlo sampling) provide a generalizable way of rigorously addressing parameter inference and identifiability for arbitrarily complicated models. 相似文献
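As a companion to the Metropolis random-walk algorithm mentioned in the tutorial, here is a minimal generic sketch for a single-parameter posterior; the Gaussian model, prior width, and step size are illustrative assumptions rather than anything taken from the article.

```python
import numpy as np

def log_posterior(mu, data, prior_sd=10.0):
    """Log posterior for the mean of N(mu, 1) data with a N(0, prior_sd^2) prior."""
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(data, n_iter=20000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        proposal = mu + step * rng.standard_normal()
        lp_prop = log_posterior(proposal, data)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            mu, lp = proposal, lp_prop
        chain[i] = mu
    return chain

data = np.random.default_rng(0).normal(2.0, 1.0, size=50)
chain = metropolis(data)
print("posterior mean ≈", chain[5000:].mean())    # discard burn-in
```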
6.
This article provides a fully Bayesian approach for modeling single-dose and complete pharmacokinetic data in a population pharmacokinetic (PK) model. To overcome the impact of outliers and the difficulty of computation, a generalized linear model is chosen under the hypothesis that the errors follow a multivariate Student t distribution, which is heavy-tailed. The aim of this study is to investigate and demonstrate the performance of the multivariate t distribution for analyzing population pharmacokinetic data. Bayesian predictive inference and Metropolis-Hastings sampling schemes are used to handle the intractable posterior integration. The precision and accuracy of the proposed model are illustrated with simulated data and a real example based on theophylline data. 相似文献
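A rough univariate illustration of why a heavy-tailed error model is attractive here: the sketch below evaluates Gaussian versus Student-t log-likelihoods for a one-compartment oral-dosing curve with one gross outlier. The PK function, parameter values, and degrees of freedom are assumptions for illustration, not the paper's settings (which use a multivariate t within a full population model).

```python
import numpy as np
from scipy import stats

def one_compartment(t, dose, ka, ke, V):
    """Concentration after a single oral dose (one compartment, first-order absorption)."""
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.array([0.5, 1, 2, 4, 8, 12, 24.0])
pred = one_compartment(t, dose=320, ka=1.5, ke=0.1, V=30)   # illustrative parameters
obs = pred.copy()
obs[3] += 4.0                       # inject a single gross outlier
resid = obs - pred
sigma = 0.5

# Gaussian errors: the squared outlier dominates the fit criterion.
print("normal  loglik:", stats.norm.logpdf(resid, scale=sigma).sum())
# Student-t errors (df = 4): the same outlier is penalized far less severely.
print("student loglik:", stats.t.logpdf(resid / sigma, df=4).sum() - len(resid) * np.log(sigma))
```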
7.
Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright–Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright–Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright–Fisher model. 相似文献
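For reference, the discrete Wright–Fisher dynamics that the diffusion methods approximate can be simulated directly; the sketch below is a generic textbook formulation (haploid selection, symmetric mutation) with made-up numbers, not the authors' code.

```python
import numpy as np

def wright_fisher(N, s, mu, p0, generations, rng):
    """Simulate mutant allele frequency under haploid Wright-Fisher sampling
    with selection coefficient s and symmetric mutation rate mu."""
    p = p0
    traj = [p]
    for _ in range(generations):
        # deterministic effect of selection on the expected frequency
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # mutation toward and away from the mutant allele
        p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
        # binomial resampling of N individuals (genetic drift)
        p = rng.binomial(N, p_mut) / N
        traj.append(p)
    return np.array(traj)

rng = np.random.default_rng(0)
traj = wright_fisher(N=10_000, s=0.1, mu=1e-5, p0=0.01, generations=200, rng=rng)
print("final mutant frequency:", traj[-1])
```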
8.
9.
10.
Until recently, the use of Bayesian inference was limited to a few cases because for many realistic probability models the likelihood function cannot be calculated analytically. The situation changed with the advent of likelihood-free inference algorithms, often subsumed under the term approximate Bayesian computation (ABC). A key innovation was the use of a postsampling regression adjustment, allowing larger tolerance values and as such shifting computation time to realistic orders of magnitude. Here we propose a reformulation of the regression adjustment in terms of a general linear model (GLM). This allows the integration into the sound theoretical framework of Bayesian statistics and the use of its methods, including model selection via Bayes factors. We then apply the proposed methodology to the question of population subdivision among western chimpanzees, Pan troglodytes verus.

With the advent of ever more powerful computers and the refinement of algorithms like MCMC or Gibbs sampling, Bayesian statistics have become an important tool for scientific inference during the past two decades. Consider a model M creating data D (DNA sequence data, for example) determined by parameters θ from some (bounded) parameter space Π ⊂ R^m whose joint prior density we denote by π(θ). The quantity of interest is the posterior distribution π(θ | D) of the parameters, which can be calculated by Bayes rule as

π(θ | D) = f(D | θ) π(θ) / c,

where f(D | θ) is the likelihood of the data and c = ∫_Π f(D | θ) π(θ) dθ is a normalizing constant. Direct use of this formula, however, is often prevented by the fact that the likelihood function cannot be calculated analytically for many realistic probability models. In these cases one is obliged to use stochastic simulation. Tavaré et al. (1997) propose a rejection sampling method for simulating a posterior random sample where the full data D are replaced by a summary statistic s (like the number of segregating sites in their setting). Even if the statistic does not capture the full information contained in the data D, rejection sampling allows for the simulation of approximate posterior distributions of the parameters in question (the scaled mutation rate in their model). This approach was extended to multiple-parameter models with multivariate summary statistics by Weiss and von Haeseler (1998). In their setting a candidate vector θ of parameters is simulated from a prior distribution π(θ) and is accepted if its corresponding vector of summary statistics is sufficiently close to the observed summary statistics s_obs with respect to some metric in the space of s, i.e., if dist(s, s_obs) < ε for a fixed tolerance ε. We suppose that the likelihood f(s | θ) of the full model is continuous and nonzero around s_obs. In practice the summary statistics are often discrete but the range of values is large enough to be approximated by real numbers. The likelihood of the truncated model M_ε(s_obs) obtained by this acceptance–rejection process is given by

f_ε(s | θ) = f(s | θ) Ind_{B_ε(s_obs)}(s) / ∫_{B_ε(s_obs)} f(s′ | θ) ds′,   (1)

where B_ε(s_obs) is the ε-ball in the space of summary statistics and Ind(·) is the indicator function. Observe that f_ε(s | θ) degenerates to a (Dirac) point measure centered at s_obs as ε → 0. If the parameters are generated from a prior π(θ), then the distribution of the parameters retained after the rejection process outlined above is given by

π_ε(θ) = π(θ) ∫_{B_ε(s_obs)} f(s | θ) ds / ∫_Π π(θ′) ∫_{B_ε(s_obs)} f(s | θ′) ds dθ′.   (2)

We call this density the truncated prior. Combining (1) and (2) we get

f_ε(s_obs | θ) π_ε(θ) ∝ f(s_obs | θ) π(θ).   (3)

Thus the posterior distribution of the parameters under the model M for s = s_obs given the prior π(θ) is exactly equal to the posterior distribution under the truncated model M_ε(s_obs) given the truncated prior π_ε(θ).

If we can estimate the truncated prior π_ε(θ) and make an educated guess for a parametric statistical model of M_ε(s_obs), we arrive at a reasonable approximation of the posterior even if the likelihood of the full model M is unknown. It is to be expected that due to the localization process the truncated model will exhibit a simpler structure than the full model M and thus be easier to estimate. Estimating π_ε(θ) is straightforward, at least when the summary statistics can be sampled from M in a reasonable amount of time: sample the parameters θ from the prior π(θ), create their respective statistics s from M, and save those parameters whose statistics lie in B_ε(s_obs) in a list P. The empirical distribution of these retained parameters yields an estimate of π_ε(θ). If the tolerance ε is small, then one can assume that the likelihood f(s_obs | θ) is close to some (unknown) constant over the whole range of the retained parameters. Under that assumption, Equation 3 shows that the truncated prior approximates the posterior, π_ε(θ) ≈ π(θ | s_obs). However, when the dimension n of summary statistics is high (and for more complex models dimensions like n = 50 are not unusual), the “curse of dimensionality” implies that the tolerance must be chosen rather large or else the acceptance rate becomes prohibitively low. This, however, distorts the precision of the approximation of the posterior distribution by the truncated prior (see Wegmann et al. 2009). This situation can be partially alleviated by speeding up the sampling process; such methods are subsumed under the term approximate Bayesian computation (ABC). Marjoram et al. (2003) develop a variant of the classical Metropolis–Hastings algorithm (termed ABC–MCMC in Sisson et al. 2007), which allows them to sample directly from the truncated prior π_ε(θ). In Sisson et al. (2007) a sequential Monte Carlo sampler is proposed, requiring substantially less iterations than ABC–MCMC. But even when such methods are applied, the assumption that the likelihood is constant over the ε-ball is a very rough one, indeed. To take into account the variation of the likelihood within the ε-ball, a postsampling regression adjustment (termed ABC-REG in the following) of the sample P of retained parameters is introduced in the important article by Beaumont et al. (2002). Basically, they postulate a (locally) linear dependence between the parameters θ and their associated summary statistics s. More precisely, the (local) model they implicitly assume is of the form θ = Ms + m_0 + e, where M is a matrix of regression coefficients, m_0 a constant vector, and e a random vector of zero mean. Computer simulations suggest that for many population models ABC–REG yields posterior marginal densities that have narrower highest posterior density (HPD) regions and are more closely centered around the true parameter values than the empirical posterior densities directly produced by ABC samplers (Wegmann et al. 2009). An attractive feature of ABC–REG is that the posterior adjustment is performed directly on the simulated parameters, which makes estimation of the marginal posteriors of individual parameters particularly easy. The method can also be extended to more complex, nonlinear models as demonstrated, e.g., in Blum and Francois (2009). In extreme situations, however, ABC–REG may yield posteriors that are nonzero in parameter regions where the priors actually vanish (see Figure 1B for an illustration of this phenomenon). Moreover, it is not clear how ABC–REG could yield an estimate of the marginal density of model M at s_obs, information that is useful for model comparison.

[Figure 1.—Comparison of rejection (A and D), ABC–REG (B and E), and ABC–GLM (C and F) posteriors with those obtained from analytical likelihood calculations. We estimated the population–mutation parameter θ = 4Nμ of a panmictic population for different observed numbers of segregating sites (see text). Shades indicate the L1 distance between the inferred and the analytically calculated posterior. White corresponds to an exact match (zero distance) and darker gray shades indicate larger distances. If the inferred posterior differs from the analytical more than the prior does, squares are marked in black. The top row (A–C) corresponds to cases with a uniform prior θ ∼ Unif([0.005, 10]) and the bottom row (D–F) to cases with a discontinuous prior with “gap.” The tolerance ε is given as the absolute distance in number of segregating sites. Shown are averages over 25 independent estimations. To have a fair comparison, we adjusted the smoothing parameters (bandwidths) to get the best results for all approaches.]

In contrast to ABC–REG we treat the parameters θ as exogenous and the summary statistics s as endogenous variables and we stipulate for M_ε(s_obs) a general linear model (GLM in the literature—not to be confused with the generalized linear models that unfortunately share the same abbreviation). To be precise, we assume the summary statistics s created by the truncated model's likelihood to satisfy

s = Cθ + c_0 + e,  e ∼ N_n(0, Σ_s),   (4)

where C is an n × m matrix of constants, c_0 an n × 1 vector, and e a random vector with a multivariate normal distribution of zero mean and covariance matrix Σ_s. A GLM has the advantage of taking into account not only the (local) linearity, but also the strong correlation normally present between the components of the summary statistics. Of course, the model assumption (4) can never represent the full truth since its statistics are in principle unbounded whereas the likelihood is supported on the ε-ball around s_obs. But since the multivariate Gaussians will fall off rapidly in practice and not reach far out off the boundary of B_ε(s_obs), this is a disadvantage we can live with. In particular, the ordinary least squares (OLS) estimate outlined below implies that for ε → 0 the constant c_0 tends to s_obs whereas the design matrix C and the covariance matrix Σ_s both vanish. This means that in the limit of zero tolerance ε = 0 our model assumption yields the true posterior distribution of M. 相似文献
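To make the rejection step and the ABC-REG adjustment described above concrete, here is a minimal sketch on a toy problem (normal data summarized by the sample mean); the prior, tolerance, and use of a simple least-squares slope are illustrative assumptions, and the GLM reformulation proposed in this entry is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_summary(theta, n=50):
    """Toy model M: data are N(theta, 1); the summary statistic s is the sample mean."""
    return rng.normal(theta, 1.0, size=n).mean()

s_obs = 2.3                        # observed summary statistic
n_sims, eps = 100_000, 0.2         # number of prior draws and tolerance

theta = rng.uniform(-10, 10, size=n_sims)            # draws from the prior
s = np.array([simulate_summary(t) for t in theta])   # simulated summaries
keep = np.abs(s - s_obs) < eps                        # rejection step: dist(s, s_obs) < eps

# ABC-REG style adjustment: theta* = theta - b * (s - s_obs),
# with the slope b fitted by least squares on the accepted pairs.
b = np.polyfit(s[keep], theta[keep], deg=1)[0]
theta_adj = theta[keep] - b * (s[keep] - s_obs)

print("accepted draws:", keep.sum())
print("rejection posterior mean:", theta[keep].mean())
print("regression-adjusted mean:", theta_adj.mean())
```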
11.
12.
One of the key obstacles to better understanding, anticipating, and managing biological invasions is the difficulty researchers face when trying to quantify the many important aspects of the communities that affect and are affected by non-indigenous species (NIS). Bayesian Learning Networks (BLNs) combine graphical models with multivariate Bayesian statistics to provide an analytical tool for the quantification of communities. BLNs can determine which components of a natural system influence which others, quantify this influence, and provide inferential analysis of parameter changes when changes in network variables are hypothesized or observed. After a brief explanation of these three functions of BLNs, a simulated network is analyzed for structure, parameter estimation, and inference. This approach to invasion biology is then discussed, and expanded applications for BLNs are offered. 相似文献
13.
Xiaoying Tang Kenichi Oishi Andreia V. Faria Argye E. Hillis Marilyn S. Albert Susumu Mori Michael I. Miller 《PloS one》2013,8(6)
This paper examines the multiple atlas random diffeomorphic orbit model in Computational Anatomy (CA) for parameter estimation and segmentation of subcortical and ventricular neuroanatomy in magnetic resonance imagery. We assume that there exist multiple magnetic resonance image (MRI) atlases, each atlas containing a collection of locally-defined charts in the brain generated via manual delineation of the structures of interest. We focus on maximum a posteriori estimation of high dimensional segmentations of MR within the class of generative models representing the observed MRI as a conditionally Gaussian random field, conditioned on the atlas charts and the diffeomorphic change of coordinates of each chart that generates it. The charts and their diffeomorphic correspondences are unknown and viewed as latent or hidden variables. We demonstrate that the expectation-maximization (EM) algorithm arises naturally, yielding the likelihood-fusion equation which the a posteriori estimator of the segmentation labels maximizes. The likelihoods being fused are modeled as conditionally Gaussian random fields with mean fields a function of each atlas chart under its diffeomorphic change of coordinates onto the target. The conditional-mean in the EM algorithm specifies the convex weights with which the chart-specific likelihoods are fused. The multiple atlases with the associated convex weights imply that the posterior distribution is a multi-modal representation of the measured MRI. Segmentation results for subcortical and ventricular structures of subjects, within populations of demented subjects, are demonstrated, including the use of multiple atlases across multiple diseased groups. 相似文献
14.
15.
Xavier Rubio-Campillo 《PloS one》2016,11(1)
Formal Models and History
Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation.
Case Study
This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors.
Impact
Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. 相似文献
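For orientation, one of the "basic equations" referred to above, the Lanchester square law, can be integrated in a few lines; the coefficients, initial strengths, and the explicit Euler scheme are illustrative assumptions, and the fatigue variant favoured by the analysis is not reproduced here.

```python
def lanchester_square(R0, B0, r, b, dt=0.01, t_max=50.0):
    """Euler integration of the square law: dR/dt = -b*B, dB/dt = -r*R."""
    R, B, t = R0, B0, 0.0
    while R > 0 and B > 0 and t < t_max:
        R, B = R - b * B * dt, B - r * R * dt
        t += dt
    return max(R, 0.0), max(B, 0.0), t

# 1000 vs 800 troops with equal per-soldier effectiveness (made-up numbers)
print(lanchester_square(R0=1000, B0=800, r=0.05, b=0.05))
```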
16.
apd: 9 March 2001 相似文献
17.
Jens Nielsen 《Biotechnology journal》2019,14(9)
For thousands of years, the yeast Saccharomyces cerevisiae (S. cerevisiae) has served as a cell factory for the production of bread, beer, and wine. In more recent years, this yeast has also served as a cell factory for producing many different fuels, chemicals, food ingredients, and pharmaceuticals. S. cerevisiae, however, has also served as a very important model organism for studying eukaryal biology, and even today many new discoveries, important for the treatment of human diseases, are made using this yeast as a model organism. Here, a brief review is provided of the use of S. cerevisiae as a model organism for studying eukaryal biology, of its use as a cell factory, and of how advances in systems biology underpin developments in both these areas. 相似文献
18.
Frequency-dependent selection against rare forms can maintain clines. For weak selection, s, in simple linear models of frequency-dependence, single-locus clines are stabilized with a maximum slope of between √s/(√8 σ) and √s/(√12 σ), where σ is the dispersal distance. These clines are similar to those maintained by heterozygote disadvantage. Using computer simulations, the weak-selection analytical results are extended to higher selection pressures with up to three unlinked genes. Graphs are used to display the effect of selection, migration, dominance, and number of loci on cline widths, speeds of cline movements, two-way gametic correlations ("linkage disequilibria"), and heterozygote deficits. The effects of changing the order of reproduction, migration, and selection are also briefly explored. Epistasis can also maintain tension zones. We show that epistatic selection is similar in its effects to frequency-dependent selection, except that the disequilibria produced in the zone will be higher for a given level of selection. If selection consists of a mixture of frequency-dependence and epistasis, as is likely in nature, the error made in estimating selection is usually less than twofold. From the graphs, selection and migration can be estimated using knowledge of the dominance and number of genes, of gene frequencies, and of gametic correlations from a hybrid zone. 相似文献
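A minimal sketch of the kind of deterministic simulation described above: one locus on a line of demes with nearest-neighbour migration and linear frequency-dependent selection of strength s against the locally rarer allele. The grid size, migration rate, selection form, and update order are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def cline_simulation(n_demes=100, m=0.1, s=0.1, generations=2000):
    """Deterministic allele-frequency cline maintained by frequency-dependent
    selection against the rarer form, with nearest-neighbour migration."""
    p = np.where(np.arange(n_demes) < n_demes // 2, 0.05, 0.95)  # initial step cline
    for _ in range(generations):
        # selection: an allele's fitness increases with its local frequency
        w_allele = 1 + s * (p - 0.5)
        w_other = 1 + s * (0.5 - p)
        p = p * w_allele / (p * w_allele + (1 - p) * w_other)
        # migration: each deme exchanges a fraction m with its two neighbours
        p_new = p.copy()
        p_new[1:-1] = (1 - m) * p[1:-1] + 0.5 * m * (p[:-2] + p[2:])
        p_new[0] = (1 - 0.5 * m) * p[0] + 0.5 * m * p[1]
        p_new[-1] = (1 - 0.5 * m) * p[-1] + 0.5 * m * p[-2]
        p = p_new
    return p

p = cline_simulation()
print("maximum slope of the cline (per deme):", np.abs(np.diff(p)).max())
```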
19.
To effectively manage soil fertility, knowledge is needed of how a crop uses nutrients from fertilizer applied to the soil. Soil quality is a combination of biological, chemical and physical properties and is hard to assess directly because of collective and multiple functional effects. In this paper, we focus on the application of these concepts to agriculture. We define the baseline fertility of soil as the level of fertility that a crop can acquire for growth from the soil. With this strict definition, we propose a new crop yield-fertility model that enables quantification of the process of improving baseline fertility and the effects of treatments solely from the time series of crop yields. The model was modified from Michaelis-Menten kinetics and measured the additional effects of the treatments given the baseline fertility. Using more than 30 years of experimental data, we used the Bayesian framework to estimate the improvements in baseline fertility and the effects of fertilizer and farmyard manure (FYM) on maize (Zea mays), barley (Hordeum vulgare), and soybean (Glycine max) yields. Fertilizer contributed the most to the barley yield and FYM contributed the most to the soybean yield among the three crops. The baseline fertility of the subsurface soil was very low for maize and barley prior to fertilization. In contrast, the baseline fertility in this soil approximated half-saturated fertility for the soybean crop. The long-term soil fertility was increased by adding FYM, but the effect of FYM addition was reduced by the addition of fertilizer. Our results provide evidence that long-term soil fertility under continuous farming was maintained, or increased, by the application of natural nutrients compared with the application of synthetic fertilizer. 相似文献
20.
We present an approach for identifying genes under natural selection using polymorphism and divergence data from synonymous and non-synonymous sites within genes. A generalized linear mixed model is used to model the genome-wide variability among categories of mutations and estimate its functional consequence. We demonstrate how the model's estimated fixed and random effects can be used to identify genes under selection. The parameter estimates from our generalized linear model can be transformed to yield population genetic parameter estimates for quantities including the average selection coefficient for new mutations at a locus, the synonymous and non-synonymous mutation rates, and species divergence times. Furthermore, our approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software. The model is fit in both the empirical Bayes and Bayesian settings using the lme4 package in R, and Markov chain Monte Carlo methods in WinBUGS. Using simulated data, we compare our method to existing approaches for detecting genes under selection: the McDonald-Kreitman test, and two versions of the Poisson random field based method MKprf. Overall, we find our method universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data. 相似文献
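For orientation, the McDonald-Kreitman test used as a baseline above reduces to a 2×2 table of polymorphism and divergence counts at synonymous and non-synonymous sites; the counts below are made-up illustrative numbers.

```python
from scipy.stats import fisher_exact

# counts of non-synonymous / synonymous changes (illustrative numbers)
Dn, Ds = 20, 10      # fixed differences between species (divergence)
Pn, Ps = 15, 30      # polymorphisms within the species

odds_ratio, p_value = fisher_exact([[Dn, Ds], [Pn, Ps]])
neutrality_index = (Pn / Ps) / (Dn / Ds)   # NI < 1 suggests an excess of non-synonymous fixations

print(f"Fisher exact p = {p_value:.4f}, neutrality index = {neutrality_index:.2f}")
```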