20 similar articles found (search time: 15 ms)
1.
A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method proceeds in three steps. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess in case it is not accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in E. coli, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection. 相似文献
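To make the state-augmentation idea concrete, here is a minimal sketch (not the authors' implementation) in which an unknown decay rate k is appended to the state of a toy model dx/dt = -k*x and estimated with an extended Kalman filter from noisy time-course data; the model, noise levels, and tuning constants are illustrative assumptions.

```python
import numpy as np

def f(z, dt):
    """Euler step of dx/dt = -k*x with the parameter k carried as part of the state."""
    x, k = z
    return np.array([x - k * x * dt, k])

def F_jac(z, dt):
    """Jacobian of f with respect to the augmented state [x, k]."""
    x, k = z
    return np.array([[1.0 - k * dt, -x * dt],
                     [0.0,          1.0]])

def ekf_parameter_estimate(y, dt, z0, P0, Q, R):
    """Run an EKF over noisy observations y of x; returns the filtered states."""
    H = np.array([[1.0, 0.0]])          # we observe x only
    z, P = z0.copy(), P0.copy()
    history = []
    for yk in y:
        # predict
        Fk = F_jac(z, dt)
        z = f(z, dt)
        P = Fk @ P @ Fk.T + Q
        # update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        z = z + K @ (np.array([yk]) - H @ z)
        P = (np.eye(2) - K @ H) @ P
        history.append(z.copy())
    return np.array(history)

# synthetic data: true k = 0.5, observation noise sd = 0.05 (illustrative values)
rng = np.random.default_rng(0)
dt, n = 0.1, 100
x_true = 1.0 * np.exp(-0.5 * np.arange(n) * dt)
y = x_true + 0.05 * rng.standard_normal(n)

est = ekf_parameter_estimate(
    y, dt,
    z0=np.array([1.0, 0.1]),             # initial guess: k = 0.1
    P0=np.diag([0.1, 1.0]),
    Q=np.diag([1e-6, 1e-6]),              # small process noise keeps k adaptable
    R=np.array([[0.05 ** 2]]))
print("final estimate of k:", est[-1, 1])
```

In this toy setting, inspecting the parameter block of the filtered covariance P would play roughly the role of the a posteriori identifiability check described above.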
2.
3.
Daniel Silk Paul D. W. Kirk Chris P. Barnes Tina Toni Michael P. H. Stumpf 《PLoS computational biology》2014,10(6)
Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear: we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design need to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis. 相似文献
4.
Richard P. Mann Andrea Perna Daniel Strömbom Roman Garnett James E. Herbert-Read David J. T. Sumpter Ashley J. W. Ward 《PLoS computational biology》2013,9(3)
Inference of interaction rules of animals moving in groups usually relies on an analysis of large scale system behaviour. Models are tuned through repeated simulation until they match the observed behaviour. More recent work has used the fine scale motions of animals to validate and fit the rules of interaction of animals in groups. Here, we use a Bayesian methodology to compare a variety of models to the collective motion of glass prawns (Paratya australiensis). We show that these exhibit a stereotypical ‘phase transition’, whereby an increase in density leads to the onset of collective motion in one direction. We fit a range of models to these data, from a mean-field model where all prawns interact globally; to a spatial Markovian model where prawns are self-propelled particles influenced only by the current positions and directions of their neighbours; up to non-Markovian models where prawns have ‘memory’ of previous interactions, integrating their experiences over time when deciding to change behaviour. We show that the mean-field model fits the large scale behaviour of the system, but does not capture the observed locality of interactions. Traditional self-propelled particle models fail to capture the fine scale dynamics of the system. The most sophisticated model, the non-Markovian model, provides a good match to the data at both the fine scale and in terms of reproducing global dynamics, while maintaining a biologically plausible perceptual range. We conclude that prawns’ movements are influenced not just by the current directions of nearby conspecifics, but also by those encountered in the recent past. Given the simplicity of prawns as a study system, our research suggests that self-propelled particle models of collective motion should, if they are to be realistic at multiple biological scales, include memory of previous interactions and other non-Markovian effects. 相似文献
5.
Keegan E. Hines 《Biophysical journal》2015,108(9):2103-2113
Bayesian inference is a powerful statistical paradigm that has gained popularity in many fields of science, but adoption has been somewhat slower in biophysics. Here, I provide an accessible tutorial on the use of Bayesian methods by focusing on example applications that will be familiar to biophysicists. I first discuss the goals of Bayesian inference and show simple examples of posterior inference using conjugate priors. I then describe Markov chain Monte Carlo sampling and, in particular, discuss Gibbs sampling and Metropolis random walk algorithms with reference to detailed examples. These Bayesian methods (with the aid of Markov chain Monte Carlo sampling) provide a generalizable way of rigorously addressing parameter inference and identifiability for arbitrarily complicated models. 相似文献
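As a companion to the Metropolis random-walk algorithm mentioned in the tutorial, here is a minimal generic sketch for a single-parameter posterior; the Gaussian model, prior width, and step size are illustrative assumptions rather than anything taken from the article.

```python
import numpy as np

def log_posterior(mu, data, prior_sd=10.0):
    """Log posterior for the mean of N(mu, 1) data with a N(0, prior_sd^2) prior."""
    log_lik = -0.5 * np.sum((data - mu) ** 2)
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior

def metropolis(data, n_iter=20000, step=0.5, seed=1):
    """Random-walk Metropolis sampler for the single parameter mu."""
    rng = np.random.default_rng(seed)
    mu = 0.0
    lp = log_posterior(mu, data)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        proposal = mu + step * rng.standard_normal()
        lp_prop = log_posterior(proposal, data)
        if np.log(rng.random()) < lp_prop - lp:   # accept with prob min(1, ratio)
            mu, lp = proposal, lp_prop
        chain[i] = mu
    return chain

data = np.random.default_rng(0).normal(2.0, 1.0, size=50)
chain = metropolis(data)
print("posterior mean ≈", chain[5000:].mean())    # discard burn-in
```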
6.
This article provides a fully Bayesian approach for modeling single-dose and complete pharmacokinetic data in a population pharmacokinetic (PK) model. To overcome the impact of outliers and the difficulty of computation, a generalized linear model is chosen under the hypothesis that the errors follow a multivariate Student t distribution, which is heavy-tailed. The aim of this study is to investigate and demonstrate the performance of the multivariate t distribution for analyzing population pharmacokinetic data. Bayesian predictive inference and Metropolis-Hastings sampling schemes are used to handle the intractable posterior integration. The precision and accuracy of the proposed model are illustrated with simulated data and a real example based on theophylline data. 相似文献
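A rough univariate illustration of why a heavy-tailed error model is attractive here: the sketch below evaluates Gaussian versus Student-t log-likelihoods for a one-compartment oral-dosing curve with one gross outlier. The PK function, parameter values, and degrees of freedom are assumptions for illustration, not the paper's settings (which use a multivariate t within a full population model).

```python
import numpy as np
from scipy import stats

def one_compartment(t, dose, ka, ke, V):
    """Concentration after a single oral dose (one compartment, first-order absorption)."""
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.array([0.5, 1, 2, 4, 8, 12, 24.0])
pred = one_compartment(t, dose=320, ka=1.5, ke=0.1, V=30)   # illustrative parameters
obs = pred.copy()
obs[3] += 4.0                       # inject a single gross outlier
resid = obs - pred
sigma = 0.5

# Gaussian errors: the squared outlier dominates the fit criterion.
print("normal  loglik:", stats.norm.logpdf(resid, scale=sigma).sum())
# Student-t errors (df = 4): the same outlier is penalized far less severely.
print("student loglik:", stats.t.logpdf(resid / sigma, df=4).sum() - len(resid) * np.log(sigma))
```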
7.
Longitudinal allele frequency data are becoming increasingly prevalent. Such samples permit statistical inference of the population genetics parameters that influence the fate of mutant variants. To infer these parameters by maximum likelihood, the mutant frequency is often assumed to evolve according to the Wright–Fisher model. For computational reasons, this discrete model is commonly approximated by a diffusion process that requires the assumption that the forces of natural selection and mutation are weak. This assumption is not always appropriate. For example, mutations that impart drug resistance in pathogens may evolve under strong selective pressure. Here, we present an alternative approximation to the mutant-frequency distribution that does not make any assumptions about the magnitude of selection or mutation and is much more computationally efficient than the standard diffusion approximation. Simulation studies are used to compare the performance of our method to that of the Wright–Fisher and Gaussian diffusion approximations. For large populations, our method is found to provide a much better approximation to the mutant-frequency distribution when selection is strong, while all three methods perform comparably when selection is weak. Importantly, maximum-likelihood estimates of the selection coefficient are severely attenuated when selection is strong under the two diffusion models, but not when our method is used. This is further demonstrated with an application to mutant-frequency data from an experimental study of bacteriophage evolution. We therefore recommend our method for estimating the selection coefficient when the effective population size is too large to utilize the discrete Wright–Fisher model. 相似文献
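For reference, the discrete Wright–Fisher dynamics that the diffusion methods approximate can be simulated directly; the sketch below is a generic textbook formulation (haploid selection, symmetric mutation) with made-up numbers, not the authors' code.

```python
import numpy as np

def wright_fisher(N, s, mu, p0, generations, rng):
    """Simulate mutant allele frequency under haploid Wright-Fisher sampling
    with selection coefficient s and symmetric mutation rate mu."""
    p = p0
    traj = [p]
    for _ in range(generations):
        # deterministic effect of selection on the expected frequency
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # mutation toward and away from the mutant allele
        p_mut = p_sel * (1 - mu) + (1 - p_sel) * mu
        # binomial resampling of N individuals (genetic drift)
        p = rng.binomial(N, p_mut) / N
        traj.append(p)
    return np.array(traj)

rng = np.random.default_rng(0)
traj = wright_fisher(N=10_000, s=0.1, mu=1e-5, p0=0.01, generations=200, rng=rng)
print("final mutant frequency:", traj[-1])
```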
8.
9.
10.
Until recently, the use of Bayesian inference was limited to a few cases because for many realistic probability models the likelihood function cannot be calculated analytically. The situation changed with the advent of likelihood-free inference algorithms, often subsumed under the term approximate Bayesian computation (ABC). A key innovation was the use of a postsampling regression adjustment, allowing larger tolerance values and as such shifting computation time to realistic orders of magnitude. Here we propose a reformulation of the regression adjustment in terms of a general linear model (GLM). This allows the integration into the sound theoretical framework of Bayesian statistics and the use of its methods, including model selection via Bayes factors. We then apply the proposed methodology to the question of population subdivision among western chimpanzees, Pan troglodytes verus.

With the advent of ever more powerful computers and the refinement of algorithms like MCMC or Gibbs sampling, Bayesian statistics have become an important tool for scientific inference during the past two decades. Consider a model M creating data D (DNA sequence data, for example) determined by parameters θ from some (bounded) parameter space Π ⊂ R^m whose joint prior density we denote by π(θ). The quantity of interest is the posterior distribution π(θ | D) of the parameters, which can be calculated by Bayes rule as

π(θ | D) = f(D | θ) π(θ) / c,

where f(D | θ) is the likelihood of the data and c = ∫_Π f(D | θ) π(θ) dθ is a normalizing constant. Direct use of this formula, however, is often prevented by the fact that the likelihood function cannot be calculated analytically for many realistic probability models. In these cases one is obliged to use stochastic simulation. Tavaré et al. (1997) propose a rejection sampling method for simulating a posterior random sample where the full data D are replaced by a summary statistic s (like the number of segregating sites in their setting). Even if the statistic does not capture the full information contained in the data D, rejection sampling allows for the simulation of approximate posterior distributions of the parameters in question (the scaled mutation rate in their model). This approach was extended to multiple-parameter models with multivariate summary statistics by Weiss and von Haeseler (1998). In their setting a candidate vector θ of parameters is simulated from a prior distribution π(θ) and is accepted if its corresponding vector of summary statistics is sufficiently close to the observed summary statistics s_obs with respect to some metric in the space of s, i.e., if dist(s, s_obs) < ε for a fixed tolerance ε. We suppose that the likelihood f(s | θ) of the full model is continuous and nonzero around s_obs. In practice the summary statistics are often discrete but the range of values is large enough to be approximated by real numbers. The likelihood of the truncated model M_ε(s_obs) obtained by this acceptance–rejection process is given by

f_ε(s | θ) = f(s | θ) Ind_{B_ε(s_obs)}(s) / ∫_{B_ε(s_obs)} f(s′ | θ) ds′,   (1)

where B_ε(s_obs) is the ε-ball in the space of summary statistics and Ind(·) is the indicator function. Observe that f_ε(s | θ) degenerates to a (Dirac) point measure centered at s_obs as ε → 0. If the parameters are generated from a prior π(θ), then the distribution of the parameters retained after the rejection process outlined above is given by

π_ε(θ) = π(θ) ∫_{B_ε(s_obs)} f(s | θ) ds / ∫_Π π(θ′) ∫_{B_ε(s_obs)} f(s | θ′) ds dθ′.   (2)

We call this density the truncated prior. Combining (1) and (2) we get

f_ε(s_obs | θ) π_ε(θ) ∝ f(s_obs | θ) π(θ).   (3)

Thus the posterior distribution of the parameters under the model M for s = s_obs given the prior π(θ) is exactly equal to the posterior distribution under the truncated model M_ε(s_obs) given the truncated prior π_ε(θ).

If we can estimate the truncated prior π_ε(θ) and make an educated guess for a parametric statistical model of M_ε(s_obs), we arrive at a reasonable approximation of the posterior even if the likelihood of the full model M is unknown. It is to be expected that due to the localization process the truncated model will exhibit a simpler structure than the full model M and thus be easier to estimate. Estimating π_ε(θ) is straightforward, at least when the summary statistics can be sampled from M in a reasonable amount of time: sample the parameters θ from the prior π(θ), create their respective statistics s from M, and save those parameters whose statistics lie in B_ε(s_obs) in a list P. The empirical distribution of these retained parameters yields an estimate of π_ε(θ). If the tolerance ε is small, then one can assume that the likelihood f(s_obs | θ) is close to some (unknown) constant over the whole range of the retained parameters. Under that assumption, Equation 3 shows that the truncated prior approximates the posterior, π_ε(θ) ≈ π(θ | s_obs). However, when the dimension n of summary statistics is high (and for more complex models dimensions like n = 50 are not unusual), the “curse of dimensionality” implies that the tolerance must be chosen rather large or else the acceptance rate becomes prohibitively low. This, however, distorts the precision of the approximation of the posterior distribution by the truncated prior (see Wegmann et al. 2009). This situation can be partially alleviated by speeding up the sampling process; such methods are subsumed under the term approximate Bayesian computation (ABC). Marjoram et al. (2003) develop a variant of the classical Metropolis–Hastings algorithm (termed ABC–MCMC in Sisson et al. 2007), which allows them to sample directly from the truncated prior π_ε(θ). In Sisson et al. (2007) a sequential Monte Carlo sampler is proposed, requiring substantially less iterations than ABC–MCMC. But even when such methods are applied, the assumption that the likelihood is constant over the ε-ball is a very rough one, indeed. To take into account the variation of the likelihood within the ε-ball, a postsampling regression adjustment (termed ABC-REG in the following) of the sample P of retained parameters is introduced in the important article by Beaumont et al. (2002). Basically, they postulate a (locally) linear dependence between the parameters θ and their associated summary statistics s. More precisely, the (local) model they implicitly assume is of the form θ = Ms + m_0 + e, where M is a matrix of regression coefficients, m_0 a constant vector, and e a random vector of zero mean. Computer simulations suggest that for many population models ABC–REG yields posterior marginal densities that have narrower highest posterior density (HPD) regions and are more closely centered around the true parameter values than the empirical posterior densities directly produced by ABC samplers (Wegmann et al. 2009). An attractive feature of ABC–REG is that the posterior adjustment is performed directly on the simulated parameters, which makes estimation of the marginal posteriors of individual parameters particularly easy. The method can also be extended to more complex, nonlinear models as demonstrated, e.g., in Blum and Francois (2009). In extreme situations, however, ABC–REG may yield posteriors that are nonzero in parameter regions where the priors actually vanish (see Figure 1B for an illustration of this phenomenon). Moreover, it is not clear how ABC–REG could yield an estimate of the marginal density of model M at s_obs, information that is useful for model comparison.

[Figure 1.—Comparison of rejection (A and D), ABC–REG (B and E), and ABC–GLM (C and F) posteriors with those obtained from analytical likelihood calculations. We estimated the population–mutation parameter θ = 4Nμ of a panmictic population for different observed numbers of segregating sites (see text). Shades indicate the L1 distance between the inferred and the analytically calculated posterior. White corresponds to an exact match (zero distance) and darker gray shades indicate larger distances. If the inferred posterior differs from the analytical more than the prior does, squares are marked in black. The top row (A–C) corresponds to cases with a uniform prior θ ∼ Unif([0.005, 10]) and the bottom row (D–F) to cases with a discontinuous prior with “gap.” The tolerance ε is given as the absolute distance in number of segregating sites. Shown are averages over 25 independent estimations. To have a fair comparison, we adjusted the smoothing parameters (bandwidths) to get the best results for all approaches.]

In contrast to ABC–REG we treat the parameters θ as exogenous and the summary statistics s as endogenous variables and we stipulate for M_ε(s_obs) a general linear model (GLM in the literature—not to be confused with the generalized linear models that unfortunately share the same abbreviation). To be precise, we assume the summary statistics s created by the truncated model's likelihood to satisfy

s = Cθ + c_0 + e,  e ∼ N_n(0, Σ_s),   (4)

where C is an n × m matrix of constants, c_0 an n × 1 vector, and e a random vector with a multivariate normal distribution of zero mean and covariance matrix Σ_s. A GLM has the advantage of taking into account not only the (local) linearity, but also the strong correlation normally present between the components of the summary statistics. Of course, the model assumption (4) can never represent the full truth since its statistics are in principle unbounded whereas the likelihood is supported on the ε-ball around s_obs. But since the multivariate Gaussians will fall off rapidly in practice and not reach far out off the boundary of B_ε(s_obs), this is a disadvantage we can live with. In particular, the ordinary least squares (OLS) estimate outlined below implies that for ε → 0 the constant c_0 tends to s_obs whereas the design matrix C and the covariance matrix Σ_s both vanish. This means that in the limit of zero tolerance ε = 0 our model assumption yields the true posterior distribution of M. 相似文献
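To make the rejection step and the ABC-REG adjustment described above concrete, here is a minimal sketch on a toy problem (normal data summarized by the sample mean); the prior, tolerance, and use of a simple least-squares slope are illustrative assumptions, and the GLM reformulation proposed in this entry is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_summary(theta, n=50):
    """Toy model M: data are N(theta, 1); the summary statistic s is the sample mean."""
    return rng.normal(theta, 1.0, size=n).mean()

s_obs = 2.3                        # observed summary statistic
n_sims, eps = 100_000, 0.2         # number of prior draws and tolerance

theta = rng.uniform(-10, 10, size=n_sims)            # draws from the prior
s = np.array([simulate_summary(t) for t in theta])   # simulated summaries
keep = np.abs(s - s_obs) < eps                        # rejection step: dist(s, s_obs) < eps

# ABC-REG style adjustment: theta* = theta - b * (s - s_obs),
# with the slope b fitted by least squares on the accepted pairs.
b = np.polyfit(s[keep], theta[keep], deg=1)[0]
theta_adj = theta[keep] - b * (s[keep] - s_obs)

print("accepted draws:", keep.sum())
print("rejection posterior mean:", theta[keep].mean())
print("regression-adjusted mean:", theta_adj.mean())
```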
11.
12.
One of the key obstacles to better understanding, anticipating, and managing biological invasions is the difficulty researchers face when trying to quantify the many important aspects of the communities that affect and are affected by non-indigenous species (NIS). Bayesian Learning Networks (BLNs) combine graphical models with multivariate Bayesian statistics to provide an analytical tool for the quantification of communities. BLNs can determine which components of a natural system influence which others, quantify this influence, and provide inferential analysis of parameter changes when changes in network variables are hypothesized or observed. After a brief explanation of these three functions of BLNs, a simulated network is analyzed for structure, parameter estimation, and inference. This approach to invasion biology is then discussed, and expanded applications for BLNs are offered. 相似文献
13.
Xiaoying Tang Kenichi Oishi Andreia V. Faria Argye E. Hillis Marilyn S. Albert Susumu Mori Michael I. Miller 《PloS one》2013,8(6)
This paper examines the multiple atlas random diffeomorphic orbit model in Computational Anatomy (CA) for parameter estimation and segmentation of subcortical and ventricular neuroanatomy in magnetic resonance imagery. We assume that there exist multiple magnetic resonance image (MRI) atlases, each atlas containing a collection of locally-defined charts in the brain generated via manual delineation of the structures of interest. We focus on maximum a posteriori estimation of high dimensional segmentations of MR within the class of generative models representing the observed MRI as a conditionally Gaussian random field, conditioned on the atlas charts and the diffeomorphic change of coordinates of each chart that generates it. The charts and their diffeomorphic correspondences are unknown and viewed as latent or hidden variables. We demonstrate that the expectation-maximization (EM) algorithm arises naturally, yielding the likelihood-fusion equation which the a posteriori estimator of the segmentation labels maximizes. The likelihoods being fused are modeled as conditionally Gaussian random fields with mean fields a function of each atlas chart under its diffeomorphic change of coordinates onto the target. The conditional-mean in the EM algorithm specifies the convex weights with which the chart-specific likelihoods are fused. The multiple atlases with the associated convex weights imply that the posterior distribution is a multi-modal representation of the measured MRI. Segmentation results for subcortical and ventricular structures of subjects, within populations of demented subjects, are demonstrated, including the use of multiple atlases across multiple diseased groups. 相似文献
14.
15.
Xavier Rubio-Campillo 《PloS one》2016,11(1)
Formal Models and History
Computational models are increasingly being used to study historical dynamics. This new trend, which could be named Model-Based History, makes use of recently published datasets and innovative quantitative methods to improve our understanding of past societies based on their written sources. The extensive use of formal models allows historians to re-evaluate hypotheses formulated decades ago and still subject to debate due to the lack of an adequate quantitative framework. The initiative has the potential to transform the discipline if it solves the challenges posed by the study of historical dynamics. These difficulties are based on the complexities of modelling social interaction, and the methodological issues raised by the evaluation of formal models against data with low sample size, high variance and strong fragmentation.
Case Study
This work examines an alternate approach to this evaluation based on a Bayesian-inspired model selection method. The validity of the classical Lanchester’s laws of combat is examined against a dataset comprising over a thousand battles spanning 300 years. Four variations of the basic equations are discussed, including the three most common formulations (linear, squared, and logarithmic) and a new variant introducing fatigue. Approximate Bayesian Computation is then used to infer both parameter values and model selection via Bayes Factors.
Impact
Results indicate decisive evidence favouring the new fatigue model. The interpretation of both parameter estimations and model selection provides new insights into the factors guiding the evolution of warfare. At a methodological level, the case study shows how model selection methods can be used to guide historical research through the comparison between existing hypotheses and empirical evidence. 相似文献
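For orientation, one of the "basic equations" referred to above, the Lanchester square law, can be integrated in a few lines; the coefficients, initial strengths, and the explicit Euler scheme are illustrative assumptions, and the fatigue variant favoured by the analysis is not reproduced here.

```python
def lanchester_square(R0, B0, r, b, dt=0.01, t_max=50.0):
    """Euler integration of the square law: dR/dt = -b*B, dB/dt = -r*R."""
    R, B, t = R0, B0, 0.0
    while R > 0 and B > 0 and t < t_max:
        R, B = R - b * B * dt, B - r * R * dt
        t += dt
    return max(R, 0.0), max(B, 0.0), t

# 1000 vs 800 troops with equal per-soldier effectiveness (made-up numbers)
print(lanchester_square(R0=1000, B0=800, r=0.05, b=0.05))
```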
16.
apd: 9 March 2001 相似文献
17.
Jens Nielsen 《Biotechnology journal》2019,14(9)
For thousands of years, the yeast Saccharomyces cerevisiae (S. cerevisiae) has served as a cell factory for the production of bread, beer, and wine. In more recent years, this yeast has also served as a cell factory for producing many different fuels, chemicals, food ingredients, and pharmaceuticals. S. cerevisiae, however, has also served as a very important model organism for studying eukaryal biology, and even today many new discoveries, important for the treatment of human diseases, are made using this yeast as a model organism. Here, a brief review is provided of the use of S. cerevisiae as a model organism for studying eukaryal biology, of its use as a cell factory, and of how advances in systems biology underpin developments in both these areas. 相似文献
18.
Frequency-dependent selection against rare forms can maintain clines. For weak selection, s, in simple linear models of frequency-dependence, single-locus clines are stabilized with a maximum slope of between √s/(√8 σ) and √s/(√12 σ), where σ is the dispersal distance. These clines are similar to those maintained by heterozygote disadvantage. Using computer simulations, the weak-selection analytical results are extended to higher selection pressures with up to three unlinked genes. Graphs are used to display the effect of selection, migration, dominance, and number of loci on cline widths, speeds of cline movements, two-way gametic correlations ("linkage disequilibria"), and heterozygote deficits. The effects of changing the order of reproduction, migration, and selection are also briefly explored. Epistasis can also maintain tension zones. We show that epistatic selection is similar in its effects to frequency-dependent selection, except that the disequilibria produced in the zone will be higher for a given level of selection. If selection consists of a mixture of frequency-dependence and epistasis, as is likely in nature, the error made in estimating selection is usually less than twofold. From the graphs, selection and migration can be estimated using knowledge of the dominance and number of genes, of gene frequencies, and of gametic correlations from a hybrid zone. 相似文献
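A minimal sketch of the kind of deterministic simulation described above: one locus on a line of demes with nearest-neighbour migration and linear frequency-dependent selection of strength s against the locally rarer allele. The grid size, migration rate, selection form, and update order are illustrative assumptions, not the paper's scheme.

```python
import numpy as np

def cline_simulation(n_demes=100, m=0.1, s=0.1, generations=2000):
    """Deterministic allele-frequency cline maintained by frequency-dependent
    selection against the rarer form, with nearest-neighbour migration."""
    p = np.where(np.arange(n_demes) < n_demes // 2, 0.05, 0.95)  # initial step cline
    for _ in range(generations):
        # selection: an allele's fitness increases with its local frequency
        w_allele = 1 + s * (p - 0.5)
        w_other = 1 + s * (0.5 - p)
        p = p * w_allele / (p * w_allele + (1 - p) * w_other)
        # migration: each deme exchanges a fraction m with its two neighbours
        p_new = p.copy()
        p_new[1:-1] = (1 - m) * p[1:-1] + 0.5 * m * (p[:-2] + p[2:])
        p_new[0] = (1 - 0.5 * m) * p[0] + 0.5 * m * p[1]
        p_new[-1] = (1 - 0.5 * m) * p[-1] + 0.5 * m * p[-2]
        p = p_new
    return p

p = cline_simulation()
print("maximum slope of the cline (per deme):", np.abs(np.diff(p)).max())
```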
19.
To effectively manage soil fertility, knowledge is needed of how a crop uses nutrients from fertilizer applied to the soil. Soil quality is a combination of biological, chemical and physical properties and is hard to assess directly because of collective and multiple functional effects. In this paper, we focus on the application of these concepts to agriculture. We define the baseline fertility of soil as the level of fertility that a crop can acquire for growth from the soil. With this strict definition, we propose a new crop yield-fertility model that enables quantification of the process of improving baseline fertility and the effects of treatments solely from the time series of crop yields. The model was modified from Michaelis-Menten kinetics and measured the additional effects of the treatments given the baseline fertility. Using more than 30 years of experimental data, we used the Bayesian framework to estimate the improvements in baseline fertility and the effects of fertilizer and farmyard manure (FYM) on maize (Zea mays), barley (Hordeum vulgare), and soybean (Glycine max) yields. Fertilizer contributed the most to the barley yield and FYM contributed the most to the soybean yield among the three crops. The baseline fertility of the subsurface soil was very low for maize and barley prior to fertilization. In contrast, the baseline fertility in this soil approximated half-saturated fertility for the soybean crop. The long-term soil fertility was increased by adding FYM, but the effect of FYM addition was reduced by the addition of fertilizer. Our results provide evidence that long-term soil fertility under continuous farming was maintained, or increased, by the application of natural nutrients compared with the application of synthetic fertilizer. 相似文献
20.
We present an approach for identifying genes under natural selection using polymorphism and divergence data from synonymous and non-synonymous sites within genes. A generalized linear mixed model is used to model the genome-wide variability among categories of mutations and estimate its functional consequence. We demonstrate how the model's estimated fixed and random effects can be used to identify genes under selection. The parameter estimates from our generalized linear model can be transformed to yield population genetic parameter estimates for quantities including the average selection coefficient for new mutations at a locus, the synonymous and non-synonymous mutation rates, and species divergence times. Furthermore, our approach incorporates stochastic variation due to the evolutionary process and can be fit using standard statistical software. The model is fit in both the empirical Bayes and Bayesian settings using the lme4 package in R, and Markov chain Monte Carlo methods in WinBUGS. Using simulated data, we compare our method to existing approaches for detecting genes under selection: the McDonald-Kreitman test, and two versions of the Poisson random field based method MKprf. Overall, we find our method universally outperforms existing methods for detecting genes subject to selection using polymorphism and divergence data. 相似文献
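For orientation, the McDonald-Kreitman test used as a baseline above reduces to a 2×2 table of polymorphism and divergence counts at synonymous and non-synonymous sites; the counts below are made-up illustrative numbers.

```python
from scipy.stats import fisher_exact

# counts of non-synonymous / synonymous changes (illustrative numbers)
Dn, Ds = 20, 10      # fixed differences between species (divergence)
Pn, Ps = 15, 30      # polymorphisms within the species

odds_ratio, p_value = fisher_exact([[Dn, Ds], [Pn, Ps]])
neutrality_index = (Pn / Ps) / (Dn / Ds)   # NI < 1 suggests an excess of non-synonymous fixations

print(f"Fisher exact p = {p_value:.4f}, neutrality index = {neutrality_index:.2f}")
```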