Similar Articles
20 similar articles found.
1.
ABSTRACT: BACKGROUND: Predicting a system's behavior based on a mathematical model is a primary task in Systems Biology. If the model parameters are estimated from experimental data, the parameter uncertainty has to be translated into confidence intervals for model predictions. For dynamic models of biochemical networks, the nonlinearity in combination with the large number of parameters hampers the calculation of prediction confidence intervals and renders classical approaches hardly feasible. RESULTS: In this article, reliable confidence intervals are calculated based on the prediction profile likelihood. Such prediction confidence intervals of the dynamic states can be utilized for a data-based observability analysis. The method is also applicable if there are non-identifiable parameters, yielding insufficiently specified model predictions that can be interpreted as non-observability. Moreover, a validation profile likelihood is introduced that should be applied when noisy validation experiments are to be interpreted. CONCLUSIONS: The presented methodology allows the propagation of uncertainty from experimental data to model predictions. Although presented in the context of ordinary differential equations, the concept is general and also applicable to other types of models. Matlab code which can be used as a template to implement the method is provided at http://www.fdmold.uni-freiburg.de/~ckreutz/PPL .
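The core idea can be sketched in a few lines of Python (this is not the authors' Matlab code): for a prediction z of interest, re-optimize all parameters under the constraint that the model reproduces z, and scan z to trace out the prediction profile likelihood. The exponential-decay model, data, grid, and threshold below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

# Simulated data for a hypothetical model y(t) = A * exp(-k * t).
rng = np.random.default_rng(0)
t = np.linspace(0, 2, 10)
sigma = 0.1
y_obs = 2.0 * np.exp(-1.5 * t) + rng.normal(0, sigma, t.size)

def nll(theta):                               # negative log-likelihood (Gaussian)
    A, k = theta
    return 0.5 * np.sum((y_obs - A * np.exp(-k * t)) ** 2) / sigma ** 2

def predict(theta, t_new=3.0):                # the model prediction of interest
    A, k = theta
    return A * np.exp(-k * t_new)

fit = minimize(nll, x0=[1.0, 1.0])
z_hat = predict(fit.x)

# Prediction profile likelihood: scan candidate prediction values z and
# re-optimize all parameters subject to predict(theta) == z.
z_grid = np.linspace(0.2 * z_hat, 3.0 * z_hat, 41)
ppl = []
for z in z_grid:
    con = NonlinearConstraint(lambda th, z=z: predict(th) - z, 0.0, 0.0)
    res = minimize(nll, x0=fit.x, constraints=[con])
    ppl.append(res.fun)

# Approximate 95% prediction confidence interval: z values whose profiled
# negative log-likelihood stays within chi2(1, 0.95)/2 ~= 1.92 of the optimum.
inside = z_grid[np.array(ppl) - fit.fun <= 1.92]
print("approx. 95%% prediction CI: [%.3f, %.3f]" % (inside.min(), inside.max()))
```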

2.
Gene regulatory, signal transduction and metabolic networks are major areas of interest in the newly emerging field of systems biology. In living cells, stochastic dynamics play an important role; however, the kinetic parameters of biochemical reactions necessary for modelling these processes are often not accessible directly through experiments. The problem of estimating stochastic reaction constants from molecule count data measured, with error, at discrete time points is considered. For modelling the system, a hidden Markov process is used, where the hidden states are the true molecule counts, and the transitions between those states correspond to reaction events following collisions of molecules. Two different algorithms are proposed for estimating the unknown model parameters. The first is an approximate maximum likelihood method that gives good estimates of the reaction parameters in systems with few possible reactions in each sampling interval. The second algorithm, treating the data as exact measurements, approximates the number of reactions in each sampling interval by solving a simple linear equation. Maximising the likelihood based on these approximations can provide good results, even in complex reaction systems.
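A hedged illustration of the second algorithm only: treating the counts as exact, the number of reaction events per sampling interval follows from a linear system built on the stoichiometry matrix. The toy network A -> B -> C and all counts below are invented for the example.

```python
import numpy as np

# Stoichiometry matrix S: rows are species A, B, C; columns are the reactions.
S = np.array([[-1,  0],   # A -> B consumes one A
              [ 1, -1],   # ...and produces one B; B -> C consumes one B
              [ 0,  1]])  # B -> C produces one C

# Illustrative "exact" molecule counts (A, B, C) at successive sampling times.
counts = np.array([[100,  0,  0],
                   [ 85, 12,  3],
                   [ 72, 20,  8],
                   [ 61, 25, 14]])

# The event counts n in each interval solve the linear system S @ n = x1 - x0.
for x0, x1 in zip(counts[:-1], counts[1:]):
    n, *_ = np.linalg.lstsq(S, x1 - x0, rcond=None)
    print("events in interval (A->B, B->C):", np.round(n).astype(int))
```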

3.
In quantitative biology, observed data are fitted to a model that captures the essence of the system under investigation in order to obtain estimates of the parameters of the model, as well as their standard errors and interactions. The fitting is best done by the method of maximum likelihood, though least-squares fits are often used as an approximation because the calculations are perceived to be simpler. Here Brian Williams and Chris Dye argue that the method of maximum likelihood is generally preferable to least squares, giving the best estimates of the parameters for data with any given error distribution, and that the calculations are no more difficult than for least-squares fitting. They offer a relatively simple explanation of the method and describe its implementation using examples from leishmaniasis epidemiology.
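A minimal Python sketch of the argument (the growth model and data are hypothetical): for Poisson-distributed counts, maximizing the correct likelihood is no more work than least squares, yet it weights the data according to their actual error distribution.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated epidemic-style counts: exponential growth with Poisson noise.
rng = np.random.default_rng(1)
t = np.arange(10.0)
counts = rng.poisson(5.0 * np.exp(0.3 * t))

def mean_model(theta):                     # log-parameterized so the mean is > 0
    log_n0, r = theta
    return np.exp(log_n0 + r * t)

# Least squares: implicitly assumes constant-variance Gaussian errors.
ls = minimize(lambda th: np.sum((counts - mean_model(th)) ** 2), x0=[0.0, 0.1])

# Maximum likelihood under the Poisson error distribution the data follow;
# the negative log-likelihood is sum(mu - y * log(mu)) up to a constant.
ml = minimize(lambda th: np.sum(mean_model(th) - counts * np.log(mean_model(th))),
              x0=[0.0, 0.1])

print("least squares  (log n0, r):", np.round(ls.x, 3))
print("max likelihood (log n0, r):", np.round(ml.x, 3))
```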

4.
Neural networks are considered by many to be very promising tools for classification and prediction. However, the flexibility of neural network models often results in overfitting. Shrinking the parameters using a penalized likelihood is often used to overcome such overfitting. In this paper we extend the approach proposed by FARAGGI and SIMON (1995a) to modeling censored survival data using the input-output relationship associated with a single-hidden-layer feed-forward neural network. Instead of estimating the neural network parameters using the method of maximum likelihood, we place normal prior distributions on the parameters and make inferences based on the derived posterior distributions of the parameters. This Bayesian formulation shrinks the parameters of the neural network model and reduces overfitting compared with maximum likelihood estimation. We illustrate the proposed method on a simulated and a real example.

5.
Parameter inference and model selection are very important for mathematical modeling in systems biology, and Bayesian statistics can be used to conduct both. In particular, the approximate Bayesian computation (ABC) framework is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods need to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific parameter value with high credibility as the representative value of the distribution. To overcome these problems, we introduced population annealing, one of the population Monte Carlo algorithms. Although population annealing is usually used in statistical mechanics, we showed that it can also be used to compute Bayesian posterior distributions in the ABC framework. To deal with the non-identifiability of representative parameter values, we proposed running the simulations with a parameter ensemble sampled from the posterior distribution, named the "posterior parameter ensemble". We showed that population annealing is an efficient and convenient algorithm for generating such an ensemble. We also showed that simulations with the posterior parameter ensemble can not only reproduce the data used for parameter inference but also capture and predict data that were not used for parameter inference. Lastly, we introduced the marginal likelihood in the ABC framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the ABC framework and to conduct model selection based on the Bayes factor.
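A schematic, simplified sketch of an annealed-tolerance ABC loop in the spirit of population annealing (not the authors' algorithm): the tolerance is tightened over a schedule, and at each level the surviving particles are resampled and diversified by a random-walk move. The decay model, prior, and schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy problem: infer the decay rate k from noisy observations of exp(-k t).
t = np.linspace(0, 4, 20)
data = np.exp(-0.7 * t) + rng.normal(0, 0.05, t.size)

def distance(k):                     # discrepancy between simulation and data
    return np.sqrt(np.mean((np.exp(-k * t) - data) ** 2))

# Anneal the ABC tolerance; at each level keep particles that fit within
# epsilon, resample them, and diversify with a random-walk move accepted
# only if it stays within epsilon (and inside the prior support).
n = 500
particles = rng.uniform(0.0, 3.0, n)                   # prior: U(0, 3)
for eps in [0.5, 0.25, 0.12, 0.07]:
    alive = particles[np.array([distance(k) for k in particles]) < eps]
    particles = alive[rng.integers(0, alive.size, n)]  # multinomial resampling
    proposal = particles + rng.normal(0, 0.1, n)
    ok = np.array([0 < k < 3 and distance(k) < eps for k in proposal])
    particles = np.where(ok, proposal, particles)

print("posterior parameter ensemble: mean %.3f, sd %.3f"
      % (particles.mean(), particles.std()))
```

The final `particles` array plays the role of a posterior parameter ensemble: downstream simulations can be run once per ensemble member rather than at a single representative value.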

6.
Previous mathematical modeling efforts have made significant contributions to the development of systems biology for predicting biological behavior quantitatively. However, dynamic metabolic model construction remains challenging due to uncertainties in mechanistic structures and parameters. In addition, parameter estimation and model validation often require dedicated experiments conducted solely for the purpose of modeling. Such difficulties have hampered the progress of modeling in biology and biotechnology. To circumvent these problems, ensemble approaches have been used to account for uncertainties in model structure and parameters. Specifically, this review focuses on approaches that utilize readily available fermentation data for parameter screening and model validation. Time-course data for metabolite measurements, if available, can further calibrate the model. The basis for this approach is explained in non-mathematical terms accessible to experimentalists. Information gained from such an approach has been shown to be useful in designing Escherichia coli strains for metabolic engineering and synthetic biology.

7.
Mathematical models have long been used for prediction of dynamics in biological systems. Recently, several efforts have been made to render these models patient specific. One way to do so is to employ techniques to estimate parameters that enable model-based prediction of observed quantities. Knowledge of the variation in parameters within and between groups of subjects has the potential to provide insight into biological function. Often it is not possible to estimate all parameters in a given model, in particular if the model is complex and the data are sparse. However, it may be possible to estimate a subset of model parameters, reducing the complexity of the problem. In this study, we compare three methods that allow identification of parameter subsets that can be estimated given a model and a set of data. These methods are used to estimate patient-specific parameters in a model predicting baroreceptor feedback regulation of heart rate during head-up tilt. The three methods are: structured analysis of the correlation matrix, analysis via singular value decomposition followed by QR factorization, and identification of the subspace closest to the one spanned by eigenvectors of the model Hessian. Results showed that all three methods facilitate identification of a parameter subset. The "best" subset was obtained using the structured correlation method, though this method was also the most computationally intensive. Subsets obtained using the other two methods were easier to compute, but analysis revealed that the final subsets contained correlated parameters. In conclusion, to avoid lengthy computations, these three methods may be combined for efficient identification of parameter subsets.
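A minimal sketch of the second method (SVD followed by QR with column pivoting), applied to a hypothetical sensitivity matrix in which two parameter directions are nearly redundant; the matrix, threshold, and dimensions are invented for illustration.

```python
import numpy as np
from scipy.linalg import qr

# Hypothetical sensitivity matrix S (rows: observations, columns: parameters)
# where parameters 3 and 4 nearly duplicate parameters 0 and 1.
rng = np.random.default_rng(3)
base = rng.normal(size=(50, 3))
S = np.column_stack([base[:, 0], base[:, 1], base[:, 2],
                     base[:, 0] + 1e-3 * rng.normal(size=50),
                     base[:, 1] + 1e-3 * rng.normal(size=50)])

# Step 1: numerical rank rho from the singular value spectrum.
U, sv, Vt = np.linalg.svd(S)
rho = int(np.sum(sv > 1e-2 * sv[0]))

# Step 2: QR with column pivoting on the dominant right singular vectors;
# the first rho pivots name a well-conditioned, estimable parameter subset.
_, _, piv = qr(Vt[:rho, :], pivoting=True)
print("numerical rank:", rho, "-> estimable parameter subset:", piv[:rho])
```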

8.
Simulation software is often a fundamental component of systems biology projects, and is a key element in integrating experimental and analytical techniques in the search for greater understanding and prediction of biology at the systems level. It is important that the modelling and analysis software is reliable, and that techniques exist for automating the analysis of the vast amounts of data which such simulation environments generate. A rigorous approach to the development of complex modelling software is needed. Such a framework is presented here, together with techniques for the automated analysis of such models and a process for the automatic discovery of biological phenomena from large simulation data sets. Illustrations are taken from a major systems biology research project involving the in vitro investigation, modelling and simulation of epithelial tissue.

9.
Genetic information, such as single nucleotide polymorphism (SNP) data, has been widely recognized as useful in the prediction of disease risk. However, modelling genetic data, which are often categorical, for disease class prediction is complex and challenging. In this paper, we propose a novel class of nonlinear threshold index logistic models to deal with the complex, nonlinear effects of categorical/discrete SNP covariates for schizophrenia class prediction. A maximum likelihood methodology is suggested to estimate the unknown parameters in the models. Simulation studies demonstrate that the proposed methodology works well for moderate-sized samples. The suggested approach is then applied to schizophrenia classification using a real set of SNP data from the Western Australian Family Study of Schizophrenia (WAFSS). Our empirical findings provide evidence that the proposed nonlinear models clearly outperform the widely used linear and tree-based logistic regression models in class prediction of schizophrenia risk with SNP data, in terms of both Type I/II error rates and ROC curves.

10.
B. Steiert, A. Raue, J. Timmer, C. Kreutz. PLoS ONE, 2012, 7(7): e40052
Systems biology aims to build quantitative models that address unresolved issues in molecular biology. In order to describe the behavior of biological cells adequately, gene regulatory networks (GRNs) are intensively investigated. As the validity of models built for GRNs depends crucially on the kinetic rates, various methods have been developed to estimate these parameters from experimental data. For this purpose, it is favorable to choose the experimental conditions yielding maximal information. However, existing experimental design principles often rely on unfulfilled mathematical assumptions or become computationally demanding with growing model complexity. To solve this problem, we combined advanced methods for parameter and uncertainty estimation with experimental design considerations. As a showcase, we optimized three simulated GRNs in one of the challenges from the Dialogue for Reverse Engineering Assessment and Methods (DREAM). This article presents our approach, which was awarded best performing procedure at the DREAM6 Estimation of Model Parameters challenge. For fast and reliable parameter estimation, local deterministic optimization of the likelihood was applied. We analyzed the identifiability and precision of the estimates by calculating the profile likelihood. Furthermore, the profiles provided a way to uncover a selection of the most informative experiments, from which the optimal one was chosen using additional criteria at every step of the design process. In conclusion, we provide a strategy for optimal experimental design and show its successful application on three highly nonlinear dynamic models. Although presented in the context of the GRNs to be inferred for the DREAM6 challenge, the approach is generic and applicable to most types of quantitative models in systems biology and other disciplines.

11.
12.
In statistical modelling, the effects of single-nucleotide polymorphisms (SNPs) are often regarded as time-independent. However, for traits recorded repeatedly, it is of great interest to investigate the behaviour of gene effects over time. In this analysis, simulated data from the 13th QTL-MAS Workshop (Wageningen, The Netherlands, April 2009) were used, and the major goal was to model genetic effects as time-dependent. For this purpose, a mixed model is fitted which describes each effect using third-order Legendre orthogonal polynomials, in order to account for the correlation between consecutive measurements. In this model, SNPs are modelled as fixed effects, while the environment is modelled as a random effect. The maximum likelihood estimates of the model parameters are obtained by the expectation–maximisation (EM) algorithm, and the significance of the additive SNP effects is based on the likelihood ratio test, with p-values corrected for multiple testing. For each significant SNP, the percentage of the total variance contributed by that SNP is calculated. Moreover, using a model which simultaneously incorporates the effects of all SNPs, future yields are predicted. As a result, 179 of the 453 SNPs, covering 16 of the 18 true quantitative trait loci (QTL), were selected. The correlation between predicted and true breeding values was 0.73 for the data set with all SNPs and 0.84 for the data set with selected SNPs. In conclusion, we showed that a longitudinal approach allows estimation of the changes in the variance contributed by each SNP over time and demonstrated that, for prediction, the pre-selection of SNPs plays an important role.
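A minimal sketch of the fixed-effect ingredient only: representing a time-dependent effect with third-order Legendre polynomials. The data are simulated, and the mixed-model, EM, and testing machinery of the study are omitted.

```python
import numpy as np
from numpy.polynomial import legendre

# Simulate a smoothly time-varying effect observed with noise.
rng = np.random.default_rng(4)
days = np.linspace(1, 300, 30)
x = 2 * (days - days.min()) / (days.max() - days.min()) - 1   # rescale to [-1, 1]
true_effect = 1.0 + 0.5 * x - 0.3 * x ** 2                    # invented trajectory
y = true_effect + rng.normal(0, 0.2, x.size)

# Least-squares fit of Legendre coefficients up to order 3, then evaluation.
coeffs = legendre.legfit(x, y, deg=3)
effect_hat = legendre.legval(x, coeffs)
print("Legendre coefficients (order 0..3):", np.round(coeffs, 3))
```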

13.
Background: In systems biology, the dynamics of biological networks are often modeled with ordinary differential equations (ODEs) that encode the interacting components of the systems, resulting in highly complex models. In contrast, the amount of experimentally available data is almost always limited, and insufficient to constrain the parameters. In this situation, parameter estimation is a very challenging problem. To address this challenge, two intuitive approaches are to perform experimental design to generate more data, and to perform model reduction to simplify the model. Experimental design and model reduction have traditionally been viewed as two distinct areas, and an extensive literature and excellent reviews exist on each. Intriguingly, however, the intrinsic connections between the two areas have not been recognized. Results: Experimental design and model reduction are deeply related, and can be considered as one unified framework. Two recent methods can tackle both areas, one based on the model manifold and the other based on the profile likelihood. We use a simple sum-of-two-exponentials example to discuss the concepts and algorithmic details of both methods, and provide Matlab-based code and implementations which are useful resources for the dissemination and adoption of experimental design and model reduction in the biology community. Conclusions: From a geometric perspective, we consider the experimental data as a point in a high-dimensional data space and the mathematical model as a manifold living in this space. Parameter estimation can be viewed as a projection of the data point onto the manifold. By examining the singularity around the projected point on the manifold, we can perform both experimental design and model reduction. Experimental design identifies new experiments that expand the manifold and remove the singularity, whereas model reduction identifies the nearest boundary, that is, the nearest singularity, which suggests an appropriate form of a reduced model. This geometric interpretation represents one step toward the convergence of experimental design and model reduction as a unified framework.
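A minimal Python sketch of the paper's running example (the noise level and starting point are invented): fit the sum of two exponentials, then read near-singularity off the Jacobian at the fit. A collapsing singular value indicates that the data barely distinguish the two rates, pointing toward a one-exponential reduced model.

```python
import numpy as np
from scipy.optimize import least_squares

# Sum-of-two-exponentials model: y(t) = exp(-k1 t) + exp(-k2 t).
t = np.linspace(0, 3, 15)

def residuals(theta, y):
    k1, k2 = theta
    return np.exp(-k1 * t) + np.exp(-k2 * t) - y

# Simulated data with nearly equal rates, i.e. close to a manifold boundary.
rng = np.random.default_rng(5)
y = np.exp(-1.0 * t) + np.exp(-1.2 * t) + rng.normal(0, 0.01, t.size)

# Parameter estimation = projection of the data point onto the model manifold.
fit = least_squares(residuals, x0=[0.5, 2.0], args=(y,))
sv = np.linalg.svd(fit.jac, compute_uv=False)
print("fitted rates:", np.round(fit.x, 3))
print("Jacobian singular values:", sv)   # a tiny value flags near-singularity
```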

14.
Quantitative computational models play an increasingly important role in modern biology. Such models typically involve many free parameters, and assigning their values is often a substantial obstacle to model development. Directly measuring in vivo biochemical parameters is difficult, and collectively fitting them to other experimental data often yields large parameter uncertainties. Nevertheless, in earlier work we showed in a growth-factor-signaling model that collective fitting could yield well-constrained predictions, even when it left individual parameters very poorly constrained. We also showed that the model had a "sloppy" spectrum of parameter sensitivities, with eigenvalues roughly evenly distributed over many decades. Here we use a collection of models from the literature to test whether such sloppy spectra are common in systems biology. Strikingly, we find that every model we examine has a sloppy spectrum of sensitivities. We also test several consequences of this sloppiness for building predictive models. In particular, sloppiness suggests that collective fits to even large amounts of ideal time-series data will often leave many parameters poorly constrained. Tests over our model collection are consistent with this suggestion. This difficulty with collective fits may seem to argue for direct parameter measurements, but sloppiness also implies that such measurements must be formidably precise and complete to usefully constrain many model predictions. We confirm this implication in our growth-factor-signaling model. Our results suggest that sloppy sensitivity spectra are universal in systems biology models. The prevalence of sloppiness highlights the power of collective fits and suggests that modelers should focus on predictions rather than on parameters.
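A hedged one-screen illustration of what a sloppy spectrum looks like, on a toy multi-exponential model with invented rates: the eigenvalues of J^T J (J being the parameter sensitivity matrix) spread roughly evenly over many decades.

```python
import numpy as np

# Toy model y(t) = sum_i exp(-k_i t) with invented rate constants.
t = np.linspace(0.1, 5, 40)
rates = np.array([1.0, 0.5, 0.25, 0.125])

# Sensitivities dy/dk_i evaluated at the chosen rates.
J = np.column_stack([-t * np.exp(-k * t) for k in rates])
eig = np.linalg.eigvalsh(J.T @ J)[::-1]          # descending eigenvalues
print("eigenvalues:", eig)
print("spread in decades:", np.log10(eig[0] / eig[-1]))
```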

15.
Inference of the insulin secretion rate (ISR) from C-peptide measurements, as a quantification of pancreatic β-cell function, is clinically important in diseases related to reduced insulin sensitivity and insulin action. ISR derived from C-peptide concentration is an example of nonparametric Bayesian model selection, where a proposed ISR time-course is considered to be a "model". Inferring the values of inaccessible continuous variables from discrete observable data is often problematic in biology and medicine, because it is a priori unclear how robust the inference is to the deletion of data points, and, a closely related question, how much smoothness or continuity the data actually support. Predictions weighted by the posterior distribution can be cast as functional integrals, as used in statistical field theory. Functional integrals are generally difficult to evaluate, especially for non-analytic constraints such as positivity of the estimated parameters. We propose a computationally tractable method that uses the exact solution of an associated likelihood function as a prior probability distribution for a Markov chain Monte Carlo evaluation of the posterior for the full model. As a concrete application of our method, we calculate the ISR from actual clinical C-peptide measurements in human subjects with varying degrees of insulin sensitivity. Our method demonstrates the feasibility of functional-integral Bayesian model selection as a practical method for such data-driven inference, allowing the data to determine the smoothing timescale and the width of the prior probability distribution on the space of models. In particular, our model comparison method determines the discrete time-step for interpolation of the unobservable continuous variable that is supported by the data. Attempts to go to finer discrete time-steps lead to less likely models.
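As a stand-in for the paper's functional-integral machinery, here is only the generic ingredient it builds on: a random-walk Metropolis sampler whose prior assigns zero mass to negative rates, enforcing the positivity constraint. The data, step size, and iteration counts are invented.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy target: posterior of a constant secretion rate s > 0 given noisy data.
obs = np.array([1.1, 0.9, 1.3, 1.0, 1.2])   # illustrative measurements
sigma = 0.2

def log_post(s):
    if s <= 0:                               # positivity via zero prior mass
        return -np.inf
    return -0.5 * np.sum((obs - s) ** 2) / sigma ** 2

samples, s = [], 1.0
for _ in range(20000):
    prop = s + rng.normal(0, 0.1)            # random-walk proposal
    if np.log(rng.random()) < log_post(prop) - log_post(s):
        s = prop                             # Metropolis accept
    samples.append(s)

post = np.array(samples[5000:])              # discard burn-in
print("posterior mean %.3f, sd %.3f" % (post.mean(), post.std()))
```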

16.
17.
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing substantial biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. The variance of this estimator is uniformly bounded and achieves the minimum variance possible for an unbiased estimator, and we can compute calibrated estimates of that variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
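The IBS estimator itself is compact enough to sketch: for each observation, sample the simulator until a draw matches, accumulating -1/k for each miss at attempt k. The toy Bernoulli check against the analytic log-likelihood below is our own illustration, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

def ibs_log_likelihood(simulate, responses, stimuli):
    """Unbiased IBS estimate of the summed log-likelihood of a data set.

    If the first match for an observation occurs on draw K, the contribution
    is -sum_{j=1}^{K-1} 1/j, an unbiased estimate of log p(response).
    """
    total = 0.0
    for resp, stim in zip(responses, stimuli):
        k = 1
        while simulate(stim) != resp:
            total -= 1.0 / k
            k += 1
    return total

# Toy check: a Bernoulli "simulator" with hit probability p; the true
# log-likelihood of a hit is log(p).
p = 0.3
simulate = lambda stim: rng.random() < p
estimates = [ibs_log_likelihood(simulate, [True], [None]) for _ in range(5000)]
print("IBS mean: %.3f   analytic log p: %.3f" % (np.mean(estimates), np.log(p)))
```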

18.
19.
The available information on the sample size requirements of mixture analysis methods is insufficient to permit a precise evaluation of the potential problems facing practical applications of mixture analysis. We use results from Monte Carlo simulation to assess the sample size requirements of a simple mixture analysis method under conditions relevant to biological applications. The mixture model used includes two univariate normal components with equal variances, but it is assumed that the researcher is ignorant as to the equality of the variances. The method relies on the EM algorithm to compute the maximum likelihood estimates of the mixture parameters, and on the likelihood ratio test to assess the number of components in the mixture. Our results suggest that sample sizes close to 500 or 1000 observations may be required to adequately resolve mixtures commonly found in biology. Sample sizes of 500 or 1000 are difficult to achieve. However, this mixture analysis method may be a reasonable option when the researcher deals with problems that are intractable by other means. Copyright 1999 Academic Press.
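A minimal EM sketch for the model class described, two univariate normal components with a common variance; the data and initial values are illustrative.

```python
import numpy as np
from scipy.stats import norm

# Simulated two-component mixture data.
rng = np.random.default_rng(8)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 200)])

w, mu1, mu2, s = 0.5, x.min(), x.max(), x.std()   # crude initial values
for _ in range(200):
    # E-step: posterior responsibility of component 1 for each datum.
    p1 = w * norm.pdf(x, mu1, s)
    p2 = (1 - w) * norm.pdf(x, mu2, s)
    r = p1 / (p1 + p2)
    # M-step: update the weight, both means, and the shared standard deviation.
    w = r.mean()
    mu1 = np.sum(r * x) / r.sum()
    mu2 = np.sum((1 - r) * x) / (1 - r).sum()
    s = np.sqrt(np.sum(r * (x - mu1) ** 2 + (1 - r) * (x - mu2) ** 2) / x.size)

print("weights: %.2f/%.2f  means: %.2f, %.2f  sd: %.2f"
      % (w, 1 - w, mu1, mu2, s))
```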

20.

Background  

Modeling of biological pathways is a key issue in systems biology. When constructing a model, it is tempting to incorporate all known interactions of the pathway species, which results in models with a large number of unknown parameters. Fortunately, unknown parameters need not necessarily be measured directly; some parameter values can be estimated indirectly by fitting the model to experimental data. However, parameter fitting, or, more precisely, maximum likelihood parameter estimation, only provides valid results if the complexity of the model is in balance with the amount and quality of the experimental data. If this is the case, the model is said to be identifiable for the given data. If a model turns out to be unidentifiable, two steps can be taken: either additional experiments need to be conducted, or the model has to be simplified.

