Similar Literature
20 similar documents found (search time: 31 ms)
1.
Perfect knowledge of the underlying state transition probabilities is necessary for designing an optimal intervention strategy for a given Markovian genetic regulatory network. However, in many practical situations, the complex nature of the network and/or identification costs limit the availability of such perfect knowledge. To address this difficulty, we propose to take a Bayesian approach and represent the system of interest as an uncertainty class of several models, each assigned some probability, which reflects our prior knowledge about the system. We define the objective function to be the expected cost relative to the probability distribution over the uncertainty class and formulate an optimal Bayesian robust intervention policy minimizing this cost function. The resulting policy may not be optimal for a fixed element within the uncertainty class, but it is optimal when averaged across the uncertainty class. Furthermore, starting from a prior probability distribution over the uncertainty class and collecting samples from the process over time, one can update the prior distribution to a posterior and find the corresponding optimal Bayesian robust policy relative to the posterior distribution. Therefore, the optimal intervention policy is essentially nonstationary and adaptive.
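As a toy illustration of the idea, the sketch below enumerates the stationary policies of a two-state, two-action Markov chain, scores each by its prior-weighted expected discounted cost over a two-model uncertainty class, and returns the minimizer. All transition matrices, costs, prior weights, and the discount factor are hypothetical, and the brute-force enumeration stands in for the paper's dynamic-programming machinery.

```python
# A minimal sketch of an optimal Bayesian robust intervention policy for a
# two-state, two-action Markov chain. Models, costs, prior, and discount
# factor are hypothetical illustrations.
import itertools
import numpy as np

GAMMA = 0.9                      # discount factor (assumed)
# models[m][a] = transition matrix of model m under action a (hypothetical)
models = [
    {0: np.array([[0.8, 0.2], [0.3, 0.7]]),   # action 0: do nothing
     1: np.array([[0.5, 0.5], [0.1, 0.9]])},  # action 1: intervene
    {0: np.array([[0.6, 0.4], [0.4, 0.6]]),
     1: np.array([[0.4, 0.6], [0.2, 0.8]])},
]
prior = np.array([0.7, 0.3])     # prior probability of each model
cost = np.array([[0.0, 1.0],     # cost[s, a]: state 1 is undesirable,
                 [5.0, 6.0]])    # intervention adds a fixed cost of 1

def policy_value(P_by_action, pi):
    """Expected discounted cost of deterministic policy pi under one model."""
    P = np.array([P_by_action[pi[s]][s] for s in (0, 1)])
    c = np.array([cost[s, pi[s]] for s in (0, 1)])
    return np.linalg.solve(np.eye(2) - GAMMA * P, c)  # solves v = c + GAMMA P v

best_pi, best_cost = None, np.inf
for pi in itertools.product((0, 1), repeat=2):       # all stationary policies
    # Bayesian robust objective: prior-weighted cost, averaged over states
    avg = sum(w * policy_value(m, pi).mean() for w, m in zip(prior, models))
    if avg < best_cost:
        best_pi, best_cost = pi, avg
print("robust policy (action per state):", best_pi, "expected cost:", best_cost)
```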

2.
MOTIVATION: Given the joint feature-label distribution, increasing the number of features always results in decreased classification error; however, this is not the case when a classifier is designed via a classification rule from sample data. Typically (but not always), for fixed sample size, the error of a designed classifier decreases and then increases as the number of features grows. The potential downside of using too many features is most critical for small samples, which are commonplace for gene-expression-based classifiers for phenotype discrimination. For fixed sample size and feature-label distribution, the issue is to find an optimal number of features. RESULTS: Since only in rare cases is there a known distribution of the error as a function of the number of features and sample size, this study employs simulation for various feature-label distributions and classification rules, and across a wide range of sample and feature-set sizes. To achieve the desired end, finding the optimal number of features as a function of sample size, it employs massively parallel computation. Seven classifiers are treated: 3-nearest-neighbor, Gaussian kernel, linear support vector machine, polynomial support vector machine, perceptron, regular histogram and linear discriminant analysis. Three Gaussian-based models are considered: linear, nonlinear and bimodal. In addition, real patient data from a large breast-cancer study is considered. To mitigate the combinatorial search for finding optimal feature sets, and to model the situation in which subsets of genes are co-regulated and correlation is internal to these subsets, we assume that the covariance matrix of the features is blocked, with each block corresponding to a group of correlated features. Altogether there are a large number of error surfaces for the many cases. These are provided in full on a companion website, which is meant to serve as a resource for those working with small-sample classification. AVAILABILITY: For the companion website, please visit http://public.tgen.org/tamu/ofs/ CONTACT: e-dougherty@ee.tamu.edu.
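A minimal simulation in the spirit of this study is sketched below: it estimates the true error of an LDA classifier designed from a small sample as the feature count grows, for a hypothetical linear Gaussian model. The sample sizes, per-feature mean shift, and replication counts are assumptions chosen only to make the peaking phenomenon visible.

```python
# A minimal sketch of the peaking phenomenon: for fixed sample size, the
# true error of a designed LDA classifier first falls and then rises as
# features are added. All model parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N_TRAIN, N_TEST, REPS = 30, 5000, 100
delta = 0.4                       # per-feature mean shift (assumed)

def lda_error(d):
    """Average true error of LDA trained on N_TRAIN points with d features."""
    errs = []
    mu = np.full(d, delta)
    for _ in range(REPS):
        x0 = rng.normal(0.0, 1.0, (N_TRAIN // 2, d))
        x1 = rng.normal(mu, 1.0, (N_TRAIN // 2, d))
        m0, m1 = x0.mean(0), x1.mean(0)
        # pooled sample covariance (small ridge keeps it invertible)
        pooled = np.cov(np.vstack([x0 - m0, x1 - m1]).T) + 1e-6 * np.eye(d)
        w = np.linalg.solve(pooled, m1 - m0)         # LDA direction
        b = w @ (m0 + m1) / 2.0                      # midpoint threshold
        t0 = rng.normal(0.0, 1.0, (N_TEST // 2, d))
        t1 = rng.normal(mu, 1.0, (N_TEST // 2, d))
        errs.append(((t0 @ w > b).mean() + (t1 @ w <= b).mean()) / 2.0)
    return np.mean(errs)

for d in (1, 2, 5, 10, 15, 20):
    print(f"features={d:2d}  estimated true error={lda_error(d):.3f}")
```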

3.
Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set.
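The closed forms derived in the paper are not reproduced here, but a simple grid approximation shows the shape of the computation: an inverse-sampling (negative binomial) likelihood for pooled tests combined with a beta prior on prevalence. The pool size, stopping rule, observed positives, and prior parameters below are hypothetical.

```python
# A minimal grid sketch of the posterior for disease prevalence p when pools
# of size k are tested until r negative pools are seen (inverse sampling),
# with y positive pools observed along the way. Numbers are hypothetical.
import numpy as np

k, r, y = 10, 5, 8                 # pool size, negative pools required, positives seen
a, b = 1.0, 9.0                    # Beta(a, b) prior on prevalence (assumed)

p = np.linspace(1e-6, 1 - 1e-6, 20001)
q = (1.0 - p) ** k                 # probability that a pool tests negative
# negative binomial likelihood times beta prior, up to a constant
log_post = (y * np.log1p(-q) + r * np.log(q)
            + (a - 1) * np.log(p) + (b - 1) * np.log1p(-p))
post = np.exp(log_post - log_post.max())
dp = p[1] - p[0]
post /= post.sum() * dp            # normalize on the grid

mean = (p * post).sum() * dp
cdf = np.cumsum(post) * dp
lo, hi = p[np.searchsorted(cdf, 0.025)], p[np.searchsorted(cdf, 0.975)]
print(f"posterior mean {mean:.4f}, 95% credible interval ({lo:.4f}, {hi:.4f})")
```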

4.
Bochkina N, Richardson S. Biometrics 2007, 63(4): 1117-1125.
We consider the problem of identifying differentially expressed genes in microarray data in a Bayesian framework with a noninformative prior distribution on the parameter quantifying differential expression. We introduce a new rule, tail posterior probability, based on the posterior distribution of the standardized difference, to identify genes differentially expressed between two conditions, and we derive a frequentist estimator of the false discovery rate associated with this rule. We compare it with other Bayesian rules in these settings. We show how the tail posterior probability can be extended to testing a compound null hypothesis against a class of specific alternatives in multiclass data.
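A rough numerical sketch of such a rule follows, assuming (purely for illustration) that the posterior of each gene's standardized difference is approximately normal with unit variance around an estimated center; the threshold and cutoff values are hypothetical, not the paper's calibrated choices.

```python
# A minimal sketch of a tail-posterior-probability rule: flag a gene when the
# posterior mass of its standardized difference beyond +/-c exceeds a cutoff.
# The normal approximation and all numbers are hypothetical illustrations.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
m = rng.normal(0, 2, size=10)      # posterior centers of standardized differences
c, cutoff = 2.0, 0.8               # tail threshold and decision cutoff (assumed)

# P(|delta_g| > c | data) when delta_g | data ~ N(m_g, 1)
tail_prob = norm.sf(c - m) + norm.cdf(-c - m)
for g, t in enumerate(tail_prob):
    print(f"gene {g}: tail posterior prob = {t:.3f}  flagged = {t > cutoff}")
```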

5.
Roy J, Daniels MJ. Biometrics 2008, 64(2): 538-545.
In this article we consider the problem of fitting pattern mixture models to longitudinal data when there are many unique dropout times. We propose a marginally specified latent class pattern mixture model. The marginal mean is assumed to follow a generalized linear model, whereas the mean conditional on the latent class and random effects is specified separately. Because the dimension of the parameter vector of interest (the marginal regression coefficients) does not depend on the assumed number of latent classes, we propose to treat the number of latent classes as a random variable. We specify a prior distribution for the number of classes, and calculate (approximate) posterior model probabilities. In order to avoid the complications with implementing a fully Bayesian model, we propose a simple approximation to these posterior probabilities. The ideas are illustrated using data from a longitudinal study of depression in HIV-infected women.

6.

Background

A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty.

Results

In this paper, we employ the mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which it degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an “intrinsically Bayesian robust” (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs), but only in experimental design; the final policy is an IBR policy.

Conclusions

Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.
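For a finite uncertainty class, the MOCU computation at the heart of this design loop reduces to a few lines. The sketch below uses a hypothetical cost table (models by candidate policies) and prior, and omits the experimental-design outer loop and the MFPT approximation.

```python
# A minimal sketch of mean objective cost of uncertainty (MOCU): given
# cost[i, j] = cost of policy j on model i and prior weights w, the IBR
# policy minimizes the prior-averaged cost, and MOCU is the prior-averaged
# excess of the IBR policy over each model's own optimal policy.
# The cost table and prior are hypothetical.
import numpy as np

w = np.array([0.5, 0.3, 0.2])            # prior over three models
cost = np.array([[1.0, 2.0, 4.0],        # rows: models, cols: candidate policies
                 [3.0, 1.5, 2.5],
                 [4.0, 3.0, 1.0]])

ibr = np.argmin(w @ cost)                # intrinsically Bayesian robust policy
mocu = w @ (cost[:, ibr] - cost.min(axis=1))
print(f"IBR policy index: {ibr}, MOCU: {mocu:.3f}")
```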

7.
Inferring speciation times under an episodic molecular clock
We extend our recently developed Markov chain Monte Carlo algorithm for Bayesian estimation of species divergence times to allow variable evolutionary rates among lineages. The method can use heterogeneous data from multiple gene loci and accommodate multiple fossil calibrations. Uncertainties in fossil calibrations are described using flexible statistical distributions. The prior for divergence times for nodes lacking fossil calibrations is specified by use of a birth-death process with species sampling. The prior for lineage-specific substitution rates is specified using either a model with autocorrelated rates among adjacent lineages (based on a geometric Brownian motion model of rate drift) or a model with independent rates among lineages specified by a log-normal probability distribution. We develop an infinite-sites theory, which predicts that when the amount of sequence data approaches infinity, the width of the posterior credibility interval and the posterior mean of divergence times form a perfect linear relationship, with the slope indicating uncertainties in time estimates that cannot be reduced by sequence data alone. Simulations are used to study the influence of among-lineage rate variation and the number of loci sampled on the uncertainty of divergence time estimates. The analysis suggests that posterior time estimates typically involve considerable uncertainties even with an infinite amount of sequence data, and that the reliability and precision of fossil calibrations are critically important to divergence time estimation. We apply our new algorithms to two empirical data sets and compare the results with those obtained in previous Bayesian and likelihood analyses. The results demonstrate the utility of our new algorithms.

8.
In Bayesian phylogenetics, confidence in evolutionary relationships is expressed as posterior probability: the probability that a tree or clade is true given the data, evolutionary model, and prior assumptions about model parameters. Model parameters, such as branch lengths, are never known in advance; Bayesian methods incorporate this uncertainty by integrating over a range of plausible values given an assumed prior probability distribution for each parameter. Little is known about the effects of integrating over branch length uncertainty on posterior probabilities when different priors are assumed. Here, we show that integrating over uncertainty using a wide range of typical prior assumptions strongly affects posterior probabilities, causing them to deviate from those that would be inferred if branch lengths were known in advance; only when there is no uncertainty to integrate over does the average posterior probability of a group of trees accurately predict the proportion of correct trees in the group. The pattern of branch lengths on the true tree determines whether integrating over uncertainty pushes posterior probabilities upward or downward. The magnitude of the effect depends on the specific prior distributions used and the length of the sequences analyzed. Under realistic conditions, however, even extraordinarily long sequences are not enough to prevent frequent inference of incorrect clades with strong support. We found that across a range of conditions, diffuse priors (either flat or exponential distributions with moderate to large means) provide more reliable inferences than small-mean exponential priors. An empirical Bayes approach that fixes branch lengths at their maximum likelihood estimates yields posterior probabilities that more closely match those that would be inferred if the true branch lengths were known in advance and reduces the rate of strongly supported false inferences compared with fully Bayesian integration.

9.
We consider a set of sample counts obtained by sampling arbitrary fractions of a finite volume containing a homogeneously dispersed population of identical objects. We report a Bayesian derivation of the posterior probability distribution of the population size using a binomial likelihood and non-conjugate, discrete uniform priors under sampling with or without replacement. Our derivation yields a computationally feasible formula that can prove useful in a variety of statistical problems involving absolute quantification under uncertainty. We implemented our algorithm in the R package dupiR and compared it with a previously proposed Bayesian method based on a Gamma prior. As a showcase, we demonstrate that our inference framework can be used to estimate bacterial survival curves from measurements characterized by extremely low or zero counts and rather high sampling fractions. All in all, we provide a versatile, general purpose algorithm to infer population sizes from count data, which can find application in a broad spectrum of biological and physical problems.
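A grid version of this posterior is easy to write down from the abstract's ingredients (binomial likelihood, discrete uniform prior on the population size); the count, sampling fraction, and prior upper bound below are hypothetical, and the dupiR package should be preferred in practice.

```python
# A minimal sketch of the posterior over a population size n when k objects
# are counted in a sampled fraction f of the volume: binomial likelihood with
# a discrete uniform prior on n. Numbers are hypothetical.
import numpy as np
from scipy.stats import binom

k, f, N_MAX = 12, 0.05, 2000       # observed count, sampled fraction, prior upper bound
n = np.arange(k, N_MAX + 1)        # population size support under a uniform prior
post = binom.pmf(k, n, f)          # binomial likelihood times flat prior
post /= post.sum()

mean = (n * post).sum()
cdf = np.cumsum(post)
lo, hi = n[np.searchsorted(cdf, 0.025)], n[np.searchsorted(cdf, 0.975)]
print(f"posterior mean {mean:.1f}, 95% credible interval [{lo}, {hi}]")
```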

10.
Given the relatively small number of microarrays typically used in gene-expression-based classification, all of the data must be used to train a classifier and therefore the same training data is used for error estimation. The key issue regarding the quality of an error estimator in the context of small samples is its accuracy, and this is most directly analyzed via the deviation distribution of the estimator, this being the distribution of the difference between the estimated and true errors. Past studies indicate that, given a prior set of features, cross-validation does not perform as well in this regard as some other training-data-based error estimators. The purpose of this study is to quantify the degree to which feature selection increases the variation of the deviation distribution in addition to the variation in the absence of feature selection. To this end, we propose the coefficient of relative increase in deviation dispersion (CRIDD), which gives the relative increase in the deviation-distribution variance using feature selection as opposed to using an optimal feature set without feature selection. The contribution of feature selection to the variance of the deviation distribution can be significant, contributing to over half of the variance in many of the cases studied. We consider linear discriminant analysis, 3-nearest-neighbor, and linear support vector machines for classification; sequential forward selection, sequential forward floating selection, and the t-test for feature selection; and k-fold and leave-one-out cross-validation for error estimation. We apply these to three feature-label models and patient data from a breast cancer study. In sum, the cross-validation deviation distribution is significantly flatter when there is feature selection, compared with the case when cross-validation is performed on a given feature set. This is reflected by the observed positive values of the CRIDD, which is defined to quantify the contribution of feature selection towards the deviation variance.
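The sketch below illustrates the CRIDD idea on simulated deviation samples; the definition used here, (var_fs - var_opt) / var_fs, is one plausible reading of "relative increase in deviation-distribution variance", and the spreads are invented, so treat it as a conceptual sketch rather than the paper's exact estimator.

```python
# A minimal sketch of the CRIDD idea: compare the variance of the deviation
# distribution (estimated minus true error) with and without feature
# selection. The deviations below are simulated stand-ins, not real results.
import numpy as np

rng = np.random.default_rng(2)
# deviations of cross-validation error estimates from true errors:
dev_opt = rng.normal(0.0, 0.05, 1000)   # fixed optimal feature set (assumed spread)
dev_fs = rng.normal(0.0, 0.09, 1000)    # with feature selection (wider, assumed)

var_opt, var_fs = dev_opt.var(), dev_fs.var()
cridd = (var_fs - var_opt) / var_fs     # fraction of variance due to selection
print(f"CRIDD = {cridd:.2f}  (positive: selection inflates deviation variance)")
```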

11.
Multivariate linear models are increasingly important in quantitative genetics. In high dimensional specifications, factor analysis (FA) may provide an avenue for structuring (co)variance matrices, thus reducing the number of parameters needed for describing (co)dispersion. We describe how FA can be used to model genetic effects in the context of a multivariate linear mixed model. An orthogonal common factor structure is used to model genetic effects under a Gaussian assumption, so that the marginal likelihood is multivariate normal with a structured genetic (co)variance matrix. Under standard prior assumptions, all fully conditional distributions have closed form, and samples from the joint posterior distribution can be obtained via Gibbs sampling. The model and the algorithm developed for its Bayesian implementation were used to describe five repeated records of milk yield in dairy cattle, and a model with one common factor was compared with a standard multiple-trait model. The Bayesian Information Criterion favored the FA model.

12.
The toxicity and efficacy of more than 30 anticancer agents vary greatly depending on the dosing time. Biologists studying the circadian rhythm therefore require a very precise method for estimating the periodic component (PC) vector of chronobiological signals. Moreover, in recent developments, not only the dominant period and the PC vector are of crucial interest but also their stability or variability. In cancer treatment experiments, the recorded signals corresponding to different phases of treatment are short: from 7 days for the synchronization segment down to 2 or 3 days for the after-treatment segment. When studying the stability of the dominant period, we must therefore consider very short signals relative to the prior knowledge of the dominant period, which lies in the circadian domain. Classical approaches based on Fourier transform (FT) methods are inefficient (i.e., they lack precision) given the particularities of the data (i.e., their short length). Another particularity of the signals considered in such experiments is the noise level: such signals are very noisy, and establishing which periodic components are associated with the biological phenomena and distinguishing them from those associated with the noise are difficult tasks. In this paper, we propose a new method for estimating the PC vector of biomedical signals, using biological prior information and a model that accounts for the noise. The experiments developed in the cancer treatment context record signals expressing a limited number of periods; this prior information can be translated as sparsity of the PC vector. The proposed method treats PC vector estimation as an inverse problem (IP), using general Bayesian inference to infer the unknowns of the model, i.e., the PC vector but also the hyperparameters (i.e., the variances). The sparsity prior information is modeled using a sparsity-enforcing prior law. In this paper, we propose a Student's t distribution, viewed as the marginal distribution of a bivariate normal-inverse gamma distribution. We build a general infinite Gaussian scale mixture (IGSM) hierarchical model in which we also assign prior distributions to the hyperparameters. The expression of the joint posterior law of the unknown PC vector and hyperparameters is obtained via Bayes' rule, and the unknowns are then estimated via joint maximum a posteriori (JMAP) or posterior mean (PM). For the PM estimator, the posterior distribution is approximated by a separable one via variational Bayesian approximation (VBA), using the Kullback-Leibler (KL) divergence. For PM estimation, two possibilities are considered: an approximation with a partially separable distribution and an approximation with a fully separable one. The two resulting PM algorithms and the JMAP algorithm are all iterative. The algorithms are presented in detail and compared with those corresponding to the Gaussian model. We examine the convergence of the algorithms and give simulation results to compare their performances. Finally, we show simulation results on synthetic and real data in cancer treatment applications. The real data considered in this paper examine the rest-activity patterns of a KI/KI Per2::luc mouse, aged 10 weeks, singly housed in a RealTime Biolumicorder (RT-BIO).
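As a loose illustration of the sparse inverse-problem formulation (not the paper's IGSM/VBA algorithms), the sketch below builds a sinusoidal dictionary over candidate circadian periods and recovers a sparse PC vector with iteratively reweighted ridge steps, a crude surrogate for a heavy-tailed (Student's t) MAP estimate; all signal settings are hypothetical.

```python
# A minimal sketch of the inverse-problem view: a short signal is modeled as
# y = A x + noise, where columns of A are sinusoids at candidate periods and
# x is a sparse periodic-component (PC) vector. Sparsity is encouraged with
# iteratively reweighted ridge steps, a crude stand-in for the paper's
# Student's-t / IGSM hierarchy. All settings are hypothetical.
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(96)                         # 4 days sampled hourly (assumed)
periods = np.arange(8, 33)                # candidate periods in the circadian domain
A = np.hstack([np.column_stack([np.cos(2 * np.pi * t / p),
                                np.sin(2 * np.pi * t / p)]) for p in periods])
x_true = np.zeros(A.shape[1])
x_true[2 * list(periods).index(24)] = 1.0 # dominant 24 h component
y = A @ x_true + rng.normal(0, 0.3, t.size)

x = np.zeros(A.shape[1])
for _ in range(50):                       # reweighted ridge ~ heavy-tailed MAP
    w = 1.0 / (x**2 + 1e-3)               # small coefficients get large penalty
    x = np.linalg.solve(A.T @ A + np.diag(w), A.T @ y)

amp = np.hypot(x[0::2], x[1::2])          # amplitude per candidate period
print("estimated dominant period:", periods[np.argmax(amp)])
```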

13.
14.
Wang H, West M. Biometrika 2009, 96(4): 821-834.
We present Bayesian analyses of matrix-variate normal data with conditional independencies induced by graphical model structuring of the characterizing covariance matrix parameters. This framework of matrix normal graphical models includes prior specifications, posterior computation using Markov chain Monte Carlo methods, evaluation of graphical model uncertainty and model structure search. Extensions to matrix-variate time series embed matrix normal graphs in dynamic models. Examples highlight questions of graphical model uncertainty, search and comparison in matrix data contexts. These models may be applied in a number of areas of multivariate analysis, time series and also spatial modelling.

15.
This paper describes a variational free-energy formulation of (partially observable) Markov decision problems in decision making under uncertainty. We show that optimal control can be cast as active inference. In active inference, both action and posterior beliefs about hidden states minimise a free energy bound on the negative log-likelihood of observed states, under a generative model. In this setting, reward or cost functions are absorbed into prior beliefs about state transitions and terminal states. Effectively, this converts optimal control into a pure inference problem, enabling the application of standard Bayesian filtering techniques. We then consider optimal trajectories that rest on posterior beliefs about hidden states in the future. Crucially, this entails modelling control as a hidden state that endows the generative model with a representation of agency. This leads to a distinction between models with and without inference on hidden control states; namely, agency-free and agency-based models, respectively.

16.
Chen MH, Ibrahim JG. Biometrics 2000, 56(3): 678-685.
Correlated count data arise often in practice, especially in repeated measures situations or instances in which observations are collected over time. In this paper, we consider a parametric model for a time series of counts by constructing a likelihood-based version of a model similar to that of Zeger (1988, Biometrika 75, 621-629). The model has the advantage of incorporating both overdispersion and autocorrelation. We consider a Bayesian approach and propose a class of informative prior distributions for the model parameters that are useful for prediction. The prior specification is motivated from the notion of the existence of data from similar previous studies, called historical data, which is then quantified into a prior distribution for the current study. We derive the Bayesian predictive distribution and use a Bayesian criterion, called the predictive L measure, for assessing the predictions for a given time series model. The distribution of the predictive L measure is also derived, which will enable us to compare the predictive ability for each model under consideration. Our methodology is motivated by a real data set involving yearly pollen counts, which is examined in some detail.
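One plausible reading of a predictive L measure, computed from posterior predictive draws as the total predictive variance plus a weighted squared distance between predictive means and the observed series, is sketched below; the draws, weight, and counts are hypothetical stand-ins for an actual MCMC fit.

```python
# A minimal sketch of a predictive L measure from posterior predictive draws:
# sum of predictive variances plus a weighted sum of squared differences
# between predictive means and the observed series. All inputs are fake.
import numpy as np

rng = np.random.default_rng(4)
y_obs = np.array([12, 15, 9, 20, 18, 14])            # observed yearly counts (assumed)
# pretend these came from an MCMC fit of an overdispersed count model:
y_rep = rng.poisson(lam=y_obs * rng.gamma(10, 0.1, (4000, y_obs.size)))

nu = 0.5                                              # weight on the fit term (assumed)
l_measure = (y_rep.var(axis=0).sum()
             + nu * ((y_rep.mean(axis=0) - y_obs) ** 2).sum())
print(f"predictive L measure: {l_measure:.1f}  (smaller is better)")
```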

17.
Using the concept of an extended data set (Zellner, 1986), we derived the projection or hat matrix for Bayesian regression analysis. The hat matrix shows how much influence or leverage the observed responses and the prior means have on each of the posterior fitted values. The amount of leverage associated with the observed data is shown to be a monotonically decreasing function of the ratio of the process variance to the prior variance. Additional properties of the Bayesian hat matrix are discussed. Two illustrative examples are presented.
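The leverage statement is easy to verify numerically. In the sketch below, the data part of the Bayesian hat matrix is taken to be X(X'X + rI)^(-1)X' with r equal to the ratio of process variance to prior variance, under a simplifying identity-prior-covariance assumption; its trace (the total leverage of the observations) falls as r grows.

```python
# A minimal sketch of the Bayesian hat matrix idea: with a ridge-like data
# hat matrix H = X (X'X + r I)^-1 X', r = process variance / prior variance,
# the observations' total leverage trace(H) decreases monotonically in r.
# The design and variance ratios are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(8), rng.normal(size=8)])   # simple regression design
for ratio in (0.1, 1.0, 10.0):                          # process var / prior var
    H = X @ np.linalg.solve(X.T @ X + ratio * np.eye(2), X.T)
    print(f"variance ratio = {ratio:5.1f}  total data leverage = {np.trace(H):.3f}")
```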

18.
19.
Considerable effort has been devoted to the estimation of species interaction strengths. This effort has focused primarily on statistical significance testing and obtaining point estimates of parameters that contribute to interaction strength magnitudes, leaving the characterization of uncertainty associated with those estimates unconsidered. We consider a means of characterizing the uncertainty of a generalist predator’s interaction strengths by formulating an observational method for estimating a predator’s prey-specific per capita attack rates as a Bayesian statistical model. This formulation permits the explicit incorporation of multiple sources of uncertainty. A key insight is the informative nature of several so-called non-informative priors that have been used in modeling the sparse data typical of predator feeding surveys. We introduce to ecology a new neutral prior and provide evidence for its superior performance. We use a case study to consider the attack rates in a New Zealand intertidal whelk predator, and we illustrate not only that Bayesian point estimates can be made to correspond with those obtained by frequentist approaches, but also that estimation uncertainty as described by 95% intervals is more useful and biologically realistic using the Bayesian method. In particular, unlike in bootstrap confidence intervals, the lower bounds of the Bayesian posterior intervals for attack rates do not include zero when a predator–prey interaction is in fact observed. We conclude that the Bayesian framework provides a straightforward, probabilistic characterization of interaction strength uncertainty, enabling future considerations of both the deterministic and stochastic drivers of interaction strength and their impact on food webs.  相似文献
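A stripped-down version of the prior-sensitivity point can be made with a Dirichlet-multinomial model for prey-specific feeding proportions, comparing a flat Dirichlet(1) prior with a small-mass prior; the counts below are invented, and this proportion model is far simpler than the paper's attack-rate formulation, which involves additional survey quantities.

```python
# A minimal sketch of prior sensitivity for sparse feeding-survey counts:
# posterior 95% intervals for prey-specific feeding proportions under a flat
# Dirichlet(1) prior versus a small-mass prior. Counts are hypothetical.
import numpy as np

rng = np.random.default_rng(6)
counts = np.array([7, 2, 1, 0])            # predators observed feeding on each prey
for name, alpha in (("flat Dirichlet(1)", 1.0), ("small-mass prior", 0.1)):
    draws = rng.dirichlet(counts + alpha, size=20000)
    lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
    print(name)
    for j in range(counts.size):
        print(f"  prey {j}: 95% interval ({lo[j]:.3f}, {hi[j]:.3f})")
```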

20.
We examine situations where interest lies in the conditional association between outcome and exposure variables, given potential confounding variables. Concern arises that some potential confounders may not be measured accurately, whereas others may not be measured at all. Some form of sensitivity analysis might be employed to assess how this limitation in available data impacts inference. A Bayesian approach to sensitivity analysis is straightforward in concept: a prior distribution is formed to encapsulate plausible relationships between unobserved and observed variables, and posterior inference about the conditional exposure–disease relationship then follows. In practice, though, it can be challenging to form such a prior distribution in both a realistic and simple manner. Moreover, it can be difficult to develop an attendant Markov chain Monte Carlo (MCMC) algorithm that will work effectively on a posterior distribution arising from a highly nonidentified model. In this article, a simple prior distribution for acknowledging both poorly measured and unmeasured confounding variables is developed. It requires that only a small number of hyperparameters be set by the user. Moreover, a particular computational approach for posterior inference is developed, because application of MCMC in a standard manner is seen to be ineffective in this problem.
