Similar Articles
20 similar articles retrieved.
1.
Kinney SK, Dunson DB. Biometrics 2007, 63(3): 690-698
We address the problem of selecting which variables should be included in the fixed and random components of logistic mixed effects models for correlated data. A fully Bayesian variable selection is implemented using a stochastic search Gibbs sampler to estimate the exact model-averaged posterior distribution. This approach automatically identifies subsets of predictors having nonzero fixed effect coefficients or nonzero random effects variance, while allowing uncertainty in the model selection process. Default priors are proposed for the variance components and an efficient parameter expansion Gibbs sampler is developed for posterior computation. The approach is illustrated using simulated data and an epidemiologic example.
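
The stochastic search idea can be illustrated outside the logistic mixed-model setting. The sketch below is a minimal George–McCulloch-style stochastic search variable selection Gibbs sampler for a plain linear regression with a continuous spike-and-slab prior on the coefficients; the data, priors, and hyperparameters are illustrative assumptions, not the authors' sampler for fixed and random effects.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: only the first two of five predictors matter.
n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

v0, v1, prior_inc = 0.01, 10.0, 0.5   # spike variance, slab variance, prior inclusion prob
a0, b0 = 2.0, 2.0                     # inverse-gamma prior for the error variance

beta = np.zeros(p)
gamma = np.ones(p, dtype=int)
sigma2 = 1.0
keep = []

for it in range(3000):
    # beta | gamma, sigma2: multivariate normal full conditional
    D_inv = np.diag(1.0 / np.where(gamma == 1, v1, v0))
    V = np.linalg.inv(X.T @ X / sigma2 + D_inv)
    m = V @ (X.T @ y) / sigma2
    beta = rng.multivariate_normal(m, V)

    # gamma_j | beta_j: Bernoulli, comparing spike and slab densities
    log_slab = -0.5 * beta**2 / v1 - 0.5 * np.log(v1) + np.log(prior_inc)
    log_spike = -0.5 * beta**2 / v0 - 0.5 * np.log(v0) + np.log(1 - prior_inc)
    prob = 1.0 / (1.0 + np.exp(log_spike - log_slab))
    gamma = rng.binomial(1, prob)

    # sigma2 | beta: inverse gamma
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))

    if it >= 1000:
        keep.append(gamma.copy())

print("posterior inclusion probabilities:", np.mean(keep, axis=0))
```

Averaging the sampled inclusion indicators gives model-averaged posterior inclusion probabilities, the quantity that drives selection in this family of methods.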

2.
Microarrays provide a valuable tool for the quantification of gene expression. Usually, however, there is a limited number of replicates leading to unsatisfying variance estimates in a gene‐wise mixed model analysis. As thousands of genes are available, it is desirable to combine information across genes. When more than two tissue types or treatments are to be compared it might be advisable to consider the array effect as random. Then information between arrays may be recovered, which can increase accuracy in estimation. We propose a method of variance component estimation across genes for a linear mixed model with two random effects. The method may be extended to models with more than two random effects. We assume that the variance components follow a log‐normal distribution. Assuming that the sums of squares from the gene‐wise analysis, given the true variance components, follow a scaled χ2‐distribution, we adopt an empirical Bayes approach. The variance components are estimated by the expectation of their posterior distribution. The new method is evaluated in a simulation study. Differentially expressed genes are more likely to be detected by tests based on these variance estimates than by tests based on gene‐wise variance estimates. This effect is most visible in studies with small array numbers. Analyzing a real data set on maize endosperm the method is shown to work well.
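
A minimal numerical sketch of the borrowing idea, under simplifying assumptions that are mine rather than the authors': each gene contributes a single variance estimate on df degrees of freedom, the sum of squares given the true variance is scaled chi-square, the log-variances are normal, and the prior moments are estimated by a crude plug-in from the observed log variances. The posterior expectation is then computed by quadrature on a grid.

```python
import numpy as np
from scipy.special import digamma, polygamma

rng = np.random.default_rng(1)

# Simulated gene-wise variance estimates: true log-variances are normal.
n_genes, df = 500, 4
true_var = np.exp(rng.normal(loc=0.0, scale=0.7, size=n_genes))
s2 = true_var * rng.chisquare(df, size=n_genes) / df          # gene-wise estimates

# Crude plug-in prior moments, correcting log(s2) for the chi-square contribution.
mu_hat = np.mean(np.log(s2)) - (digamma(df / 2.0) - np.log(df / 2.0))
tau2_hat = max(np.var(np.log(s2)) - polygamma(1, df / 2.0), 1e-4)

# Posterior mean E[sigma2_g | s2_g] on a log-variance grid.
grid = np.linspace(mu_hat - 5, mu_hat + 5, 400)
sig2_grid = np.exp(grid)

def posterior_mean(s2_g):
    # likelihood of SS = df * s2_g given sigma2 (scaled chi-square), up to constants
    log_lik = -0.5 * df * np.log(sig2_grid) - 0.5 * df * s2_g / sig2_grid
    log_prior = -0.5 * (grid - mu_hat) ** 2 / tau2_hat
    w = np.exp(log_lik + log_prior - np.max(log_lik + log_prior))
    return np.sum(w * sig2_grid) / np.sum(w)

shrunk = np.array([posterior_mean(v) for v in s2])
print("mean squared error, gene-wise:", np.mean((s2 - true_var) ** 2))
print("mean squared error, shrunk   :", np.mean((shrunk - true_var) ** 2))
```

With few degrees of freedom per gene, the shrunken estimates are typically much closer to the true variances than the raw gene-wise estimates, which is the effect the abstract describes.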

3.
Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.

4.
Statistical inference for microarray experiments usually involves the estimation of error variance for each gene. Because the sample size available for each gene is often low, the usual unbiased estimator of the error variance can be unreliable. Shrinkage methods, including empirical Bayes approaches that borrow information across genes to produce more stable estimates, have been developed in recent years. Because the same microarray platform is often used for at least several experiments to study similar biological systems, there is an opportunity to improve variance estimation further by borrowing information not only across genes but also across experiments. We propose a lognormal model for error variances that involves random gene effects and random experiment effects. Based on the model, we develop an empirical Bayes estimator of the error variance for each combination of gene and experiment and call this estimator BAGE because information is Borrowed Across Genes and Experiments. A permutation strategy is used to make inference about the differential expression status of each gene. Simulation studies with data generated from different probability models and real microarray data show that our method outperforms existing approaches.

5.
Codon-based substitution models are routinely used to measure selective pressures acting on protein-coding genes. To this effect, the nonsynonymous to synonymous rate ratio (dN/dS = omega) is estimated. The proportion of amino-acid sites potentially under positive selection, as indicated by omega > 1, is inferred by fitting a probability distribution where some sites are permitted to have omega > 1. These sites are then inferred by means of an empirical Bayes or by a Bayes empirical Bayes approach that, respectively, ignores or accounts for sampling errors in maximum-likelihood estimates of the distribution used to infer the proportion of sites with omega > 1. Here, we extend a previous full-Bayes approach to include models with high power and low false-positive rates when inferring sites under positive selection. We propose some heuristics to alleviate the computational burden, and show that (i) full Bayes can be superior to empirical Bayes when analyzing a small data set or small simulated data, (ii) full Bayes has only a small advantage over Bayes empirical Bayes with our small test data, and (iii) Bayesian methods appear relatively insensitive to mild misspecifications of the random process generating adaptive evolution in our simulations, but in practice can prove extremely sensitive to model specification. We suggest that the codon model used to detect amino acids under selection should be carefully selected, for instance using Akaike information criterion (AIC).
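
The empirical Bayes step that assigns sites to selection classes reduces to a Bayes-rule calculation once per-site likelihoods under each omega category are available. The toy sketch below assumes three omega categories with their weights already estimated, and made-up per-site likelihoods; it illustrates only that final classification step, not the fitting of a codon substitution model.

```python
import numpy as np

# Hypothetical omega categories and their estimated weights (MLEs in a real analysis).
omega = np.array([0.1, 1.0, 3.5])
weight = np.array([0.6, 0.3, 0.1])

# Hypothetical per-site likelihoods P(data at site | omega category), one row per site.
site_lik = np.array([
    [1.2e-8, 4.0e-9, 5.0e-10],   # site dominated by purifying selection
    [2.0e-9, 3.0e-9, 9.0e-9],    # site favouring omega > 1
])

# Empirical Bayes: posterior probability of each category at each site.
post = weight * site_lik
post /= post.sum(axis=1, keepdims=True)

p_positive = post[:, omega > 1].sum(axis=1)
print("P(omega > 1 | data) per site:", np.round(p_positive, 3))
```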

6.
It is of great practical interest to simultaneously identify the important predictors that correspond to both the fixed and random effects components in a linear mixed‐effects (LME) model. Typical approaches perform selection separately on each of the fixed and random effect components. However, changing the structure of one set of effects can lead to different choices of variables for the other set of effects. We propose simultaneous selection of the fixed and random factors in an LME model using a modified Cholesky decomposition. Our method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects. It performs model selection by allowing fixed effects or standard deviations of random effects to be exactly zero. A constrained expectation–maximization algorithm is then used to obtain the final estimates. It is further shown that the proposed penalized estimator enjoys the oracle property, in that, asymptotically, it performs as well as if the true model were known beforehand. We demonstrate the performance of our method based on a simulation study and a real data example.
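
The mechanism that sets fixed effects (or random-effect standard deviations) exactly to zero is the L1-type adaptive penalty. As a simplified, frequentist stand-in for the penalized joint likelihood, the sketch below runs coordinate descent with an adaptive soft-thresholding step on a plain linear model; the weights, penalty level, and data are illustrative choices, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 150, 6
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0, 1.0])
y = X @ beta_true + rng.normal(size=n)

lam = 0.2
ols = np.linalg.lstsq(X, y, rcond=None)[0]
w = 1.0 / np.abs(ols)                 # adaptive weights from an initial OLS fit

def soft(z, t):
    # soft-thresholding operator: shrinks toward zero and can hit zero exactly
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

beta = np.zeros(p)
for _ in range(200):                  # cyclic coordinate descent
    for j in range(p):
        r_j = y - X @ beta + X[:, j] * beta[j]          # partial residual excluding j
        z = X[:, j] @ r_j / n
        beta[j] = soft(z, lam * w[j]) / (X[:, j] @ X[:, j] / n)

print("adaptive-lasso estimate:", np.round(beta, 3))    # exact zeros drop variables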

7.
Volinsky CT, Raftery AE. Biometrics 2000, 56(1): 256-262
We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995, Journal of the American Statistical Association 90, 928-934) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For a simple censored data model, this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
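
A minimal sketch of the proposed penalty: maximize the Cox partial likelihood and then penalize with log(d), where d is the number of uncensored events, rather than log(n). The single-covariate model, the simulated data, and the tie-free handling (times are continuous here) are simplifications for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=(n, 1))
t_event = rng.exponential(scale=np.exp(-0.7 * x[:, 0]))   # true log-hazard ratio 0.7
t_cens = rng.exponential(scale=2.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

def neg_log_partial_lik(beta):
    eta = x @ beta
    order = np.argsort(time)                               # process subjects in time order
    eta_o, ev_o = eta[order], event[order]
    # risk-set sums of exp(eta): cumulative sum from the latest time backwards
    risk = np.cumsum(np.exp(eta_o)[::-1])[::-1]
    return -np.sum(ev_o * (eta_o - np.log(risk)))

fit = minimize(neg_log_partial_lik, x0=np.zeros(1), method="BFGS")
d = event.sum()
k = x.shape[1]
bic_events = 2 * fit.fun + k * np.log(d)        # penalty uses the number of deaths
bic_obs = 2 * fit.fun + k * np.log(n)           # conventional penalty, for comparison
print(f"beta_hat = {fit.x[0]:.3f}, BIC(log d) = {bic_events:.1f}, BIC(log n) = {bic_obs:.1f}")
```

When censoring is heavy, d is much smaller than n, so the event-based penalty is noticeably milder and can change which model the criterion prefers.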

8.
In the context of analyzing multiple functional limitation responses collected longitudinally from the Longitudinal Study of Aging (LSOA), we investigate the heterogeneity of these outcomes with respect to their associations with previous functional status and other risk factors in the presence of informative drop-out and confounding by baseline outcomes. We accommodate the longitudinal nature of the multiple outcomes with a unique extension of the nested random effects logistic model with an autoregressive structure to include drop-out and baseline outcome components with shared random effects. Estimation of fixed effects and variance components is by maximum likelihood with numerical integration. This shared parameter selection model assumes that drop-out is conditionally independent of the multiple functional limitation outcomes given the underlying random effect representing an individual's trajectory of functional status across time. Whereas it is not possible to fully assess the adequacy of this assumption, we assess the robustness of this approach by varying the assumptions underlying the proposed model such as the random effects structure, the drop-out component, and omission of baseline functional outcomes as dependent variables in the model. Heterogeneity among the associations between each functional limitation outcome and a set of risk factors for functional limitation, such as previous functional limitation and physical activity, exists for the LSOA data of interest. Less heterogeneity is observed among the estimates of time-level random effects variance components that are allowed to vary across functional outcomes and time. We also note that, under an autoregressive structure, bias results from omitting the baseline outcome component linked to the follow-up outcome component by subject-level random effects.

9.

Background

LASSO is a penalized regression method that facilitates model fitting in situations where there are as many, or even more explanatory variables than observations, and only a few variables are relevant in explaining the data. We focus on the Bayesian version of LASSO and consider four problems that need special attention: (i) controlling false positives, (ii) multiple comparisons, (iii) collinearity among explanatory variables, and (iv) the choice of the tuning parameter that controls the amount of shrinkage and the sparsity of the estimates. The particular application considered is association genetics, where LASSO regression can be used to find links between chromosome locations and phenotypic traits in a biological organism. However, the proposed techniques are relevant also in other contexts where LASSO is used for variable selection.

Results

We separate the true associations from false positives using the posterior distribution of the effects (regression coefficients) provided by Bayesian LASSO. We propose to solve the multiple comparisons problem by using simultaneous inference based on the joint posterior distribution of the effects. Bayesian LASSO also tends to distribute an effect among collinear variables, making detection of an association difficult. We propose to solve this problem by considering not only individual effects but also their functionals (i.e. sums and differences). Finally, whereas in Bayesian LASSO the tuning parameter is often regarded as a random variable, we adopt a scale space view and consider a whole range of fixed tuning parameters, instead. The effect estimates and the associated inference are considered for all tuning parameters in the selected range and the results are visualized with color maps that provide useful insights into data and the association problem considered. The methods are illustrated using two sets of artificial data and one real data set, all representing typical settings in association genetics.
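
The scale-space view of the tuning parameter has a simple frequentist analogue: fit the model on a whole grid of fixed penalties and inspect how each coefficient enters or leaves. The sketch below uses scikit-learn's coordinate-descent LASSO path rather than the Bayesian LASSO of the paper, and the simulated "markers" are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(4)
n, p = 120, 30
X = rng.binomial(2, 0.4, size=(n, p)).astype(float)     # toy marker genotypes coded 0/1/2
beta_true = np.zeros(p)
beta_true[[3, 17]] = [1.5, -1.0]                         # two causal loci
y = X @ beta_true + rng.normal(size=n)

# Center because lasso_path fits a model without an intercept.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

alphas = np.logspace(-2, 0.5, 40)                        # grid of fixed tuning parameters
alphas, coefs, _ = lasso_path(Xc, yc, alphas=alphas)

# For a few penalties along the grid, report which markers have nonzero effects.
for a, c in zip(alphas[::10], coefs.T[::10]):
    print(f"alpha={a:7.3f}  selected markers: {np.flatnonzero(c)}")
```

Plotting the full coefficient matrix against the penalty grid (for example as a heat map) reproduces the color-map visualization described in the abstract.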

10.
Qu P, Qu Y. Biometrics 2000, 56(4): 1249-1255
After continued treatment with an insecticide, resistant strains will emerge within the population of susceptible insects. It is important to know whether there are any resistant strains, what the proportions are, and what the median lethal doses are for the insecticide. Lwin and Martin (1989, Biometrics 45, 721-732) propose a probit mixture model and use the EM algorithm to obtain the maximum likelihood estimates for the parameters. This approach has difficulties in estimating the confidence intervals and in testing the number of components. We propose a Bayesian approach to obtaining the credible intervals for the location and scale of the tolerances in each component and for the mixture proportions by using data augmentation and the Gibbs sampler. We use Bayes factors for model selection and for determining the number of components. We illustrate the method with data published in Lwin and Martin (1989).

11.
For patients on dialysis, hospitalizations remain a major risk factor for mortality and morbidity. We use data from a large national database, United States Renal Data System, to model time-varying effects of hospitalization risk factors as functions of time since initiation of dialysis. To account for the three-level hierarchical structure in the data where hospitalizations are nested in patients and patients are nested in dialysis facilities, we propose a multilevel mixed effects varying coefficient model (MME-VCM) where multilevel (patient- and facility-level) random effects are used to model the dependence structure of the data. The proposed MME-VCM also includes multilevel covariates, where baseline demographics and comorbidities are among the patient-level factors, and staffing composition and facility size are among the facility-level risk factors. To address the challenge of high-dimensional integrals due to the hierarchical structure of the random effects, we propose a novel two-step approximate EM algorithm based on the fully exponential Laplace approximation. Inference for the varying coefficient functions and variance components is achieved via derivation of the standard errors using score contributions. The finite sample performance of the proposed estimation procedure is studied through simulations.

12.
In a linear mixed effects model, it is common practice to assume that the random effects follow a parametric distribution such as a normal distribution with mean zero. However, in the case of variable selection, substantial violation of the normality assumption can potentially impact the subset selection and result in poor interpretation and even incorrect results. In nonparametric random effects models, the random effects generally have a nonzero mean, which causes an identifiability problem for the fixed effects that are paired with the random effects. In this article, we focus on a Bayesian method for variable selection. We characterize the subject‐specific random effects nonparametrically with a Dirichlet process and resolve the bias simultaneously. In particular, we propose flexible modeling of the conditional distribution of the random effects with changes across the predictor space. The approach is implemented using a stochastic search Gibbs sampler to identify subsets of fixed effects and random effects to be included in the model. Simulations are provided to evaluate and compare the performance of our approach to the existing ones. We then apply the new approach to a real data example, cross‐country and interlaboratory rodent uterotrophic bioassay.

13.
Kneib T, Fahrmeir L. Biometrics 2006, 62(1): 109-118
Motivated by a space-time study on forest health with damage state of trees as the response, we propose a general class of structured additive regression models for categorical responses, allowing for a flexible semiparametric predictor. Nonlinear effects of continuous covariates, time trends, and interactions between continuous covariates are modeled by penalized splines. Spatial effects can be estimated based on Markov random fields, Gaussian random fields, or two-dimensional penalized splines. We present our approach from a Bayesian perspective, with inference based on a categorical linear mixed model representation. The resulting empirical Bayes method is closely related to penalized likelihood estimation in a frequentist setting. Variance components, corresponding to inverse smoothing parameters, are estimated using (approximate) restricted maximum likelihood. In simulation studies we investigate the performance of different choices for the spatial effect, compare the empirical Bayes approach to competing methodology, and study the bias of mixed model estimates. As an application we analyze data from the forest health survey.

14.
Lee OE, Braun TM. Biometrics 2012, 68(2): 486-493
Inference regarding the inclusion or exclusion of random effects in linear mixed models is challenging because the variance components are located on the boundary of their parameter space under the usual null hypothesis. As a result, the asymptotic null distribution of the Wald, score, and likelihood ratio tests will not have the typical χ2 distribution. Although it has been proved that the correct asymptotic distribution is a mixture of χ2 distributions, the appropriate mixture distribution is rather cumbersome and nonintuitive when the null and alternative hypotheses differ by more than one random effect. As alternatives, we present two permutation tests, one that is based on the best linear unbiased predictors and one that is based on the restricted likelihood ratio test statistic. Both methods involve weighted residuals, with the weights determined by the among- and within-subject variance components. The null permutation distributions of our statistics are computed by permuting the residuals both within and among subjects and are valid both asymptotically and in small samples. We examine the size and power of our tests via simulation under a variety of settings and apply our test to a published data set of chronic myelogenous leukemia patients.
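
As a stripped-down analogue of testing whether a random intercept is needed, the sketch below permutes observations across subjects (valid under the null of zero between-subject variance, where all observations are exchangeable) and compares the between-subject F statistic with its permutation distribution. The variance-component weighting and BLUP- or RLRT-based statistics used in the paper are omitted; this only illustrates the permutation logic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated data: 20 subjects, 5 observations each, with a true random intercept.
n_subj, n_rep = 20, 5
subject = np.repeat(np.arange(n_subj), n_rep)
b = rng.normal(scale=0.5, size=n_subj)            # subject effects (set scale=0 for the null)
y = 1.0 + b[subject] + rng.normal(size=n_subj * n_rep)

def between_f(y, subject):
    # one-way ANOVA F statistic: between-subject vs within-subject mean squares
    groups = [y[subject == s] for s in np.unique(subject)]
    grand = y.mean()
    msb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups) / (len(groups) - 1)
    msw = sum(((g - g.mean()) ** 2).sum() for g in groups) / (len(y) - len(groups))
    return msb / msw

obs = between_f(y, subject)
perm = np.array([between_f(rng.permutation(y), subject) for _ in range(2000)])
p_value = np.mean(perm >= obs)
print(f"observed F = {obs:.2f}, permutation p-value = {p_value:.4f}")
```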

15.
We propose a state space model for analyzing equally or unequally spaced longitudinal count data with serial correlation. With a log link function, the mean of the Poisson response variable is a nonlinear function of the fixed and random effects. The random effects are assumed to be generated from a Gaussian first order autoregression (AR(1)). In this case, the mean of the observations has a log normal distribution. We use a combination of linear and nonlinear methods to take advantage of the Gaussian process embedded in a nonlinear function. The state space model uses a modified Kalman filter recursion to estimate the mean and variance of the AR(1) random error given the previous observations. The marginal likelihood is approximated by numerically integrating out the AR(1) random error. Simulation studies with different sets of parameters show that the state space model performs well. The model is applied to Epileptic Seizure data and Primary Care Visits Data. Missing and unequally spaced observations are handled naturally with this model.
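
The recursion underlying this approach is easiest to see in the plain Gaussian case. The sketch below is a textbook Kalman filter for a latent AR(1) process observed with additive Gaussian noise; it omits the log link, the Poisson observation model, and the modifications described in the paper, and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate a latent AR(1) state and noisy observations of it.
T, phi, q, r = 100, 0.8, 0.5, 1.0        # length, AR coefficient, state and obs variances
alpha = np.zeros(T)
for t in range(1, T):
    alpha[t] = phi * alpha[t - 1] + rng.normal(scale=np.sqrt(q))
y = alpha + rng.normal(scale=np.sqrt(r), size=T)

# Kalman filter recursion: predicted and filtered mean/variance of the state.
a, p = 0.0, q / (1 - phi ** 2)           # start at the stationary mean and variance
filt_mean = np.empty(T)
for t in range(T):
    # update with observation y[t]
    k = p / (p + r)                       # Kalman gain
    a = a + k * (y[t] - a)
    p = (1 - k) * p
    filt_mean[t] = a
    # predict the next state
    a = phi * a
    p = phi ** 2 * p + q

print("correlation of filtered mean with true state:",
      np.round(np.corrcoef(filt_mean, alpha)[0, 1], 3))
```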

16.
Natural selection is typically exerted at some specific life stages. If natural selection takes place before a trait can be measured, using conventional models can cause wrong inference about population parameters. When the missing data process relates to the trait of interest, a valid inference requires explicit modeling of the missing process. We propose a joint modeling approach, a shared parameter model, to account for nonrandom missing data. It consists of an animal model for the phenotypic data and a logistic model for the missing process, linked by the additive genetic effects. A Bayesian approach is taken and inference is made using integrated nested Laplace approximations. From a simulation study we find that wrongly assuming that missing data are missing at random can result in severely biased estimates of additive genetic variance. Using real data from a wild population of Swiss barn owls Tyto alba, our model indicates that the missing individuals would display large black spots; and we conclude that genes affecting this trait are already under selection before it is expressed. Our model is a tool to correctly estimate the magnitude of both natural selection and additive genetic variance.

17.
A novel approach for the statistical analysis of comet assay data (i.e., the tail moment) is proposed, employing public-domain statistical software, the R system. The analytical strategy takes into account that the distribution of comet assay data, such as the tail moment, is usually skewed and does not follow a normal distribution. Probability distributions used to model comet assay data included the Weibull, exponential, logistic, normal, log-normal, and log-logistic distributions. The approach also treats the heterogeneity observed among experimental units as a random feature of the comet assay data. The statistical model is characterized by a location parameter m(ij), a scale parameter r, and a between-experimental-unit variability parameter theta. On the logarithmic scale, the parameter m(ij) depends additively on treatment and random effects, as follows: log(m(ij)) = a0 + a1*x(ij) + b(i), where exp(a0) represents approximately the mean value of the control group, exp(a1) can be interpreted as the relative risk of damage with respect to the control group, x(ij) is an indicator of experimental group, and exp(b(i)) is an individual risk effect assumed to follow a Gamma distribution with mean 1 and variance theta. Model selection is based on Akaike's information criterion (AIC). Real data from comet analysis of blood samples taken from the flounder Paralichtys orbignyanus (Teleostei: Paralichtyidae) and from cell suspensions obtained from the estuarine polychaete Laeonereis acuta (Nereididae) were employed. This statistical approach showed that comet assay data should be analyzed under a modeling framework that takes into account the important features of these measurements; model selection and heterogeneity between experimental units play central roles in the analysis.
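
The distribution-selection step can be illustrated without the random between-unit effect. The sketch below (in Python rather than the R system used in the paper) fits several of the candidate families by maximum likelihood with scipy and ranks them by AIC on simulated, right-skewed "tail moment" values; the data and candidate list are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
tail_moment = rng.weibull(1.3, size=300) * 10.0          # skewed toy comet-assay measurements

# Candidate families; non-negative ones are fitted with the location fixed at zero.
candidates = {
    "weibull": (stats.weibull_min, {"floc": 0}),
    "exponential": (stats.expon, {"floc": 0}),
    "log-normal": (stats.lognorm, {"floc": 0}),
    "logistic": (stats.logistic, {}),
    "normal": (stats.norm, {}),
}

for name, (dist, fixed) in candidates.items():
    params = dist.fit(tail_moment, **fixed)              # maximum likelihood fit
    loglik = np.sum(dist.logpdf(tail_moment, *params))
    k = len(params) - len(fixed)                         # number of free parameters
    aic = 2 * k - 2 * loglik                             # Akaike information criterion
    print(f"{name:12s} AIC = {aic:8.1f}")
```

The family with the smallest AIC is preferred; on data like these, the skewed families should clearly beat the normal and logistic fits.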

18.
Various studies have estimated covariance components as half the difference between the variance component of the sum of the variable values, for each observation, and the sum of the corresponding variable variance components. Although the variance components for the separate variables can be computed using all available data, the variance components of the sum can be computed only from those observations with records for both variables. Previous studies have suggested eliminating observations with missing data, because of possible selection bias. The effect of missing data on estimates of covariance components and genetic correlations was tested on sample beef cattle data and simulated data by randomly deleting differing proportions of records of one variable for each pair of variables analyzed. Estimates of genetic correlations computed with observations with missing data eliminated were more accurate than estimates computed using all available data. Furthermore, when observations with missing data were included, estimates of genetic correlation far outside the parameter space were common. Therefore, this method should be used only if observations with missing data have been eliminated.
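
The estimator in question is cov(X, Y) = [var(X + Y) − var(X) − var(Y)] / 2. The short sketch below, with made-up numbers, computes it in the two ways contrasted in the abstract: mixing all-record variances with the complete-pair variance of the sum, versus restricting every term to records where both variables were observed.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(scale=0.8, size=n)
observed_y = rng.random(n) > 0.3                # ~30% of the y records are missing

# Variance of the sum can only use complete pairs.
both = observed_y
var_sum = np.var(x[both] + y[both], ddof=1)

# Mixing all-data variances with the complete-pair variance of the sum:
cov_mixed = 0.5 * (var_sum - np.var(x, ddof=1) - np.var(y[both], ddof=1))

# Restricting every term to complete pairs, as the abstract recommends:
cov_complete = 0.5 * (var_sum - np.var(x[both], ddof=1) - np.var(y[both], ddof=1))

print(f"direct covariance (complete pairs): {np.cov(x[both], y[both])[0, 1]:.3f}")
print(f"half-difference, mixed records    : {cov_mixed:.3f}")
print(f"half-difference, complete pairs   : {cov_complete:.3f}")
```

The complete-pair version reproduces the direct sample covariance exactly; when the missingness is related to the trait itself, the mixed version can drift well outside the admissible range, which is the failure the abstract reports.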

19.
Species–area relationships of plants in a 42 hm2 plot of mixed broadleaf–conifer forest in Jiaohe, Jilin Province, China
The species–area relationship is a fundamental issue in ecology, yet how it should be constructed and which species–area model is optimal remain controversial. Using data from a 42 hm2 mixed broadleaf–conifer forest plot in Jiaohe, Jilin Province, we fitted logarithmic, power-function, and logistic models under both nested-quadrat and random-quadrat sampling, and assessed model goodness of fit with the Akaike information criterion (AIC). The results show that the species–area relationship depends on the sampling scheme, with random quadrats fitting better than nested quadrats. Under random-quadrat sampling, the power-function model (AIC = 89.11) and the logistic model (AIC = 71.21) outperformed the logarithmic model (AIC = 113.81). Judged by AIC, the logistic model built from random quadrats is the best model for the species–area relationship in the 42 hm2 plot. The study indicates that analyses of species–area relationships should consider not only scale effects but also the influence of habitat change and community succession.
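
A minimal sketch of the model-comparison step, using simulated species counts rather than the plot data: fit the power-function and logarithmic species–area models by least squares and compare Gaussian-error AIC values. The logistic model and the nested/random-quadrat distinction are omitted, and the areas and richness values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)

area = np.array([100, 400, 1600, 6400, 25600, 102400, 420000], dtype=float)  # m^2, toy values
species = 3.0 * area ** 0.25 + rng.normal(scale=2.0, size=area.size)          # toy richness

def power_model(a, c, z):
    return c * a ** z

def log_model(a, c, z):
    return c + z * np.log(a)

def gaussian_aic(y, yhat, k):
    # AIC under Gaussian errors, up to an additive constant shared by both models
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

for name, model in [("power", power_model), ("logarithmic", log_model)]:
    pars, _ = curve_fit(model, area, species, p0=[1.0, 0.2])
    aic = gaussian_aic(species, model(area, *pars), k=len(pars))
    print(f"{name:12s} parameters = {np.round(pars, 3)}, AIC = {aic:.1f}")
```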

20.
Lam KF, Lee YW, Leung TL. Biometrics 2002, 58(2): 316-323
In this article, the focus is on the analysis of multivariate survival time data with various types of dependence structures. Examples of multivariate survival data include clustered data and repeated measurements from the same subject, such as the interrecurrence times of cancer tumors. A random effect semiparametric proportional odds model is proposed as an alternative to the proportional hazards model. The distribution of the random effects is assumed to be multivariate normal and the random effect is assumed to act additively to the baseline log-odds function. This class of models, which includes the usual shared random effects model, the additive variance components model, and the dynamic random effects model as special cases, is highly flexible and is capable of modeling a wide range of multivariate survival data. A unified estimation procedure is proposed to estimate the regression and dependence parameters simultaneously by means of a marginal-likelihood approach. Unlike the fully parametric case, the regression parameter estimate is not sensitive to the choice of correlation structure of the random effects. The marginal likelihood is approximated by the Monte Carlo method. Simulation studies are carried out to investigate the performance of the proposed method. The proposed method is applied to two well-known data sets, including clustered data and recurrent event times data.
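
The Monte Carlo marginal-likelihood idea can be shown on a much smaller problem: a single cluster of binary outcomes with a normal random intercept. The sketch below averages the conditional likelihood over draws of the random effect; the data, parameter values, and logistic model are illustrative, and the proportional-odds survival structure of the paper is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(10)

y = np.array([1, 0, 1, 1, 0])            # binary outcomes for one cluster
x = np.array([0.5, -1.0, 0.2, 1.3, -0.4])
beta, sigma_b = 0.8, 1.2                 # fixed effect and random-intercept SD (given values)

# Monte Carlo approximation of the marginal likelihood: average the conditional
# likelihood of the cluster over draws of the random intercept b ~ N(0, sigma_b^2).
draws = rng.normal(scale=sigma_b, size=100_000)
p = 1.0 / (1.0 + np.exp(-(beta * x[None, :] + draws[:, None])))   # per-draw success probs
cond_lik = np.prod(np.where(y == 1, p, 1 - p), axis=1)            # conditional likelihoods
print(f"Monte Carlo marginal likelihood: {cond_lik.mean():.5f}")
```

In the full method this average is taken inside the likelihood for every cluster and then maximized over the regression and dependence parameters.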
