Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data
Summary. We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
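
The claim that a DP prior with a mean-zero base measure still yields a random effect distribution with nonzero mean can be illustrated numerically. The following is a minimal sketch, not the authors' method: it assumes a truncated stick-breaking representation, a standard normal base measure, and a concentration parameter of 1, and the function name `dp_draw_mean` is hypothetical. Each realized distribution G has its own mean; those means average to zero across draws but are individually far from zero, with variance equal to the base variance divided by (alpha + 1).

```python
import numpy as np

def dp_draw_mean(alpha, n_atoms, rng):
    """Mean of one truncated draw G ~ DP(alpha, N(0, 1)) via stick-breaking."""
    # Stick-breaking proportions: beta_k ~ Beta(1, alpha).
    betas = rng.beta(1.0, alpha, size=n_atoms)
    # Stick length remaining before each break: prod_{j<k} (1 - beta_j).
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    weights = betas * remaining
    weights /= weights.sum()  # renormalise the truncated weights
    # Atom locations drawn from the mean-zero base measure (assumed N(0, 1)).
    atoms = rng.normal(0.0, 1.0, size=n_atoms)
    return float(weights @ atoms)

rng = np.random.default_rng(0)
means = np.array([dp_draw_mean(1.0, 500, rng) for _ in range(2000)])

print("average of realized means:", means.mean())
print("std of realized means:", means.std())
```

With alpha = 1 the standard deviation of the realized means should be close to sqrt(1/2), so any single realization of G typically has a mean well away from zero. This is the source of the identifiability problem with the fixed-effect intercept that the abstract describes.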

Statistics in Biosciences - Joint models for longitudinal biomarkers and time-to-event data are widely used in longitudinal studies. Many joint modeling approaches have been proposed to handle...

Summary. The majority of the statistical literature for the joint modeling of longitudinal and time-to-event data has focused on the development of models that aim at capturing specific aspects of the motivating case studies. However, little attention has been given to the development of diagnostic and model-assessment tools. The main difficulty in using standard model diagnostics in joint models is the nonrandom dropout in the longitudinal outcome caused by the occurrence of events. In particular, the reference distribution of statistics, such as the residuals, in missing data settings is not directly available and complex calculations are required to derive it. In this article, we propose a multiple-imputation-based approach for creating multiple versions of the completed data set under the assumed joint model. Residuals and diagnostic plots for the complete data model can then be calculated based on these imputed data sets. Our proposals are exemplified using two real data sets.

Summary. We consider a clinical trial with a primary and a secondary endpoint where the secondary endpoint is tested only if the primary endpoint is significant. The trial uses a group sequential procedure with two stages. The familywise error rate (FWER) of falsely concluding significance on either endpoint is to be controlled at a nominal level α. The type I error rate for the primary endpoint is controlled by choosing any α‐level stopping boundary, e.g., the standard O'Brien–Fleming or the Pocock boundary. Given any particular α‐level boundary for the primary endpoint, we study the problem of determining the boundary for the secondary endpoint to control the FWER. We study this FWER analytically and numerically and find that it is maximized when the correlation coefficient ρ between the two endpoints equals 1. For the four combinations consisting of O'Brien–Fleming and Pocock boundaries for the primary and secondary endpoints, the critical constants required to control the FWER are computed for different values of ρ. An ad hoc boundary is proposed for the secondary endpoint to address a practical concern that may be at issue in some applications. Numerical studies indicate that the O'Brien–Fleming boundary for the primary endpoint and the Pocock boundary for the secondary endpoint generally give the best primary as well as secondary power performance. The Pocock boundary may be replaced by the ad hoc boundary for the secondary endpoint with very little loss of secondary power if the practical concern is at issue. A clinical trial example is given to illustrate the methods.

Lei Xu  Jun Shao 《Biometrics》2009,65(4):1175-1183
Summary. In studies with longitudinal or panel data, missing responses often depend on values of responses through a subject‐level unobserved random effect. Besides the likelihood approach based on parametric models, there exists a semiparametric method, the approximate conditional model (ACM) approach, which relies on the availability of a summary statistic and a linear or polynomial approximation to some random effects. However, two important issues must be addressed in applying ACM. The first is how to find a summary statistic and the second is how to estimate the parameters in the original model using estimates of parameters in ACM. Our study addresses these two issues. For the first issue, we derive summary statistics under various situations. For the second issue, we propose to use a grouping method, instead of linear or polynomial approximation to random effects. Because the grouping method is a moment‐based approach, the conditions we assume in deriving summary statistics are weaker than the existing ones in the literature. When the derived summary statistic is continuous, we propose to use a classification tree method to obtain an approximate summary statistic for grouping. Some simulation results are presented to study the finite sample performance of the proposed method. An application is illustrated using data from the study of Modification of Diet in Renal Disease.

Cai J  Sen PK  Zhou H 《Biometrics》1999,55(1):182-189
A random effects model for analyzing multivariate failure time data is proposed. The work is motivated by the need for assessing the mean treatment effect in a multicenter clinical trial study, assuming that the centers are a random sample from an underlying population. An estimating equation for the mean hazard ratio parameter is proposed. The proposed estimator is shown to be consistent and asymptotically normally distributed. A variance estimator, based on large sample theory, is proposed. Simulation results indicate that the proposed estimator performs well in finite samples. The proposed variance estimator effectively corrects the bias of the naive variance estimator, which assumes independence of individuals within a group. The methodology is illustrated with a clinical trial data set from the Studies of Left Ventricular Dysfunction. This shows that the variability of the treatment effect is higher than found by means of simpler models.

Summary. We consider semiparametric transition measurement error models for longitudinal data, where one of the covariates is measured with error in transition models, and no distributional assumption is made for the underlying unobserved covariate. An estimating equation approach based on the pseudo conditional score method is proposed. We show the resulting estimators of the regression coefficients are consistent and asymptotically normal. We also discuss the issue of efficiency loss. Simulation studies are conducted to examine the finite-sample performance of our estimators. The longitudinal AIDS Costs and Services Utilization Survey data are analyzed for illustration.

Summary. A class of nonignorable models is presented for handling nonmonotone missingness in categorical longitudinal responses. This class of models includes the traditional selection models and shared parameter models. This allows us to perform a broader than usual sensitivity analysis. In particular, instead of considering variations to a chosen nonignorable model, we study sensitivity between different missing data frameworks. An appealing feature of the developed class is that parameters with a marginal interpretation are obtained, while algebraically simple models are considered. Specifically, marginalized mixed‐effects models (Heagerty, 1999, Biometrics 55, 688–698), which model the marginal mean and the correlation structure separately, are used for the longitudinal process. For the correlation structure, random effects are introduced and their distribution is modeled either parametrically or non‐parametrically to avoid potential misspecifications.



The University of Wisconsin Population Health Institute has published the County Health Rankings since 2010. These rankings use population-based data to highlight health outcomes and the multiple determinants of these outcomes and to encourage in-depth health assessment for all United States counties. A significant methodological limitation, however, is the uncertainty of rank estimates, particularly for small counties. To address this challenge, we explore the use of longitudinal and pooled outcome data in hierarchical Bayesian models to generate county ranks with greater precision.


In our models we used pooled outcome data for three measure groups: (1) Poor physical and poor mental health days; (2) percent of births with low birth weight and fair or poor health prevalence; and (3) age-specific mortality rates for nine age groups. We used the fixed and random effects components of these models to generate posterior samples of rates for each measure. We also used time-series data in longitudinal random effects models for age-specific mortality. Based on the posterior samples from these models, we estimate ranks and rank quartiles for each measure, as well as the probability of a county ranking in its assigned quartile. Rank quartile probabilities for univariate, joint outcome, and/or longitudinal models were compared to assess improvements in rank precision.


The joint outcome model for poor physical and poor mental health days resulted in improved rank precision, as did the longitudinal model for age-specific mortality rates. Rank precision for low birth weight births and fair/poor health prevalence based on the univariate and joint outcome models were equivalent.


Incorporating longitudinal or pooled outcome data may improve rank certainty, depending on characteristics of the measures selected. For measures with different determinants, joint modeling neither improved nor degraded rank precision. This approach suggests a simple way to use existing information to improve the precision of small-area measures of population health.
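
The step from posterior samples to ranks, rank quartiles, and the probability that a county falls in its assigned quartile can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the function name `rank_quartile_probabilities` is hypothetical, lower rates are assumed to rank better, and the point-estimate rank is assumed to come from ordering counties by posterior mean rank, a detail the abstract does not specify.

```python
import numpy as np

def rank_quartile_probabilities(samples):
    """samples: array of shape (n_draws, n_counties) holding posterior draws
    of a rate for each county (assumption: lower rate = better rank)."""
    n_draws, n_counties = samples.shape
    # Rank counties within each posterior draw (1 = lowest rate).
    ranks = samples.argsort(axis=1).argsort(axis=1) + 1
    # Quartile index (0..3) of each county within each draw.
    quartiles = (ranks - 1) * 4 // n_counties
    # Point estimate (assumed): order counties by their posterior mean rank.
    est_rank = ranks.mean(axis=0).argsort().argsort() + 1
    assigned = (est_rank - 1) * 4 // n_counties
    # Share of draws in which each county lands in its assigned quartile.
    prob = (quartiles == assigned).mean(axis=0)
    return est_rank, assigned, prob

# Hypothetical, well-separated rates for 8 counties with very little noise.
rng = np.random.default_rng(0)
true_rates = np.arange(8) * 10.0
samples = true_rates + rng.normal(0.0, 0.01, size=(500, 8))
est_rank, assigned, prob = rank_quartile_probabilities(samples)
print(est_rank, assigned, prob)
```

With posterior draws this tightly concentrated, every county stays in its assigned quartile in every draw. For small counties with wide posteriors the probabilities fall below 1, which is the rank uncertainty the abstract sets out to reduce.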

Dobson A  Henderson R 《Biometrics》2003,59(4):741-751
We present a variety of informal graphical procedures for diagnostic assessment of joint models for longitudinal and dropout time data. A random effects approach for Gaussian responses and proportional hazards dropout time is assumed. We consider preliminary assessment of dropout classification categories based on residuals following a standard longitudinal data analysis with no allowance for informative dropout. Residual properties conditional upon dropout information are discussed and case influence is considered. The proposed methods do not require computationally intensive methods over and above those used to fit the proposed model. A longitudinal trial into the treatment of schizophrenia is used to illustrate the suggestions.

Data from a litter-matched tumorigenesis experiment are analysed using a generalised linear mixed model (GLMM) approach to the analysis of clustered survival data in which failure time observations within the same litter are dependent. Maximum likelihood (ML) and residual maximum likelihood (REML) estimates of risk variable parameters and variance component parameters, together with predictions of the random effects, are given. The estimate of the treatment effect parameter (carcinogen effect) agrees well with previous analyses in the literature, although the dependence structure within a litter is modelled in different ways. The variance component estimation provides the estimated dispersion of the random effects. The prediction of random effects is useful, for instance, in identifying high-risk litters and individuals. The present analysis illustrates the wider application of the approach to detecting increased risk of occurrence of disease in particular families of a study population.

Analysis of longitudinal data with excessive zeros has gained increasing attention in recent years; however, current approaches to the analysis of longitudinal data with excessive zeros have primarily focused on balanced data. Dropouts are common in longitudinal studies; therefore, the analysis of the resulting unbalanced data is complicated by the missing mechanism. Our study is motivated by the analysis of longitudinal skin cancer count data presented by Greenberg, Baron, Stukel, Stevens, Mandel, Spencer, Elias, Lowe, Nierenberg, Bayrd, Vance, Freeman, Clendenning, Kwan, and the Skin Cancer Prevention Study Group [New England Journal of Medicine 323, 789–795]. The data consist of a large number of zero responses (83% of the observations) as well as a substantial amount of dropout (about 52% of the observations). To account for both excessive zeros and dropout patterns, we propose a pattern‐mixture zero‐inflated model with compound Poisson random effects for the unbalanced longitudinal skin cancer data. We also incorporate a first-order autoregressive, AR(1), correlation structure in the model to capture longitudinal correlation of the count responses. A quasi‐likelihood approach is developed for the estimation of our model. We illustrate the method with an analysis of the longitudinal skin cancer data.

We consider a conceptual correspondence between the missing data setting and joint modeling of longitudinal and time‐to‐event outcomes. Based on this correspondence, we formulate an extended shared random effects joint model. Within this model, we provide a characterization of missing at random that is in line with the one used in the missing data setting. The ideas are illustrated using data from a study on liver cirrhosis, contrasting the new framework with conventional joint models.

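This paper studies two methods for estimating the unknown parameters in hierarchical generalized linear models (HGLMs): the marginal likelihood method and the L-N method proposed by Lee and Nelder. For a typical class of Poisson-Gamma models with two random effects, we have previously proved, under some regularity conditions, the strong consistency and asymptotic normality of the L-N estimator of the fixed effect β, and obtained its rate of convergence to the true value. For this class of models, we further derive an analytic expression for the marginal likelihood function and use Monte Carlo simulation to compare the marginal likelihood estimator and the L-N estimator of the fixed effect β. The simulations indicate that the L-N estimator performs better than the marginal likelihood estimator in quasi-Poisson-Gamma models, with higher precision.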

The problem of combining information from separate trials is a key consideration when performing a meta‐analysis or planning a multicentre trial. Although there is a considerable journal literature on meta‐analysis based on individual patient data (IPD), i.e. a one‐step IPD meta‐analysis, versus analysis based on summary data, i.e. a two‐step IPD meta‐analysis, recent articles in the medical literature indicate that there is still confusion and uncertainty as to the validity of an analysis based on aggregate data. In this study, we address one of the central statistical issues by considering the estimation of a linear function of the mean, based on linear models for summary data and for IPD. The summary data from a trial is assumed to comprise the best linear unbiased estimator, or maximum likelihood estimator of the parameter, along with its covariance matrix. The setup, which allows for the presence of random effects and covariates in the model, is quite general and includes many of the commonly employed models, for example, linear models with fixed treatment effects and fixed or random trial effects. For this general model, we derive a condition under which the one‐step and two‐step IPD meta‐analysis estimators coincide, extending earlier work considerably. The implications of this result for the specific models mentioned above are illustrated in detail, both theoretically and in terms of two real data sets, and the roles of balance and heterogeneity are highlighted. Our analysis also shows that when covariates are present, which is typically the case, the two estimators coincide only under extra simplifying assumptions, which are somewhat unrealistic in practice.

Variable Selection for Semiparametric Mixed Models in Longitudinal Studies
Summary. We propose a double-penalized likelihood approach for simultaneous model selection and estimation in semiparametric mixed models for longitudinal data. Two types of penalties are jointly imposed on the ordinary log-likelihood: the roughness penalty on the nonparametric baseline function and a nonconcave shrinkage penalty on linear coefficients to achieve model sparsity. Compared to existing estimating-equation-based approaches, our procedure provides valid inference for data missing at random, and will be more efficient if the specified model is correct. Another advantage of the new procedure is its easy computation for both regression components and variance parameters. We show that the double-penalized problem can be conveniently reformulated into a linear mixed model framework, so that existing software can be directly used to implement our method. For the purpose of model inference, we derive both frequentist and Bayesian variance estimation for estimated parametric and nonparametric components. Simulation is used to evaluate and compare the performance of our method to the existing ones. We then apply the new method to a real data set from a lactation study.

This paper discusses regression analysis of longitudinal data in which the observation process may be related to the longitudinal process of interest. Such data have recently attracted a great deal of attention and some methods have been developed. However, most of those methods treat the observation process as a recurrent event process, which assumes that one observation can immediately follow another. Sometimes, this is not the case, as there may be some delay or observation duration. Such a process is often referred to as a recurrent episode process. One example is the medical cost related to hospitalization, where each hospitalization serves as a single observation. For the problem, we present a joint analysis approach for regression analysis of both longitudinal and observation processes and a simulation study is conducted that assesses the finite sample performance of the approach. The asymptotic properties of the proposed estimates are also given and the method is applied to the medical cost data that motivated this study.

Continuous proportional data are common in biomedical research, e.g., the pre‐post therapy percent change in certain physiological and molecular variables such as glomerular filtration rate, certain gene expression levels, or telomere length. As shown by Song and Tan (2000), such data require methods beyond the common generalised linear models. However, the original marginal simplex model of Song and Tan (2000) for such longitudinal continuous proportional data assumes a constant dispersion parameter. This assumption of dispersion homogeneity is imposed mainly for mathematical convenience and may be violated in some situations. For example, the dispersion may vary across drug treatment cohorts or follow‐up times. This paper extends their original model so that the heterogeneity of the dispersion parameter can be assessed and accounted for in order to conduct a proper statistical inference for the model parameters. A simulation study is given to demonstrate that statistical inference can be seriously affected by mistakenly assuming a varying dispersion parameter to be constant in the application of the available GEEs method. In addition, residual analysis is developed for checking various assumptions made in the modelling process, e.g., assumptions on error distribution. The methods are illustrated with the same eye surgery data as in Song and Tan (2000) for ease of comparison. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)

