Similar articles
 20 similar articles found (search time: 0 ms)
1.
Wang L  Zhou J  Qu A 《Biometrics》2012,68(2):353-360
We consider the penalized generalized estimating equations (GEEs) for analyzing longitudinal data with high-dimensional covariates, which often arise in microarray experiments and large-scale health studies. Existing high-dimensional regression procedures often assume independent data and rely on the likelihood function. Construction of a feasible joint likelihood function for high-dimensional longitudinal data is challenging, particularly for correlated discrete outcome data. The penalized GEE procedure only requires specifying the first two marginal moments and a working correlation structure. We establish the asymptotic theory in a high-dimensional framework where the number of covariates p(n) increases as the number of clusters n increases, and p(n) can reach the same order as n. One important feature of the new procedure is that the consistency of model selection holds even if the working correlation structure is misspecified. We evaluate the performance of the proposed method using Monte Carlo simulations and demonstrate its application using a yeast cell-cycle gene expression data set.

2.
3.
Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, JASA 96, 1387-1398) proposed alternative bias-corrected covariance estimators for generalized estimating equations parameter estimates of regression models for marginal means. The finite sample properties of these estimators are compared to those of the uncorrected sandwich estimator, which underestimates variances in small samples. Although the formula of Mancl and DeRouen generally overestimates variances, it often leads to coverage of 95% confidence intervals near the nominal level even in some situations with as few as 10 clusters. An explanation for these seemingly contradictory results is that the tendency to undercoverage resulting from the substantial variability of sandwich estimators counteracts the impact of overcorrecting the bias. However, these positive results do not generally hold; for small cluster sizes (e.g., <10) their estimator often results in overcoverage, and the bias-corrected covariance estimator of Kauermann and Carroll may be preferred. The methods are illustrated using data from a nested cross-sectional cluster intervention trial on reducing underage drinking.

4.
Neuhaus JM 《Biometrics》2002,58(3):675-683
Misclassified clustered and longitudinal data arise in studies where the response indicates a condition identified through an imperfect diagnostic procedure. Examples include longitudinal studies that use an imperfect diagnostic test to assess whether or not an individual has been infected with a specific virus. This article presents methods to implement both population-averaged and cluster-specific analyses of such data when the misclassification rates are known. The methods exploit the fact that the class of generalized linear models enjoys a closure property in the case of misclassified responses. Data from longitudinal studies of infectious disease will illustrate the findings.

5.
Akaike's information criterion in generalized estimating equations   Total citations: 15, self-citations: 0, citations by others: 15
Pan W 《Biometrics》2001,57(1):120-125
Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is nonlikelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.
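For readers who want the shape of such a criterion, the quasi-likelihood analogue of AIC is usually written in the GEE literature as below. The symbols are not taken from the abstract and are assumptions of this sketch: R is the working correlation structure, Q is the quasi-likelihood evaluated under the independence working model, \hat{\Omega}_I is the inverse of the model-based covariance estimate under independence, and \hat{V}_R is the robust (sandwich) covariance estimate.

```latex
\mathrm{QIC}(R) \;=\; -2\, Q\bigl(\hat{\beta}(R);\, I\bigr) \;+\; 2\, \operatorname{tr}\bigl(\hat{\Omega}_I \hat{V}_R\bigr)
```

The first term replaces the log-likelihood of AIC with the quasi-likelihood, and the trace term plays the role of the adjusted penalty that the abstract mentions.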

6.
Analysis of clustered data: A combined estimating equations approach   Total citations: 2, self-citations: 0, citations by others: 2

7.
8.
Oman SD  Landsman V  Carmel Y  Kadmon R 《Biometrics》2007,63(3):892-900
We estimate the relation between binary responses and corresponding covariate vectors, both observed over a large spatial lattice. We assume a hierarchical generalized linear model with probit link function, partition the lattice into blocks, and adopt the working assumption of independence between the blocks to obtain an easily solved estimating equation. Standard errors are obtained using the "sandwich" estimator together with window subsampling (Sherman, 1996, Journal of the Royal Statistical Society, Series B 58, 509-523). We apply this to a large data set describing long-term vegetation growth, together with two other approximate-likelihood approaches: pairwise composite likelihood (CL) and estimation under a working assumption of independence. The independence and CL methods give similar point estimates and standard errors, while the independent-block approach gives considerably smaller standard errors, as well as more easily interpretable point estimates. We present numerical evidence suggesting this increased efficiency may hold more generally.

9.
Longitudinal studies are often applied in biomedical research and clinical trials to evaluate the treatment effect. The association pattern within the subject must be considered in both sample size calculation and the analysis. One of the most important approaches to analyzing such a study is the generalized estimating equation (GEE) proposed by Liang and Zeger, in which a "working correlation structure" is introduced and the association pattern within the subject depends on a vector of association parameters denoted by ρ. The explicit sample size formulas for two-group comparison in linear and logistic regression models are obtained based on the GEE method by Liu and Liang. For cluster randomized trials (CRTs), researchers proposed the optimal sample sizes at both the cluster and individual level as a function of sampling costs and the intracluster correlation coefficient (ICC). In these approaches, the optimal sample sizes depend strongly on the ICC. However, the ICC is usually unknown for CRTs and multicenter trials. To overcome this shortcoming, Van Breukelen et al. consider a range of possible ICC values identified from literature reviews and present Maximin designs (MMDs) based on relative efficiency (RE) and efficiency under budget and cost constraints. In this paper, the optimal sample size and number of repeated measurements using GEE models with an exchangeable working correlation matrix are proposed under a fixed budget, where "optimal" refers to maximum power for a given sampling budget. The equations for sample size and number of repeated measurements for a known parameter value ρ are derived, and a straightforward algorithm for unknown ρ is developed. Applications in practice are discussed. We also discuss the existence of the optimal design when an AR(1) working correlation matrix is assumed. Our proposed method can be extended to scenarios in which the true and working correlation matrices differ.

10.
This paper considers the impact of bias in the estimation of the association parameters for longitudinal binary responses when there are drop-outs. A number of different estimating equation approaches are considered for the case where drop-out cannot be assumed to be a completely random process. In particular, standard generalized estimating equations (GEE), GEE based on conditional residuals, GEE based on multivariate normal estimating equations for the covariance matrix, and second-order estimating equations (GEE2) are examined. These different GEE estimators are compared in terms of finite sample and asymptotic bias under a variety of drop-out processes. Finally, the relationship between bias in the estimation of the association parameters and bias in the estimation of the mean parameters is explored.

11.
Yi GY  He W 《Biometrics》2009,65(2):618-625
Summary. Recently, median regression models have received increasing attention. When continuous responses follow a distribution that is quite different from a normal distribution, usual mean regression models may fail to produce efficient estimators whereas median regression models may perform satisfactorily. In this article, we discuss using median regression models to deal with longitudinal data with dropouts. Weighted estimating equations are proposed to estimate the median regression parameters for incomplete longitudinal data, where the weights are determined by modeling the dropout process. Consistency and the asymptotic distribution of the resultant estimators are established. The proposed method is used to analyze a longitudinal data set arising from a controlled trial of HIV disease (Volberding et al., 1990, The New England Journal of Medicine 322, 941–949). Simulation studies are conducted to assess the performance of the proposed method under various situations. An extension to estimation of the association parameters is outlined.

12.
Marginal models for longitudinal continuous proportional data   Total citations: 5, self-citations: 0, citations by others: 5
Song PX  Tan M 《Biometrics》2000,56(2):496-502
Summary. Continuous proportional data arise when the response of interest is a percentage between zero and one, e.g., the percentage of decrease in renal function at different follow-up times from the baseline. In this paper, we propose methods to directly model the marginal means of the longitudinal proportional responses using the simplex distribution of Barndorff-Nielsen and Jørgensen, which takes into account the fact that such responses are percentages restricted between zero and one and may also have large dispersion. Parameters in such a marginal model are estimated using an extended version of the generalized estimating equations where the score vector is a nonlinear function of the observed response. The method is illustrated with an ophthalmology study on the use of intraocular gas in retinal repair surgeries.

13.
Within behavioural research, non-normally distributed data with a complicated structure are common. For instance, data can represent repeated observations of quantities on the same individual. The regression analysis of such data is complicated both by the interdependency of the observations (response variables) and by their non-normal distribution. Over the last decade, such data have been more and more frequently analysed using generalized mixed-effect models. Some researchers invoke the heavy machinery of mixed-effect modelling to obtain the desired population-level (marginal) inference, which can be achieved with simpler tools, namely marginal models. This paper highlights marginal modelling (using generalized estimating equations [GEE]) as an alternative method. In various situations, GEE can be based on fewer assumptions and directly generate estimates (population-level parameters) which are of immediate interest to the behavioural researcher (such as population means). Using four examples from behavioural research, we demonstrate the use, advantages, and limits of the GEE approach as implemented within the functions of the 'geepack' package in R.

14.
Generalized estimating equation (GEE) is widely adopted for regression modeling for longitudinal data, taking account of potential correlations within the same subjects. Although the standard GEE assumes common regression coefficients among all the subjects, such an assumption may not be realistic when there is potential heterogeneity in regression coefficients among subjects. In this paper, we develop a flexible and interpretable approach, called grouped GEE analysis, to modeling longitudinal data while allowing heterogeneity in regression coefficients. The proposed method assumes that the subjects are divided into a finite number of groups and subjects within the same group share the same regression coefficient. We provide a simple algorithm for grouping subjects and estimating the regression coefficients simultaneously, and show the asymptotic properties of the proposed estimator. The number of groups can be determined by the cross validation with averaging method. We demonstrate the proposed method through simulation studies and an application to a real data set.

15.
Cook RJ  Zeng L  Yi GY 《Biometrics》2004,60(3):820-828
In recent years there has been considerable research devoted to the development of methods for the analysis of incomplete data in longitudinal studies. Despite these advances, the methods used in practice have changed relatively little, particularly in the reporting of pharmaceutical trials. In this setting, perhaps the most widely adopted strategy for dealing with incomplete longitudinal data is imputation by the "last observation carried forward" (LOCF) approach, in which values for missing responses are imputed using observations from the most recently completed assessment. We examine the asymptotic and empirical bias, the empirical type I error rate, and the empirical coverage probability associated with estimators and tests of treatment effect based on the LOCF imputation strategy. We consider a setting involving longitudinal binary data with longitudinal analyses based on generalized estimating equations, and an analysis based simply on the response at the end of the scheduled follow-up. We find that for both of these approaches, imputation by LOCF can lead to substantial biases in estimators of treatment effects, the type I error rates of associated tests can be greatly inflated, and the coverage probability can be far from the nominal level. Alternative analyses based on all available data lead to estimators with comparatively small bias, and inverse probability weighted analyses yield consistent estimators subject to correct specification of the missing data process. We illustrate the differences between various methods of dealing with drop-outs using data from a study of smoking behavior.
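The LOCF rule described above is mechanically simple, which may help explain its popularity despite the biases the abstract reports. A minimal Python sketch follows; the function name `locf` and the use of `None` to mark a missed assessment are assumptions made for illustration, not anything specified by the authors.

```python
def locf(series):
    """Fill missing entries (None) with the most recent observed value.

    `series` is one subject's responses in visit order. Entries before
    the first observed value stay None, because nothing can be carried
    forward yet.
    """
    filled = []
    last_observed = None
    for value in series:
        if value is not None:
            last_observed = value  # remember the latest completed assessment
        filled.append(last_observed)
    return filled


# Hypothetical binary responses for a subject who drops out after visit 2.
print(locf([1, 0, None, None]))
```

Because every value after drop-out is a copy of the last observed one, the imputed trajectory is flat by construction, which is one way to see how treatment-effect estimates can be distorted when the true response keeps changing.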

16.
Toledano AY  Gatsonis C 《Biometrics》1999,55(2):488-496
We propose methods for regression analysis of repeatedly measured ordinal categorical data when there is nonmonotone missingness in these responses and when a key covariate is missing depending on observables. The methods use ordinal regression models in conjunction with generalized estimating equations (GEEs). We extend the GEE methodology to accommodate arbitrary patterns of missingness in the responses when this missingness is independent of the unobserved responses. We further extend the methodology to provide correction for possible bias when missingness in knowledge of a key covariate may depend on observables. The approach is illustrated with the analysis of data from a study in diagnostic oncology in which multiple correlated receiver operating characteristic curves are estimated and corrected for possible verification bias when the true disease status is missing depending on observables.

17.
18.
This article applies a simple method for settings where one has clustered data, but statistical methods are only available for independent data. We assume the statistical method provides us with a normally distributed estimate, theta, and an estimate of its variance, sigma^2. We randomly select a data point from each cluster and apply our statistical method to this independent data. We repeat this multiple times, and use the average of the associated theta's as our estimate. An estimate of the variance is given by the average of the sigma^2's minus the sample variance of the theta's. We call this procedure multiple outputation, as all "excess" data within each cluster is thrown out multiple times. Hoffman, Sen, and Weinberg (2001, Biometrika 88, 1121-1134) introduced this approach for generalized linear models when the cluster size is related to outcome. In this article, we demonstrate the broad applicability of the approach. Applications to angular data, p-values, vector parameters, Bayesian inference, genetics data, and random cluster sizes are discussed. In addition, asymptotic normality of estimates based on all possible outputations, as well as a finite number of outputations, is proven given weak conditions. Multiple outputation provides a simple and broadly applicable method for analyzing clustered data. It is especially suited to settings where methods for clustered data are impractical, but can also be applied generally as a quick and simple tool.
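The procedure described in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the names `multiple_outputation` and `mean_estimator`, the default of 200 outputations, and the fixed seed are all hypothetical choices, and the "statistical method for independent data" is stood in for by a plain sample mean with its usual variance estimate.

```python
import random
import statistics


def multiple_outputation(clusters, estimator, n_outputations=200, seed=0):
    """Combine estimates from repeated one-observation-per-cluster samples.

    clusters  : list of non-empty lists, one list of observations per cluster
    estimator : function mapping a list of independent observations to a
                pair (theta, variance_estimate)
    """
    rng = random.Random(seed)
    thetas, variances = [], []
    for _ in range(n_outputations):
        # Keep one randomly chosen observation per cluster; the rest is "outputated".
        sample = [rng.choice(cluster) for cluster in clusters]
        theta, variance = estimator(sample)
        thetas.append(theta)
        variances.append(variance)
    theta_mo = statistics.fmean(thetas)
    # Average of the variance estimates minus the sample variance of the thetas.
    variance_mo = statistics.fmean(variances) - statistics.variance(thetas)
    return theta_mo, variance_mo


def mean_estimator(sample):
    """Stand-in method for independent data: sample mean and its variance."""
    return statistics.fmean(sample), statistics.variance(sample) / len(sample)


# Hypothetical clustered data with unequal cluster sizes.
clusters = [[1.0, 1.2, 0.9], [2.1, 1.9], [3.0], [2.4, 2.6, 2.5, 2.7]]
print(multiple_outputation(clusters, mean_estimator))
```

Nothing in this sketch guards against the variance estimate coming out negative, which can happen when only a few outputations are used; a practical version would need to handle that case.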

19.
The differential reinforcement of low-rate 72 seconds schedule (DRL-72) is a standard behavioral test procedure for screening potential antidepressant compounds. The protocol for the DRL-72 experiment, proposed by Evenden et al. (1993), consists of using a crossover design for the experiment and one-way ANOVA for the statistical analysis. In this paper we discuss the choice of several crossover designs for the DRL-72 experiment and propose to estimate the treatment effects using either generalized linear mixed models (GLMM) or generalized estimating equation (GEE) models for clustered binary data.

20.
In this paper, we develop a Gaussian estimation (GE) procedure to estimate the parameters of a regression model for correlated (longitudinal) binary response data using a working correlation matrix. A two-step iterative procedure is proposed for estimating the regression parameters by the GE method and the correlation parameters by the method of moments. Consistency properties of the estimators are discussed. A simulation study was conducted to compare 11 estimators of the regression parameters, namely, four versions of the GE, five versions of the generalized estimating equations (GEEs), and two versions of the weighted GEE. Simulations show that (i) the Gaussian estimates have the smallest mean square error and best coverage probability if the working correlation structure is correctly specified and (ii) when the working correlation structure is correctly specified, the GE and the GEE with exchangeable correlation structure perform best as opposed to when the correlation structure is misspecified.
