首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Chen B  Zhou XH 《Biometrics》2011,67(3):830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.  相似文献   

2.
Ibrahim JG  Chen MH  Lipsitz SR 《Biometrics》1999,55(2):591-596
We propose a method for estimating parameters for general parametric regression models with an arbitrary number of missing covariates. We allow any pattern of missing data and assume that the missing data mechanism is ignorable throughout. When the missing covariates are categorical, a useful technique for obtaining parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). We extend this method to continuous or mixed categorical and continuous covariates, and for arbitrary parametric regression models, by adapting a Monte Carlo version of the EM algorithm as discussed by Wei and Tanner (1990, Journal of the American Statistical Association 85, 699-704). In addition, we discuss the Gibbs sampler for sampling from the conditional distribution of the missing covariates given the observed data and show that the appropriate complete conditionals are log-concave. The log-concavity property of the conditional distributions will facilitate a straightforward implementation of the Gibbs sampler via the adaptive rejection algorithm of Gilks and Wild (1992, Applied Statistics 41, 337-348). We assume the model for the response given the covariates is an arbitrary parametric regression model, such as a generalized linear model, a parametric survival model, or a nonlinear model. We model the marginal distribution of the covariates as a product of one-dimensional conditional distributions. This allows us a great deal of flexibility in modeling the distribution of the covariates and reduces the number of nuisance parameters that are introduced in the E-step. We present examples involving both simulated and real data.  相似文献   

3.
Incomplete covariate data are a common occurrence in studies in which the outcome is survival time. Further, studies in the health sciences often give rise to correlated, possibly censored, survival data. With no missing covariate data, if the marginal distributions of the correlated survival times follow a given parametric model, then the estimates using the maximum likelihood estimating equations, naively treating the correlated survival times as independent, give consistent estimates of the relative risk parameters Lipsitz et al. 1994 50, 842-846. Now, suppose that some observations within a cluster have some missing covariates. We show in this paper that if one naively treats observations within a cluster as independent, that one can still use the maximum likelihood estimating equations to obtain consistent estimates of the relative risk parameters. This method requires the estimation of the parameters of the distribution of the covariates. We present results from a clinical trial Lipsitz and Ibrahim (1996b) 2, 5-14 with five covariates, four of which have some missing values. In the trial, the clusters are the hospitals in which the patients were treated.  相似文献   

4.
Stubbendick AL  Ibrahim JG 《Biometrics》2003,59(4):1140-1150
This article analyzes quality of life (QOL) data from an Eastern Cooperative Oncology Group (ECOG) melanoma trial that compared treatment with ganglioside vaccination to treatment with high-dose interferon. The analysis of this data set is challenging due to several difficulties, namely, nonignorable missing longitudinal responses and baseline covariates. Hence, we propose a selection model for estimating parameters in the normal random effects model with nonignorable missing responses and covariates. Parameters are estimated via maximum likelihood using the Gibbs sampler and a Monte Carlo expectation maximization (EM) algorithm. Standard errors are calculated using the bootstrap. The method allows for nonmonotone patterns of missing data in both the response variable and the covariates. We model the missing data mechanism and the missing covariate distribution via a sequence of one-dimensional conditional distributions, allowing the missing covariates to be either categorical or continuous, as well as time-varying. We apply the proposed approach to the ECOG quality-of-life data and conduct a small simulation study evaluating the performance of the maximum likelihood estimates. Our results indicate that a patient treated with the vaccine has a higher QOL score on average at a given time point than a patient treated with high-dose interferon.  相似文献   

5.
Horton NJ  Laird NM 《Biometrics》2001,57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.  相似文献   

6.
We propose an extension to the estimating equations in generalized linear models to estimate parameters in the link function and variance structure simultaneously with regression coefficients. Rather than focusing on the regression coefficients, the purpose of these models is inference about the mean of the outcome as a function of a set of covariates, and various functionals of the mean function used to measure the effects of the covariates. A commonly used functional in econometrics, referred to as the marginal effect, is the partial derivative of the mean function with respect to any covariate, averaged over the empirical distribution of covariates in the model. We define an analogous parameter for discrete covariates. The proposed estimation method not only helps to identify an appropriate link function and to suggest an underlying distribution for a specific application but also serves as a robust estimator when no specific distribution for the outcome measure can be identified. Using Monte Carlo simulations, we show that the resulting parameter estimators are consistent. The method is illustrated with an analysis of inpatient expenditure data from a study of hospitalists.  相似文献   

7.
Efficiency of regression estimates for clustered data   总被引:1,自引:0,他引:1  
Mancl LA  Leroux BG 《Biometrics》1996,52(2):500-511
Statistical methods for clustered data, such as generalized estimating equations (GEE) and generalized least squares (GLS), require selecting a correlation or convariance structure to specify the dependence between observations within a cluster. Valid regression estimates can be obtained that do not depend on correct specification of the true correlation, but inappropriate specifications can result in a loss of efficiency. We derive general expressions for the asymptotic relative efficiency of GEE and GLS estimators under nested correlation structures. Efficiency is shown to depend on the covariate distribution, the cluster sizes, the response variable correlation, and the regression parameters. The results demonstrate that efficiency is quite sensitive to the between- and within-cluster variation of the covariates, and provide useful characterizations of models for which upper and lower efficiency bounds are attained. Efficiency losses for simple working correlation matrices, such as independence, can be large even for small to moderate correlations and cluster sizes.  相似文献   

8.
GEE with Gaussian estimation of the correlations when data are incomplete   总被引:4,自引:0,他引:4  
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.  相似文献   

9.
In this work we present a new method for genetic analysis of twin data which is based on generalized estimating equations and allows for analysis of various response types (e.g., continuous, binary, counts) combined with estimation of residual correlations. The new approach allows for control of covariates of any kind (e.g., continuous, counts) by modeling the dependence of mean and variance on background variables. The proposed method was applied to identify the covariates that have a significant influence on elderly people's functional abilities, and find the estimates for the correlation coefficients of residuals for MZ and DZ twins in a sample of 2401 Danish twin 75 years of age or older. The bootstrap method was used to obtain standard errors for correlation coefficients. It was shown, that the chosen covariates have similar effects on MZ and DZ twins, and that the residual correlation in MZ twins is significantly higher than in DZ twins, which indicates that genetic factors play an etiological role in the determination of physical status of elderly people, controlled for 10 background variables.  相似文献   

10.
Huang Y  Dagne G 《Biometrics》2012,68(3):943-953
Summary It is a common practice to analyze complex longitudinal data using semiparametric nonlinear mixed-effects (SNLME) models with a normal distribution. Normality assumption of model errors may unrealistically obscure important features of subject variations. To partially explain between- and within-subject variations, covariates are usually introduced in such models, but some covariates may often be measured with substantial errors. Moreover, the responses may be missing and the missingness may be nonignorable. Inferential procedures can be complicated dramatically when data with skewness, missing values, and measurement error are observed. In the literature, there has been considerable interest in accommodating either skewness, incompleteness or covariate measurement error in such models, but there has been relatively little study concerning all three features simultaneously. In this article, our objective is to address the simultaneous impact of skewness, missingness, and covariate measurement error by jointly modeling the response and covariate processes based on a flexible Bayesian SNLME model. The method is illustrated using a real AIDS data set to compare potential models with various scenarios and different distribution specifications.  相似文献   

11.
K. R. Koots  J. P. Gibson 《Genetics》1996,143(3):1409-1416
A data set of 1572 heritability estimates and 1015 pairs of genetic and phenotypic correlation estimates, constructed from a survey of published beef cattle genetic parameter estimates, provided a rare opportunity to study realized sampling variances of genetic parameter estimates. The distribution of both heritability estimates and genetic correlation estimates, when plotted against estimated accuracy, was consistent with random error variance being some three times the sampling variance predicted from standard formulae. This result was consistent with the observation that the variance of estimates of heritabilities and genetic correlations between populations were about four times the predicted sampling variance, suggesting few real differences in genetic parameters between populations. Except where there was a strong biological or statistical expectation of a difference, there was little evidence for differences between genetic and phenotypic correlations for most trait combinations or for differences in genetic correlations between populations. These results suggest that, even for controlled populations, estimating genetic parameters specific to a given population is less useful than commonly believed. A serendipitous discovery was that, in the standard formula for theoretical standard error of a genetic correlation estimate, the heritabilities refer to the estimated values and not, as seems generally assumed, the true population values.  相似文献   

12.
Regression modeling of semicompeting risks data   总被引:1,自引:0,他引:1  
Peng L  Fine JP 《Biometrics》2007,63(1):96-108
Semicompeting risks data are often encountered in clinical trials with intermediate endpoints subject to dependent censoring from informative dropout. Unlike with competing risks data, dropout may not be dependently censored by the intermediate event. There has recently been increased attention to these data, in particular inferences about the marginal distribution of the intermediate event without covariates. In this article, we incorporate covariates and formulate their effects on the survival function of the intermediate event via a functional regression model. To accommodate informative censoring, a time-dependent copula model is proposed in the observable region of the data which is more flexible than standard parametric copula models for the dependence between the events. The model permits estimation of the marginal distribution under weaker assumptions than in previous work on competing risks data. New nonparametric estimators for the marginal and dependence models are derived from nonlinear estimating equations and are shown to be uniformly consistent and to converge weakly to Gaussian processes. Graphical model checking techniques are presented for the assumed models. Nonparametric tests are developed accordingly, as are inferences for parametric submodels for the time-varying covariate effects and copula parameters. A novel time-varying sensitivity analysis is developed using the estimation procedures. Simulations and an AIDS data analysis demonstrate the practical utility of the methodology.  相似文献   

13.
We propose a method to estimate the regression coefficients in a competing risks model where the cause-specific hazard for the cause of interest is related to covariates through a proportional hazards relationship and when cause of failure is missing for some individuals. We use multiple imputation procedures to impute missing cause of failure, where the probability that a missing cause is the cause of interest may depend on auxiliary covariates, and combine the maximum partial likelihood estimators computed from several imputed data sets into an estimator that is consistent and asymptotically normal. A consistent estimator for the asymptotic variance is also derived. Simulation results suggest the relevance of the theory in finite samples. Results are also illustrated with data from a breast cancer study.  相似文献   

14.
Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficient and variance component parameters when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and slopes with either covariates or both covariates and outcome contain missing information. In the current paper, we compared the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. Simulations were motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings showed that the relative performance of MI methods vary according to whether the incomplete covariate has fixed or random effects and whether there is missingnesss in the outcome variable. We showed that compatible imputation and analysis models resulted in consistent estimation of both regression parameters and variance components via simulation. We illustrate our findings with the analysis of LSAC data.  相似文献   

15.
Wang YG  Lin X 《Biometrics》2005,61(2):413-421
The approach of generalized estimating equations (GEE) is based on the framework of generalized linear models but allows for specification of a working matrix for modeling within-subject correlations. The variance is often assumed to be a known function of the mean. This article investigates the impacts of misspecifying the variance function on estimators of the mean parameters for quantitative responses. Our numerical studies indicate that (1) correct specification of the variance function can improve the estimation efficiency even if the correlation structure is misspecified; (2) misspecification of the variance function impacts much more on estimators for within-cluster covariates than for cluster-level covariates; and (3) if the variance function is misspecified, correct choice of the correlation structure may not necessarily improve estimation efficiency. We illustrate impacts of different variance functions using a real data set from cow growth.  相似文献   

16.
Summary .  In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariate is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.  相似文献   

17.
Wang CY  Huang WT 《Biometrics》2000,56(1):98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zeroes. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.  相似文献   

18.
Random regression models are widely used in the field of animal breeding for the genetic evaluation of daily milk yields from different test days. These models are capable of handling different environmental effects on the respective test day, and they describe the characteristics of the course of the lactation period by using suitable covariates with fixed and random regression coefficients. As the numerically expensive estimation of parameters is already part of advanced computer software, modifications of random regression models will considerably grow in importance for statistical evaluations of nutrition and behaviour experiments with animals. Random regression models belong to the large class of linear mixed models. Thus, when choosing a model, or more precisely, when selecting a suitable covariance structure of the random effects, the information criteria of Akaike and Schwarz can be used. In this study, the fitting of random regression models for a statistical analysis of a feeding experiment with dairy cows is illustrated under application of the program package SAS. For each of the feeding groups, lactation curves modelled by covariates with fixed regression coefficients are estimated simultaneously. With the help of the fixed regression coefficients, differences between the groups are estimated and then tested for significance. The covariance structure of the random and subject-specific effects and the serial correlation matrix are selected by using information criteria and by estimating correlations between repeated measurements. For the verification of the selected model and the alternative models, mean values and standard deviations estimated with ordinary least square residuals are used.  相似文献   

19.
Guolo A 《Biometrics》2008,64(4):1207-1214
SUMMARY: We investigate the use of prospective likelihood methods to analyze retrospective case-control data where some of the covariates are measured with error. We show that prospective methods can be applied and the case-control sampling scheme can be ignored if one adequately models the distribution of the error-prone covariates in the case-control sampling scheme. Indeed, subject to this, the prospective likelihood methods result in consistent estimates and information standard errors are asymptotically correct. However, the distribution of such covariates is not the same in the population and under case-control sampling, dictating the need to model the distribution flexibly. In this article, we illustrate the general principle by modeling the distribution of the continuous error-prone covariates using the skewnormal distribution. The performance of the method is evaluated through simulation studies, which show satisfactory results in terms of bias and coverage. Finally, the method is applied to the analysis of two data sets which refer, respectively, to a cholesterol study and a study on breast cancer.  相似文献   

20.
A simple method for the analysis of clustered binary data.   总被引:15,自引:0,他引:15  
J N Rao  A J Scott 《Biometrics》1992,48(2):577-585
A simple method for comparing independent groups of clustered binary data with group-specific covariates is proposed. It is based on the concepts of design effect and effective sample size widely used in sample surveys, and assumes no specific models for the intracluster correlations. It can be implemented using any standard computer program for the analysis of independent binary data after a small amount of preprocessing. The method is applied to a variety of problems involving clustered binary data: testing homogeneity of proportions, estimating dose-response models and testing for trend in proportions, and performing the Mantel-Haenszel chi-squared test for independence in a series of 2 x 2 tables and estimating the common odds ratio and its variance. Illustrative applications of the method are also presented.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号