Similar literature
 20 similar articles found
1.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.
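To make the "EM by the method of weights" idea concrete, here is a minimal sketch for a toy problem, not the longitudinal model of the abstract above. It assumes a single binary response y and a single binary covariate x that is sometimes missing, with a saturated model P(x = 1) = alpha and P(y = 1 | x) = p[x]. The function name `em_method_of_weights`, the count dictionaries, and the example numbers are all hypothetical. Each incomplete record is split across the possible values of x, weighted by the current estimate of P(x | y), and the M-step is then an ordinary complete-data fit on the weighted counts.

```python
def em_method_of_weights(complete, missing_x, n_iter=200):
    """Toy EM for a binary covariate x that is missing for some records.

    complete  : dict {(x, y): count} for fully observed records
    missing_x : dict {y: count} for records where only y was observed
    Returns (alpha, p) with alpha = P(x=1) and p[x] = P(y=1 | x).
    """
    alpha = 0.5
    p = {0: 0.5, 1: 0.5}
    for _ in range(n_iter):
        # E-step: start from the observed counts, then add each incomplete
        # record as two weighted pseudo-records, one per possible x.
        counts = {(x, y): float(complete.get((x, y), 0))
                  for x in (0, 1) for y in (0, 1)}
        for y, m in missing_x.items():
            joint = {}
            for x in (0, 1):
                px = alpha if x == 1 else 1.0 - alpha
                py = p[x] if y == 1 else 1.0 - p[x]
                joint[x] = px * py
            total = joint[0] + joint[1]
            for x in (0, 1):
                counts[(x, y)] += m * joint[x] / total  # weight = P(x | y)
        # M-step: complete-data maximum likelihood on the weighted counts.
        n = sum(counts.values())
        alpha = (counts[(1, 0)] + counts[(1, 1)]) / n
        p = {x: counts[(x, 1)] / (counts[(x, 0)] + counts[(x, 1)])
             for x in (0, 1)}
    return alpha, p


# Hypothetical data: 100 complete records, 50 records with x missing.
complete = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}
missing_x = {0: 10, 1: 40}
alpha_hat, p_hat = em_method_of_weights(complete, missing_x)
print(alpha_hat, p_hat)
```

In this toy setting alpha plays the role of the nuisance parameter of the covariate distribution. With many categorical covariates the number of such parameters grows quickly, which is the instability the abstract refers to.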

2.
Cho Paik M 《Biometrics》2004,60(2):306-314
Matched case-control data analysis is often challenged by a missing covariate problem, the mishandling of which could cause bias or inefficiency. Satten and Carroll (2000, Biometrics 56, 384-388) and other authors have proposed methods to handle missing covariates when the probability of missingness depends on the observed data, i.e., when data are missing at random. In this article, we propose a conditional likelihood method to handle the case when the probability of missingness depends on the unobserved covariate, i.e., when data are nonignorably missing. When the missing covariate is binary, the proposed method can be implemented using standard software. Using the Northern Manhattan Stroke Study data, we illustrate the method and discuss how sensitivity analysis can be conducted.

3.
Chen B  Zhou XH 《Biometrics》2011,67(3):830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.

4.
Dewanji A  Sengupta D 《Biometrics》2003,59(4):1063-1070
In competing risks data, missing failure types (causes) are a very common phenomenon. In this work, we consider a general missing pattern in which, if a failure type is not observed, one observes a set of possible types containing the true type, along with the failure time. We first consider maximum likelihood estimation under the missing-at-random assumption via the expectation maximization (EM) algorithm. We then propose a Nelson-Aalen type estimator for situations when certain information on the conditional probability of the true type given a set of possible failure types is available from the experimentalists. This is based on a least-squares type method using the relationships between hazards for different types and hazards for different combinations of missing types. We conduct a simulation study to investigate the performance of this method, which indicates that bias may be small, even for a high proportion of missing data, given a sufficiently large number of observations. The estimates are somewhat sensitive to misspecification of the conditional probabilities of the true types when the missing proportion is high. We also consider an example from an animal experiment to illustrate our methodology.

5.
Wang CY  Huang WT 《Biometrics》2000,56(1):98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zeroes. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.

6.
Summary. Longitudinal studies often generate incomplete response patterns according to a missing not at random mechanism. Shared parameter models provide an appealing framework for the joint modelling of the measurement and missingness processes, especially in the nonmonotone missingness case, and assume a set of random effects to induce the interdependence. Parametric assumptions are typically made for the random effects distribution, violation of which leads to model misspecification with a potential effect on the parameter estimates and standard errors. In this article we avoid any parametric assumption for the random effects distribution and leave it completely unspecified. The estimation of the model is then made using a semi-parametric maximum likelihood method. Our proposal is illustrated on a randomized longitudinal study on patients with rheumatoid arthritis exhibiting nonmonotone missingness.

7.
In large cohort studies, it is common that a subset of the regressors may be missing for some study subjects by design or happenstance. In this article, we apply the multiple data augmentation techniques to semiparametric models for epidemiologic data when a subset of the regressors are missing for some subjects, under the assumption that the data are missing at random in the sense of Rubin (2004) and that the missingness probabilities depend jointly on the observable subset of regressors, on a set of observable extraneous variables and on the outcome. Computational algorithms for the Poor Man's and the Asymptotic Normal data augmentations are investigated. Simulation studies show that the data augmentation approach generates satisfactory estimates and is computationally affordable. Under certain simulation scenarios, the proposed approach can achieve asymptotic efficiency similar to the maximum likelihood approach. We apply the proposed technique to the Multi-Ethnic Study of Atherosclerosis (MESA) data and the South Wales Nickel Worker Study data.

8.
Yan W  Hu Y  Geng Z 《Biometrics》2012,68(1):121-128
We discuss identifiability and estimation of causal effects of a treatment in subgroups defined by a covariate that is sometimes missing due to death, which is different from a problem with outcomes censored by death. Frangakis et al. (2007, Biometrics 63, 641-662) proposed an approach for estimating the causal effects under a strong monotonicity (SM) assumption. In this article, we focus on identifiability of the joint distribution of the covariate, treatment and potential outcomes, show sufficient conditions for identifiability, and relax the SM assumption to monotonicity (M) and no-interaction (NI) assumptions. We derive expectation-maximization algorithms for finding the maximum likelihood estimates of parameters of the joint distribution under different assumptions. Further we remove the M and NI assumptions, and prove that signs of the causal effects of a treatment in the subgroups are identifiable, which means that their bounds do not cover zero. We perform simulations and a sensitivity analysis to evaluate our approaches. Finally, we apply the approaches to the National Study on the Costs and Outcomes of Trauma Centers data, which are also analyzed by Frangakis et al. (2007) and Xie and Murphy (2007, Biometrics 63, 655-658).

9.
GEE with Gaussian estimation of the correlations when data are incomplete   Total citations: 4, self-citations: 0, citations by others: 4
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.

10.
11.
Incomplete covariate data are a common occurrence in studies in which the outcome is survival time. Further, studies in the health sciences often give rise to correlated, possibly censored, survival data. With no missing covariate data, if the marginal distributions of the correlated survival times follow a given parametric model, then the estimates using the maximum likelihood estimating equations, naively treating the correlated survival times as independent, give consistent estimates of the relative risk parameters (Lipsitz et al., 1994, 50, 842-846). Now, suppose that some observations within a cluster have some missing covariates. We show in this paper that if one naively treats observations within a cluster as independent, one can still use the maximum likelihood estimating equations to obtain consistent estimates of the relative risk parameters. This method requires the estimation of the parameters of the distribution of the covariates. We present results from a clinical trial (Lipsitz and Ibrahim, 1996b, 2, 5-14) with five covariates, four of which have some missing values. In the trial, the clusters are the hospitals in which the patients were treated.

12.
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

13.
We examine the hypothesis of an increase of humus disintegration in central European forests by modeling data from the case study “Zierenberg”, which was carried out in Germany. We analyse a spatio‐temporal regression model, which was constructed after an exploratory data analysis. The data are unbalanced repeated measurements collected at several sites and soil depths between 1989 and 1995 with a large number of missing observations. Spatial dependencies are considered by coupling linear models for the distinct depths using autoregressive terms and adding random effects. Under the assumption of normality, we provide formulae for maximum likelihood estimation as well as test statistics. As a result of the data analysis, we find that some chemical substances might be influential in the process of humus disintegration.

14.
Horton NJ  Laird NM 《Biometrics》2001,57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

15.
In this paper, a repeated measures model with intraclass correlation is considered when the observations are missing at random. An exact test for the equality of the mean components and simultaneous confidence intervals (Scheffé and Bonferroni inequality types) are given for linear contrasts of the mean components when the missing observations are of a monotone type. When the missing observations are not of the monotone type, the maximum likelihood estimates are obtained numerically by iterative methods given in Srivastava and Carter (1986). These estimators are then used to obtain asymptotic tests and confidence intervals for the equality of mean components and linear contrasts, respectively. An example is given to illustrate the method.

16.
Using validation sets for outcomes can greatly improve the estimation of vaccine efficacy (VE) in the field (Halloran and Longini, 2001; Halloran and others, 2003). Most statistical methods for using validation sets rely on the assumption that outcomes on those with no cultures are missing at random (MAR). However, often the validation sets will not be chosen at random. For example, confirmational cultures are often done on people with influenza-like illness as part of routine influenza surveillance. VE estimates based on such non-MAR validation sets could be biased. Here we propose frequentist and Bayesian approaches for estimating VE in the presence of validation bias. Our work builds on the ideas of Rotnitzky and others (1998, 2001), Scharfstein and others (1999, 2003), and Robins and others (2000). Our methods require expert opinion about the nature of the validation selection bias. In a re-analysis of an influenza vaccine study, we found, using the beliefs of a flu expert, that within any plausible range of selection bias the VE estimate based on the validation sets is much higher than the point estimate using just the non-specific case definition. Our approach is generally applicable to studies with missing binary outcomes with categorical covariates.

17.
Logically defined outcomes are commonly used in medical diagnoses and epidemiological research. When missing values in the original outcomes exist, the method of handling the missingness can have unintended consequences, even if the original outcomes are missing completely at random. In this note, we consider 2 binary original outcomes, which are missing completely at random. For estimating the prevalence of a logically defined "or" outcome, we discuss the properties of 4 estimators: the complete-case estimator, the available-case estimator, the maximum likelihood estimator (MLE), and a moment-based estimator. With the exception of the available-case estimator, all the estimators are consistent. The MLE exhibits superior performance and should be generally adopted.
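A small simulation can illustrate why an available-case estimator of an "or" outcome may be inconsistent even under MCAR. This is a sketch under assumed definitions, which may differ in detail from those in the paper. The complete-case estimator uses only subjects with both outcomes observed. The available-case estimator also counts a subject as positive whenever any observed outcome is 1, but can only count a subject as negative when both outcomes are observed and both are 0. The function name `simulate_or_prevalence` and all parameter values are hypothetical.

```python
import random


def simulate_or_prevalence(n=100_000, p1=0.3, p2=0.2, p_miss=0.3, seed=0):
    """Compare complete-case and available-case estimates of P(Y1 or Y2)."""
    rng = random.Random(seed)
    cc_pos = cc_n = 0  # complete-case numerator and denominator
    ac_pos = ac_n = 0  # available-case numerator and denominator
    for _ in range(n):
        y1 = rng.random() < p1            # first binary outcome
        y2 = rng.random() < p2            # second binary outcome, independent
        obs1 = rng.random() >= p_miss     # MCAR: missingness ignores all data
        obs2 = rng.random() >= p_miss
        if obs1 and obs2:
            cc_n += 1
            cc_pos += int(y1 or y2)
        if (obs1 and y1) or (obs2 and y2):
            # Any observed 1 settles the "or" outcome as positive.
            ac_n += 1
            ac_pos += 1
        elif obs1 and obs2:
            # A negative can only be confirmed when both are observed.
            ac_n += 1
    return {
        "truth": 1.0 - (1.0 - p1) * (1.0 - p2),
        "complete_case": cc_pos / cc_n,
        "available_case": ac_pos / ac_n,
    }


print(simulate_or_prevalence())
```

Under these assumptions the available-case estimate is pulled upward, because partially observed positives are kept while partially observed subjects whose one observed outcome is 0 are discarded.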

18.
Summary. In individually matched case–control studies, when some covariates are incomplete, an analysis based on the complete data may result in a large loss of information in both the missing and the completely observed variables. This usually results in bias and loss of efficiency. In this article, we propose a new method for handling the problem of missing covariate data based on a missing‐data‐induced intensity approach when the missingness mechanism does not depend on case–control status and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete‐case analysis (CCA) and some missing data methods that have been proposed previously. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case–control study of multiple myeloma risk and a polymorphism in the receptor Inter‐Leukin‐6 (IL‐6‐α) is used to illustrate our findings.

19.
Summary. Little and An (2004, Statistica Sinica 14, 949–968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente.

20.
Chen Q  Ibrahim JG 《Biometrics》2006,62(1):177-184
We consider a class of semiparametric models for the covariate distribution and missing data mechanism for missing covariate and/or response data for general classes of regression models including generalized linear models and generalized linear mixed models. Ignorable and nonignorable missing covariate and/or response data are considered. The proposed semiparametric model can be viewed as a sensitivity analysis for model misspecification of the missing covariate distribution and/or missing data mechanism. The semiparametric model consists of a generalized additive model (GAM) for the covariate distribution and/or missing data mechanism. Penalized regression splines are used to express the GAMs as a generalized linear mixed effects model, in which the variance of the corresponding random effects provides an intuitive index for choosing between the semiparametric and parametric model. Maximum likelihood estimates are then obtained via the EM algorithm. Simulations are given to demonstrate the methodology, and a real data set from a melanoma cancer clinical trial is analyzed using the proposed methods.
