共查询到20条相似文献,搜索用时 0 毫秒
1.
We consider methods for analyzing categorical regression models when some covariates (Z) are completely observed but other covariates (X) are missing for some subjects. When data on X are missing at random (i.e., when the probability that X is observed does not depend on the value of X itself), we present a likelihood approach for the observed data that allows the same nuisance parameters to be eliminated in a conditional analysis as when data are complete. An example of a matched case-control study is used to demonstrate our approach. 相似文献
2.
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations. 相似文献
3.
4.
5.
6.
Rathouz PJ 《Biostatistics (Oxford, England)》2007,8(2):345-356
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem. 相似文献
7.
The goal of this article is to construct doubly robust (DR) estimators in ignorable missing data and causal inference models. In a missing data model, an estimator is DR if it remains consistent when either (but not necessarily both) a model for the missingness mechanism or a model for the distribution of the complete data is correctly specified. Because with observational data one can never be sure that either a missingness model or a complete data model is correct, perhaps the best that can be hoped for is to find a DR estimator. DR estimators, in contrast to standard likelihood-based or (nonaugmented) inverse probability-weighted estimators, give the analyst two chances, instead of only one, to make a valid inference. In a causal inference model, an estimator is DR if it remains consistent when either a model for the treatment assignment mechanism or a model for the distribution of the counterfactual data is correctly specified. Because with observational data one can never be sure that a model for the treatment assignment mechanism or a model for the counterfactual data is correct, inference based on DR estimators should improve upon previous approaches. Indeed, we present the results of simulation studies which demonstrate that the finite sample performance of DR estimators is as impressive as theory would predict. The proposed method is applied to a cardiovascular clinical trial. 相似文献
8.
Summary . Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered as a special type of measurement error, for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately given that statistical modeling for them are very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density. 相似文献
9.
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that, when the validation is only on controls, a class of inverse selection probability weighted semiparametric estimators cannot be applied because selection probabilities on cases are zeroes. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis. 相似文献
10.
Analyzing incomplete longitudinal clinical trial data 总被引:1,自引:0,他引:1
Molenberghs G Thijs H Jansen I Beunckens C Kenward MG Mallinckrodt C Carroll RJ 《Biostatistics (Oxford, England)》2004,5(3):445-464
Using standard missing data taxonomy, due to Rubin and co-workers, and simple algebraic derivations, it is argued that some simple but commonly used methods to handle incomplete longitudinal clinical trial data, such as complete case analyses and methods based on last observation carried forward, require restrictive assumptions and stand on a weaker theoretical foundation than likelihood-based methods developed under the missing at random (MAR) framework. Given the availability of flexible software for analyzing longitudinal sequences of unequal length, implementation of likelihood-based MAR analyses is not limited by computational considerations. While such analyses are valid under the comparatively weak assumption of MAR, the possibility of data missing not at random (MNAR) is difficult to rule out. It is argued, however, that MNAR analyses are, themselves, surrounded with problems and therefore, rather than ignoring MNAR analyses altogether or blindly shifting to them, their optimal place is within sensitivity analysis. The concepts developed here are illustrated using data from three clinical trials, where it is shown that the analysis method may have an impact on the conclusions of the study. 相似文献
11.
12.
13.
Inference about means from incomplete multivariate data 总被引:1,自引:0,他引:1
14.
Summary We consider the estimation problem of a proportional odds model with missing covariates. Based on the validation and nonvalidation data sets, we propose a joint conditional method that is an extension of Wang et al. (2002, Statistica Sinica 12, 555–574). The proposed method is semiparametric since it requires neither an additional model for the missingness mechanism, nor the specification of the conditional distribution of missing covariates given observed variables. Under the assumption that the observed covariates and the surrogate variable are categorical, we derived the large sample property. The simulation studies show that in various situations, the joint conditional method is more efficient than the conditional estimation method and weighted method. We also use a real data set that came from a survey of cable TV satisfaction to illustrate the approaches. 相似文献
15.
Noncompliance is a common problem in experiments involving randomized assignment of treatments, and standard analyses based on intention-to-treat or treatment received have limitations. An attractive alternative is to estimate the Complier-Average Causal Effect (CACE), which is the average treatment effect for the subpopulation of subjects who would comply under either treatment (Angrist, Imbens, and Rubin, 1996, Journal of American Statistical Association 91, 444-472). We propose an extended general location model to estimate the CACE from data with noncompliance and missing data in the outcome and in baseline covariates. Models for both continuous and categorical outcomes and ignorable and latent ignorable (Frangakis and Rubin, 1999, Biometrika 86, 365-379) missing-data mechanisms are developed. Inferences for the models are based on the EM algorithm and Bayesian MCMC methods. We present results from simulations that investigate sensitivity to model assumptions and the influence of missing-data mechanism. We also apply the method to the data from a job search intervention for unemployed workers. 相似文献
16.
17.
Summary . In this article, we first study parameter identifiability in randomized clinical trials with noncompliance and missing outcomes. We show that under certain conditions the parameters of interest are identifiable even under different types of completely nonignorable missing data: that is, the missing mechanism depends on the outcome. We then derive their maximum likelihood and moment estimators and evaluate their finite-sample properties in simulation studies in terms of bias, efficiency, and robustness. Our sensitivity analysis shows that the assumed nonignorable missing-data model has an important impact on the estimated complier average causal effect (CACE) parameter. Our new method provides some new and useful alternative nonignorable missing-data models over the existing latent ignorable model, which guarantees parameter identifiability, for estimating the CACE in a randomized clinical trial with noncompliance and missing data. 相似文献
18.
Multiple imputation for multivariate data with missing and below-threshold measurements: time-series concentrations of pollutants in the Arctic 总被引:2,自引:0,他引:2
Many chemical and environmental data sets are complicated by the existence of fully missing values or censored values known to lie below detection thresholds. For example, week-long samples of airborne particulate matter were obtained at Alert, NWT, Canada, between 1980 and 1991, where some of the concentrations of 24 particulate constituents were coarsened in the sense of being either fully missing or below detection limits. To facilitate scientific analysis, it is appealing to create complete data by filling in missing values so that standard complete-data methods can be applied. We briefly review commonly used strategies for handling missing values and focus on the multiple-imputation approach, which generally leads to valid inferences when faced with missing data. Three statistical models are developed for multiply imputing the missing values of airborne particulate matter. We expect that these models are useful for creating multiple imputations in a variety of incomplete multivariate time series data sets. 相似文献
19.