首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where “predictable” indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.  相似文献   

2.
Summary In medical research, the receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC and the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite‐sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.  相似文献   

3.
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.  相似文献   

4.
Analyzing incomplete longitudinal clinical trial data   总被引:1,自引:0,他引:1  
Using standard missing data taxonomy, due to Rubin and co-workers, and simple algebraic derivations, it is argued that some simple but commonly used methods to handle incomplete longitudinal clinical trial data, such as complete case analyses and methods based on last observation carried forward, require restrictive assumptions and stand on a weaker theoretical foundation than likelihood-based methods developed under the missing at random (MAR) framework. Given the availability of flexible software for analyzing longitudinal sequences of unequal length, implementation of likelihood-based MAR analyses is not limited by computational considerations. While such analyses are valid under the comparatively weak assumption of MAR, the possibility of data missing not at random (MNAR) is difficult to rule out. It is argued, however, that MNAR analyses are, themselves, surrounded with problems and therefore, rather than ignoring MNAR analyses altogether or blindly shifting to them, their optimal place is within sensitivity analysis. The concepts developed here are illustrated using data from three clinical trials, where it is shown that the analysis method may have an impact on the conclusions of the study.  相似文献   

5.
Generalized additive models (GAMs) have been widely used for flexible modeling of various types of outcomes. When the outcome in a GAM is subject to missing, practical analyses often assume that missingness is missing at random (MAR). This assumption can be of suspicion when the missingness is not by design. Evaluating the potential effects of alternative nonignorable missing data mechanism on the MAR inference from a GAM can be important but often challenging due to the complicatedness of alternative nonignorable models. We apply the index approach to local sensitivity (Troxel, Ma, and Heitjan 2004 (2004). Statistica Sinica 14 , 1221–1237) to evaluate the potential changes of the GAM estimates in the neighborhood of the MAR model. The approach avoids fitting any complicated nonignorable GAM. Only MAR estimates are required to calculate the resulting sensitivity index and adjust the GAM estimates to account for nonignorable missingness. Thus the proposed approach is considerably simpler to conduct, as compared with the alternative methods. The simulation study shows that the index provides valid assessment of the local sensitivity of the GAM estimates to nonignorable missingness. We then illustrate the method using a rheumatoid arthritis clinical trial data set.  相似文献   

6.
Analysis with time-to-event data in clinical and epidemiological studies often encounters missing covariate values, and the missing at random assumption is commonly adopted, which assumes that missingness depends on the observed data, including the observed outcome which is the minimum of survival and censoring time. However, it is conceivable that in certain settings, missingness of covariate values is related to the survival time but not to the censoring time. This is especially so when covariate missingness is related to an unmeasured variable affected by the patient's illness and prognosis factors at baseline. If this is the case, then the covariate missingness is not at random as the survival time is censored, and it creates a challenge in data analysis. In this article, we propose an approach to deal with such survival-time-dependent covariate missingness based on the well known Cox proportional hazard model. Our method is based on inverse propensity weighting with the propensity estimated by nonparametric kernel regression. Our estimators are consistent and asymptotically normal, and their finite-sample performance is examined through simulation. An application to a real-data example is included for illustration.  相似文献   

7.
This work develops a joint model selection criterion for simultaneously selecting the marginal mean regression and the correlation/covariance structure in longitudinal data analysis where both the outcome and the covariate variables may be subject to general intermittent patterns of missingness under the missing at random mechanism. The new proposal, termed “joint longitudinal information criterion” (JLIC), is based on the expected quadratic error for assessing model adequacy, and the second‐order weighted generalized estimating equation (WGEE) estimation for mean and covariance models. Simulation results reveal that JLIC outperforms existing methods performing model selection for the mean regression and the correlation structure in a two stage and hence separate manner. We apply the proposal to a longitudinal study to identify factors associated with life satisfaction in the elderly of Taiwan.  相似文献   

8.
For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15 , 719–730) and Xie and Zhang (2017, Int J Biostat 13 , 1–20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.  相似文献   

9.
Chen B  Zhou XH 《Biometrics》2011,67(3):830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.  相似文献   

10.
Missing data is a common issue in research using observational studies to investigate the effect of treatments on health outcomes. When missingness occurs only in the covariates, a simple approach is to use missing indicators to handle the partially observed covariates. The missing indicator approach has been criticized for giving biased results in outcome regression. However, recent papers have suggested that the missing indicator approach can provide unbiased results in propensity score analysis under certain assumptions. We consider assumptions under which the missing indicator approach can provide valid inferences, namely, (1) no unmeasured confounding within missingness patterns; either (2a) covariate values of patients with missing data were conditionally independent of treatment or (2b) these values were conditionally independent of outcome; and (3) the outcome model is correctly specified: specifically, the true outcome model does not include interactions between missing indicators and fully observed covariates. We prove that, under the assumptions above, the missing indicator approach with outcome regression can provide unbiased estimates of the average treatment effect. We use a simulation study to investigate the extent of bias in estimates of the treatment effect when the assumptions are violated and we illustrate our findings using data from electronic health records. In conclusion, the missing indicator approach can provide valid inferences for outcome regression, but the plausibility of its assumptions must first be considered carefully.  相似文献   

11.
Recently, instrumental variables methods have been used to address non-compliance in randomized experiments. Complicating such analyses is often the presence of missing data. The standard model for missing data, missing at random (MAR), has some unattractive features in this context. In this paper we compare MAR-based estimates of the complier average causal effect (CACE) with an estimator based on an alternative, nonignorable model for the missing data process, developed by Frangakis and Rubin (1999, Biometrika, 86, 365-379). We also introduce a new missing data model that, like the Frangakis-Rubin model, is specially suited for models with instrumental variables, but makes different substantive assumptions. We analyze these issues in the context of a randomized trial of breast self-examination (BSE). In the study two methods of teaching BSE, consisting of either mailed information about BSE (the standard treatment) or the attendance of a course involving theoretical and practical sessions (the new treatment), were compared with the aim of assessing whether teaching programs could increase BSE practice and improve examination skills. The study was affected by the two sources of bias mentioned above: only 55% of women assigned to receive the new treatment complied with their assignment and 35% of the women did not respond to the post-test questionnaire. Comparing the causal estimand of the new treatment using the MAR, Frangakis-Rubin, and our new approach, the results suggest that for these data the MAR assumption appears least plausible, and that the new model appears most plausible among the three choices.  相似文献   

12.
Informative missingness of parental genotype data occurs when the genotype of a parent influences the probability of the parent's genotype data being observed. Informative missingness can occur in a number of plausible ways and can affect both the validity and power of procedures that assume the data are missing at random (MAR). We propose a bootstrap calibration of MAR procedures to account for informative missingness and apply our methodology to refine the approach implemented in the TRANSMIT program. We illustrate this approach by applying it to data on hypertensive probands and their parents who participated in the Framingham Heart Study.  相似文献   

13.
Outcome misclassification occurs frequently in binary-outcome studies and can result in biased estimation of quantities such as the incidence, prevalence, cause-specific hazards, cumulative incidence functions, and so forth. A number of remedies have been proposed to address the potential misclassification of the outcomes in such data. The majority of these remedies lie in the estimation of misclassification probabilities, which are in turn used to adjust analyses for outcome misclassification. A number of authors advocate using a gold-standard procedure on a sample internal to the study to learn about the extent of the misclassification. With this type of internal validation, the problem of quantifying the misclassification also becomes a missing data problem as, by design, the true outcomes are only ascertained on a subset of the entire study sample. Although, the process of estimating misclassification probabilities appears simple conceptually, the estimation methods proposed so far have several methodological and practical shortcomings. Most methods rely on missing outcome data to be missing completely at random (MCAR), a rather stringent assumption which is unlikely to hold in practice. Some of the existing methods also tend to be computationally-intensive. To address these issues, we propose a computationally-efficient, easy-to-implement, pseudo-likelihood estimator of the misclassification probabilities under a missing at random (MAR) assumption, in studies with an available internal-validation sample. We present the estimator through the lens of studies with competing-risks outcomes, though the estimator extends beyond this setting. We describe the consistency and asymptotic distributional properties of the resulting estimator, and derive a closed-form estimator of its variance. The finite-sample performance of this estimator is evaluated via simulations. Using data from a real-world study with competing-risks outcomes, we illustrate how the proposed method can be used to estimate misclassification probabilities. We also show how the estimated misclassification probabilities can be used in an external study to adjust for possible misclassification bias when modeling cumulative incidence functions.  相似文献   

14.
A latent-class mixture model for incomplete longitudinal Gaussian data   总被引:2,自引:1,他引:1  
Summary .   In the analyses of incomplete longitudinal clinical trial data, there has been a shift, away from simple methods that are valid only if the data are missing completely at random, to more principled ignorable analyses, which are valid under the less restrictive missing at random assumption. The availability of the necessary standard statistical software nowadays allows for such analyses in practice. While the possibility of data missing not at random (MNAR) cannot be ruled out, it is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials. Rather than either forgetting about or blindly shifting to an MNAR framework, the optimal place for MNAR analyses is within a sensitivity-analysis context. One such route for sensitivity analysis is to consider, next to selection models, pattern-mixture models or shared-parameter models. The latter can also be extended to a latent-class mixture model, the approach taken in this article. The performance of the so-obtained flexible model is assessed through simulations and the model is applied to data from a depression trial.  相似文献   

15.
Wang C  Daniels MJ 《Biometrics》2011,67(3):810-818
Summary Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions is one approach to mixture model identification ( Little, 1995 , Journal of the American Statistical Association 90 , 1112–1121; Little and Wang, 1996 , Biometrics 52 , 98–111; Thijs et al., 2002 , Biostatistics 3 , 245–265; Kenward, Molenberghs, and Thijs, 2003 , Biometrika 90 , 53–71; Daniels and Hogan, 2008 , in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis) and is a natural starting point for missing not at random sensitivity analysis ( Thijs et al., 2002 , Biostatistics 3 , 245–265; Daniels and Hogan, 2008 , in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis). However, when the pattern specific models are multivariate normal, identifying restrictions corresponding to missing at random (MAR) may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g., baseline covariates with time‐invariant coefficients). In this article, we explore conditions necessary for identifying restrictions that result in MAR to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. In addition, we propose alternative modeling and sensitivity analysis strategies under a less restrictive assumption for the distribution of the observed response data. We adopt the deviance information criterion for model comparison and perform a simulation study to evaluate the performances of the different modeling approaches. We also apply the methods to a longitudinal clinical trial. Problems caused by baseline covariates with time‐invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.  相似文献   

16.
17.
Albert PS  Follmann DA  Wang SA  Suh EB 《Biometrics》2002,58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data is subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.  相似文献   

18.
Multiple imputation has become a widely accepted technique to deal with the problem of incomplete data. Typically, imputation of missing values and the statistical analysis are performed separately. Therefore, the imputation model has to be consistent with the analysis model. If the data are analyzed with a mixture model, the parameter estimates are usually obtained iteratively. Thus, if the data are missing not at random, parameter estimation and treatment of missingness should be combined. We solve both problems by simultaneously imputing values using the data augmentation method and estimating parameters using the EM algorithm. This iterative procedure ensures that the missing values are properly imputed given the current parameter estimates. Properties of the parameter estimates were investigated in a simulation study. The results are illustrated using data from the National Health and Nutrition Examination Survey.  相似文献   

19.
Summary .  In this article, we study the estimation of mean response and regression coefficient in semiparametric regression problems when response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariate is correct. We conduct a number of simulations to compare the numerical performance between our estimators and other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.  相似文献   

20.
Summary The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of proposed method is further illustrated by two real applications involving missing longitudinal outcome data.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号