Similar literature
20 similar articles found
1.
Hairu Wang, Zhiping Lu, Yukun Liu. Biometrics 2023, 79(2): 1268-1279
Missing data are frequently encountered in various disciplines and can be divided into three categories: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Valid statistical approaches to missing data depend crucially on correct identification of the underlying missingness mechanism. Although the problem of testing whether this mechanism is MCAR or MAR has been extensively studied, there has been very little research on testing MAR versus MNAR. A critical challenge that is faced when dealing with this problem is the issue of model identification under MNAR. In this paper, under a logistic model for the missing probability, we develop two score tests for the problem of whether the missingness mechanism is MAR or MNAR under a parametric model and a semiparametric location model on the regression function. The implementation of the score tests circumvents the identification issue as it requires only parameter estimation under the null MAR assumption. Our simulations and analysis of human immunodeficiency virus data show that the score tests have well-controlled type I errors and desirable powers.
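To make the three mechanisms concrete, the following is a minimal simulation sketch, not the authors' score tests. It assumes a toy model invented here for illustration (covariate `x`, outcome `y = x + noise`, and a logistic probability of being observed); the function names `expit` and `simulate` are likewise hypothetical. It compares the full-data mean of `y` with the complete-case mean under each mechanism.

```python
import math
import random


def expit(t):
    """Logistic function, used here as the probability of being observed."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate(mechanism, n=20000, seed=0):
    """Return (full_mean, complete_case_mean) of y under one mechanism.

    Toy model (an assumption for illustration): x ~ N(0, 1) is always
    observed and y = x + N(0, 1) may be missing.
    """
    rng = random.Random(seed)
    ys, observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_obs = 0.5            # depends on nothing
        elif mechanism == "MAR":
            p_obs = expit(-x)      # depends only on the observed x
        elif mechanism == "MNAR":
            p_obs = expit(-y)      # depends on the possibly missing y
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        ys.append(y)
        observed.append(rng.random() < p_obs)
    full_mean = sum(ys) / n
    n_obs = sum(observed)
    cc_mean = sum(y for y, r in zip(ys, observed) if r) / n_obs
    return full_mean, cc_mean


if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        full_mean, cc_mean = simulate(mech)
        print(mech, round(full_mean, 3), round(cc_mean, 3))
```

In this toy setup the complete-case mean tracks the full-data mean only under MCAR. Under MAR the bias could be removed by adjusting for `x`, whereas under MNAR it cannot be removed without assumptions about the unobserved values, which is why distinguishing MAR from MNAR matters.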

2.
A latent-class mixture model for incomplete longitudinal Gaussian data   (Cited by: 2 total, 1 self-citation, 1 by others)
Summary. In the analyses of incomplete longitudinal clinical trial data, there has been a shift away from simple methods that are valid only if the data are missing completely at random, to more principled ignorable analyses, which are valid under the less restrictive missing at random assumption. The availability of the necessary standard statistical software nowadays allows for such analyses in practice. While the possibility of data missing not at random (MNAR) cannot be ruled out, it is argued that analyses valid under MNAR are not well suited for the primary analysis in clinical trials. Rather than either forgetting about or blindly shifting to an MNAR framework, the optimal place for MNAR analyses is within a sensitivity-analysis context. One such route for sensitivity analysis is to consider, next to selection models, pattern-mixture models or shared-parameter models. The latter can also be extended to a latent-class mixture model, the approach taken in this article. The performance of the so-obtained flexible model is assessed through simulations and the model is applied to data from a depression trial.

3.
Summary. In medical research, the receiver operating characteristic (ROC) curves can be used to evaluate the performance of biomarkers for diagnosing diseases or predicting the risk of developing a disease in the future. The area under the ROC curve (ROC AUC), as a summary measure of ROC curves, is widely utilized, especially when comparing multiple ROC curves. In observational studies, the estimation of the AUC is often complicated by the presence of missing biomarker values, which means that the existing estimators of the AUC are potentially biased. In this article, we develop robust statistical methods for estimating the ROC AUC and the proposed methods use information from auxiliary variables that are potentially predictive of the missingness of the biomarkers or the missing biomarker values. We are particularly interested in auxiliary variables that are predictive of the missing biomarker values. In the case of missing at random (MAR), that is, missingness of biomarker values only depends on the observed data, our estimators have the attractive feature of being consistent if one correctly specifies, conditional on auxiliary variables and disease status, either the model for the probabilities of being missing or the model for the biomarker values. In the case of missing not at random (MNAR), that is, missingness may depend on the unobserved biomarker values, we propose a sensitivity analysis to assess the impact of MNAR on the estimation of the ROC AUC. The asymptotic properties of the proposed estimators are studied and their finite‐sample behaviors are evaluated in simulation studies. The methods are further illustrated using data from a study of maternal depression during pregnancy.

4.
Chen B, Zhou XH. Biometrics 2011, 67(3): 830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.
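The idea of double robustness can be illustrated with a much simpler problem than the longitudinal setting of this article: estimating the mean of an outcome that is MAR given one binary covariate. The sketch below is a generic augmented inverse-probability-weighted (AIPW) estimator, not the authors' method. The data-generating model and all function names (`simulate`, `stratum_means`, `stratum_rates`, `aipw_mean`, `demo`) are assumptions made for illustration.

```python
import random


def simulate(n=20000, seed=1):
    """Toy MAR data (an assumption): binary x, y = 1 + 2x + noise, E[y] = 2."""
    rng = random.Random(seed)
    x, y, r = [], [], []
    for _ in range(n):
        xi = 1 if rng.random() < 0.5 else 0
        yi = 1.0 + 2.0 * xi + rng.gauss(0.0, 1.0)
        p_obs = 0.4 if xi else 0.8      # missingness depends on x only
        x.append(xi)
        y.append(yi)
        r.append(rng.random() < p_obs)  # True when y is observed
    return x, y, r


def stratum_means(x, y, r):
    """Outcome model: mean of the observed y within each level of x."""
    means = {}
    for level in (0, 1):
        vals = [yi for xi, yi, ri in zip(x, y, r) if ri and xi == level]
        means[level] = sum(vals) / len(vals)
    return means


def stratum_rates(x, r):
    """Missingness model: proportion observed within each level of x."""
    rates = {}
    for level in (0, 1):
        flags = [ri for xi, ri in zip(x, r) if xi == level]
        rates[level] = sum(flags) / len(flags)
    return rates


def aipw_mean(x, y, r, pi_hat, m_hat):
    """AIPW estimate of E[y]; y[i] is used only when r[i] is True."""
    total = 0.0
    for xi, yi, ri in zip(x, y, r):
        m = m_hat(xi)
        total += m + ((yi - m) / pi_hat(xi) if ri else 0.0)
    return total / len(x)


def demo():
    x, y, r = simulate()
    cc = sum(yi for yi, ri in zip(y, r) if ri) / sum(r)
    means, rates = stratum_means(x, y, r), stratum_rates(x, r)
    # Correct missingness model, deliberately wrong outcome model.
    dr_a = aipw_mean(x, y, r, lambda v: rates[v], lambda v: 0.0)
    # Deliberately wrong missingness model, correct outcome model.
    dr_b = aipw_mean(x, y, r, lambda v: 0.6, lambda v: means[v])
    return cc, dr_a, dr_b


if __name__ == "__main__":
    print(demo())
```

In this toy example the complete-case mean is biased toward the better-observed stratum, while both AIPW estimates stay close to the true mean of 2 even though one working model is misspecified in each case.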

5.
Wang C, Daniels MJ. Biometrics 2011, 67(3): 810-818
Summary. Pattern mixture modeling is a popular approach for handling incomplete longitudinal data. Such models are not identifiable by construction. Identifying restrictions is one approach to mixture model identification (Little, 1995, Journal of the American Statistical Association 90, 1112–1121; Little and Wang, 1996, Biometrics 52, 98–111; Thijs et al., 2002, Biostatistics 3, 245–265; Kenward, Molenberghs, and Thijs, 2003, Biometrika 90, 53–71; Daniels and Hogan, 2008, in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis) and is a natural starting point for missing not at random sensitivity analysis (Thijs et al., 2002, Biostatistics 3, 245–265; Daniels and Hogan, 2008, in Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis). However, when the pattern specific models are multivariate normal, identifying restrictions corresponding to missing at random (MAR) may not exist. Furthermore, identification strategies can be problematic in models with covariates (e.g., baseline covariates with time‐invariant coefficients). In this article, we explore conditions necessary for identifying restrictions that result in MAR to exist under a multivariate normality assumption and strategies for identifying sensitivity parameters for sensitivity analysis or for a fully Bayesian analysis with informative priors. In addition, we propose alternative modeling and sensitivity analysis strategies under a less restrictive assumption for the distribution of the observed response data. We adopt the deviance information criterion for model comparison and perform a simulation study to evaluate the performances of the different modeling approaches. We also apply the methods to a longitudinal clinical trial. Problems caused by baseline covariates with time‐invariant coefficients are investigated and an alternative identifying restriction based on residuals is proposed as a solution.

6.
Generalized additive models (GAMs) have been widely used for flexible modeling of various types of outcomes. When the outcome in a GAM is subject to missingness, practical analyses often assume that the data are missing at random (MAR). This assumption can be suspect when the missingness is not by design. Evaluating the potential effects of alternative nonignorable missing data mechanisms on the MAR inference from a GAM can be important but is often challenging because of the complexity of alternative nonignorable models. We apply the index approach to local sensitivity (Troxel, Ma, and Heitjan, 2004, Statistica Sinica 14, 1221–1237) to evaluate the potential changes of the GAM estimates in the neighborhood of the MAR model. The approach avoids fitting any complicated nonignorable GAM. Only MAR estimates are required to calculate the resulting sensitivity index and adjust the GAM estimates to account for nonignorable missingness. Thus the proposed approach is considerably simpler to conduct than the alternative methods. The simulation study shows that the index provides valid assessment of the local sensitivity of the GAM estimates to nonignorable missingness. We then illustrate the method using a rheumatoid arthritis clinical trial data set.

7.
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

8.
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where “predictable” indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.

9.
Summary. The generalized estimating equation (GEE) has been a popular tool for marginal regression analysis with longitudinal data, and its extension, the weighted GEE approach, can further accommodate data that are missing at random (MAR). Model selection methodologies for GEE, however, have not been systematically developed to allow for missing data. We propose the missing longitudinal information criterion (MLIC) for selection of the mean model, and the MLIC for correlation (MLICC) for selection of the correlation structure in GEE when the outcome data are subject to dropout/monotone missingness and are MAR. Our simulation results reveal that the MLIC and MLICC are effective for variable selection in the mean model and selecting the correlation structure, respectively. We also demonstrate the remarkable drawbacks of naively treating incomplete data as if they were complete and applying the existing GEE model selection method. The utility of the proposed method is further illustrated by two real applications involving missing longitudinal outcome data.

10.
Recently, instrumental variables methods have been used to address non-compliance in randomized experiments. Complicating such analyses is often the presence of missing data. The standard model for missing data, missing at random (MAR), has some unattractive features in this context. In this paper we compare MAR-based estimates of the complier average causal effect (CACE) with an estimator based on an alternative, nonignorable model for the missing data process, developed by Frangakis and Rubin (1999, Biometrika, 86, 365-379). We also introduce a new missing data model that, like the Frangakis-Rubin model, is specially suited for models with instrumental variables, but makes different substantive assumptions. We analyze these issues in the context of a randomized trial of breast self-examination (BSE). In the study two methods of teaching BSE, consisting of either mailed information about BSE (the standard treatment) or the attendance of a course involving theoretical and practical sessions (the new treatment), were compared with the aim of assessing whether teaching programs could increase BSE practice and improve examination skills. The study was affected by the two sources of bias mentioned above: only 55% of women assigned to receive the new treatment complied with their assignment and 35% of the women did not respond to the post-test questionnaire. Comparing the causal estimand of the new treatment using the MAR, Frangakis-Rubin, and our new approach, the results suggest that for these data the MAR assumption appears least plausible, and that the new model appears most plausible among the three choices.

11.
12.
The problem of dropout is a common one in longitudinal studies. One usually assumes for the analysis that dropout is at random. There are some tests to investigate this assumption, but these tests depend on normally distributed data or lack power; cf. Listing and Schlittgen (1998). We here propose an overall test which combines several Wilcoxon rank sum tests. The alternative hypothesis states that there is a tendency for larger (smaller) values of the target variable the last time the probands show up. The test is also applicable when there are many ties. It performs well compared both with the test developed for normally distributed data and with the test for missing completely at random proposed by Little (1988). An application to real data is also given.

13.
Summary. Little and An (2004, Statistica Sinica 14, 949–968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente.

14.
In this work, we fit pattern-mixture models to data sets with responses that are potentially missing not at random (MNAR, Little and Rubin, 1987). In estimating the regression parameters that are identifiable, we use the pseudo maximum likelihood method based on exponential families. This procedure provides consistent estimators when the mean structure is correctly specified for each pattern, with further information on the variance structure giving an efficient estimator. The proposed method can be used to handle a variety of continuous and discrete outcomes. A test built on this approach is also developed for model simplification in order to improve efficiency. Simulations are carried out to compare the proposed estimation procedure with other methods. In combination with sensitivity analysis, our approach can be used to fit parsimonious semi-parametric pattern-mixture models to outcomes that are potentially MNAR. We apply the proposed method to an epidemiologic cohort study to examine cognitive decline among the elderly.

15.
Models for longitudinal data are employed in a wide range of behavioral, biomedical, psychosocial, and health‐care‐related research. One popular model for continuous response is the linear mixed‐effects model (LMM). Although simulations by recent studies show that LMM provides reliable estimates under departures from the normality assumption for complete data, the invariable occurrence of missing data in practical studies renders such robustness results less useful when applied to real study data. In this paper, we show by simulation studies that in the presence of missing data estimates of the fixed effect of LMM are biased under departures from normality. We discuss two robust alternatives, the weighted generalized estimating equations (WGEE) and the augmented WGEE (AWGEE), and compare their performances with LMM using real as well as simulated data. Our simulation results show that both WGEE and AWGEE provide valid inference for skewed non‐normal data when missing data follow the missing at random (MAR) mechanism, the most popular missing data mechanism for real study data.

16.
Satten GA, Carroll RJ. Biometrics 2000, 56(2): 384-388
We consider methods for analyzing categorical regression models when some covariates (Z) are completely observed but other covariates (X) are missing for some subjects. When data on X are missing at random (i.e., when the probability that X is observed does not depend on the value of X itself), we present a likelihood approach for the observed data that allows the same nuisance parameters to be eliminated in a conditional analysis as when data are complete. An example of a matched case-control study is used to demonstrate our approach.

17.
GEE with Gaussian estimation of the correlations when data are incomplete   (Cited by: 4 total, 0 self-citations, 4 by others)
This paper considers a modification of generalized estimating equations (GEE) for handling missing binary response data. The proposed method uses Gaussian estimation of the correlation parameters, i.e., the estimating function that yields an estimate of the correlation parameters is obtained from the multivariate normal likelihood. The proposed method yields consistent estimates of the regression parameters when data are missing completely at random (MCAR). However, when data are missing at random (MAR), consistency may not hold. In a simulation study with repeated binary outcomes that are missing at random, the magnitude of the potential bias that can arise is examined. The results of the simulation study indicate that, when the working correlation matrix is correctly specified, the bias is almost negligible for the modified GEE. In the simulation study, the proposed modification of GEE is also compared to the standard GEE, multiple imputation, and weighted estimating equations approaches. Finally, the proposed method is illustrated using data from a longitudinal clinical trial comparing two therapeutic treatments, zidovudine (AZT) and didanosine (ddI), in patients with HIV.

18.
We consider a conceptual correspondence between the missing data setting and joint modeling of longitudinal and time‐to‐event outcomes. Based on this, we formulate an extended shared random effects joint model. We then provide a characterization of missing at random, which is in line with that in the missing data setting. The ideas are illustrated using data from a study on liver cirrhosis, contrasting the new framework with conventional joint models.

19.
Imputation, weighting, direct likelihood, and direct Bayesian inference (Rubin, 1976) are important approaches for missing data regression. Many useful semiparametric estimators have been developed for regression analysis of data with missing covariates or outcomes. It has been established that some semiparametric estimators are asymptotically equivalent, but it has not been shown that many are numerically the same. We applied some existing methods to a bladder cancer case-control study and noted that they were the same numerically when the observed covariates and outcomes are categorical. To understand the analytical background of this finding, we further show that when observed covariates and outcomes are categorical, some estimators are not only asymptotically equivalent but also actually numerically identical. That is, although their estimating equations are different, they lead numerically to exactly the same root. This includes a simple weighted estimator, an augmented weighted estimator, and a mean-score estimator. The numerical equivalence may elucidate the relationship between imputing scores and weighted estimation procedures.

20.
Summary. Missing data, measurement error, and misclassification are three important problems in many research fields, such as epidemiological studies. It is well known that missing data and measurement error in covariates may lead to biased estimation. Misclassification may be considered as a special type of measurement error, for categorical data. Nevertheless, we treat misclassification as a different problem from measurement error because statistical models for them are different. Indeed, in the literature, methods for these three problems were generally proposed separately given that statistical modeling for them is very different. The problem is more challenging in a longitudinal study with nonignorable missing data. In this article, we consider estimation in generalized linear models under these three incomplete data models. We propose a general approach based on expected estimating equations (EEEs) to solve these three incomplete data problems in a unified fashion. This EEE approach can be easily implemented and its asymptotic covariance can be obtained by sandwich estimation. Intensive simulation studies are performed under various incomplete data settings. The proposed method is applied to a longitudinal study of oral bone density in relation to body bone density.
