Similar literature
1.
Summary In individually matched case–control studies, when some covariates are incomplete, an analysis based on the complete data may result in a large loss of information both in the missing and completely observed variables. This usually results in a bias and loss of efficiency. In this article, we propose a new method for handling the problem of missing covariate data based on a missing‐data‐induced intensity approach when the missingness mechanism does not depend on case–control status and show that this leads to a generalization of the missing indicator method. We derive the asymptotic properties of the estimates from the proposed method and, using an extensive simulation study, assess the finite sample performance in terms of bias, efficiency, and 95% confidence coverage under several missing data scenarios. We also make comparisons with complete‐case analysis (CCA) and some missing data methods that have been proposed previously. Our results indicate that, under the assumption of predictable missingness, the suggested method provides valid estimation of parameters, is more efficient than CCA, and is competitive with other, more complex methods of analysis. A case–control study of multiple myeloma risk and a polymorphism in the receptor Inter‐Leukin‐6 (IL‐6‐α) is used to illustrate our findings.

2.
Lee SM  Gee MJ  Hsieh SH 《Biometrics》2011,67(3):788-798
Summary We consider the estimation problem of a proportional odds model with missing covariates. Based on the validation and nonvalidation data sets, we propose a joint conditional method that is an extension of Wang et al. (2002, Statistica Sinica 12, 555–574). The proposed method is semiparametric since it requires neither an additional model for the missingness mechanism, nor the specification of the conditional distribution of missing covariates given observed variables. Under the assumption that the observed covariates and the surrogate variable are categorical, we derived the large sample property. The simulation studies show that in various situations, the joint conditional method is more efficient than the conditional estimation method and weighted method. We also use a real data set that came from a survey of cable TV satisfaction to illustrate the approaches.

3.
Summary. The present article deals with informative missing (IM) exposure data in matched case–control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing data mechanism is inevitable. Therefore, a full likelihood-based approach for handling IM data has been proposed by positing a model for selection probability, and a parametric model for the partially missing exposure variable among the control population along with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases are discussed in detail: (a) binary exposure variable, (b) normally distributed exposure variable, and (c) lognormally distributed exposure variable. The method is illustrated by analyzing a real matched case–control data set with a missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and the robustness of the proposed method to violation of different types of model assumptions has been considered.

4.
The present article deals with informative missing (IM) exposure data in matched case-control studies. When the missingness mechanism depends on the unobserved exposure values, modeling the missing data mechanism is inevitable. Therefore, a full likelihood-based approach for handling IM data has been proposed by positing a model for selection probability, and a parametric model for the partially missing exposure variable among the control population along with a disease risk model. We develop an EM algorithm to estimate the model parameters. Three special cases are discussed in detail: (a) binary exposure variable, (b) normally distributed exposure variable, and (c) lognormally distributed exposure variable. The method is illustrated by analyzing a real matched case-control data set with a missing exposure variable. The performance of the proposed method is evaluated through simulation studies, and the robustness of the proposed method to violation of different types of model assumptions has been considered.

5.
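A Bayesian method is presented for nonlinear reproductive dispersion models whose covariates are subject to nonignorable missing data, with the missing-data mechanism specified by a logistic regression model. Gibbs sampling and the Metropolis-Hastings (MH) algorithm are used to obtain joint Bayesian estimates of the model parameters and of the regression coefficients in the missing-data mechanism, and the method is illustrated with a real example.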

6.
Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time, and whether or not that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

7.
In longitudinal studies investigators frequently have to assess and address potential biases introduced by missing data. New methods are proposed for modeling longitudinal categorical data with nonignorable dropout using marginalized transition models and shared random effects models. Random effects are introduced for both serial dependence of outcomes and nonignorable missingness. Fisher‐scoring and Quasi–Newton algorithms are developed for parameter estimation. Methods are illustrated with a real dataset.

8.
Toledano AY  Gatsonis C 《Biometrics》1999,55(2):488-496
We propose methods for regression analysis of repeatedly measured ordinal categorical data when there is nonmonotone missingness in these responses and when a key covariate is missing depending on observables. The methods use ordinal regression models in conjunction with generalized estimating equations (GEEs). We extend the GEE methodology to accommodate arbitrary patterns of missingness in the responses when this missingness is independent of the unobserved responses. We further extend the methodology to provide correction for possible bias when missingness in knowledge of a key covariate may depend on observables. The approach is illustrated with the analysis of data from a study in diagnostic oncology in which multiple correlated receiver operating characteristic curves are estimated and corrected for possible verification bias when the true disease status is missing depending on observables.

9.
We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

10.
Generalized additive models (GAMs) have been widely used for flexible modeling of various types of outcomes. When the outcome in a GAM is subject to missingness, practical analyses often assume that the data are missing at random (MAR). This assumption can be suspect when the missingness is not by design. Evaluating the potential effects of alternative nonignorable missing data mechanisms on the MAR inference from a GAM can be important but is often challenging because of the complexity of alternative nonignorable models. We apply the index approach to local sensitivity (Troxel, Ma, and Heitjan, 2004, Statistica Sinica 14, 1221–1237) to evaluate the potential changes of the GAM estimates in the neighborhood of the MAR model. The approach avoids fitting any complicated nonignorable GAM. Only MAR estimates are required to calculate the resulting sensitivity index and adjust the GAM estimates to account for nonignorable missingness. Thus the proposed approach is considerably simpler to conduct than the alternative methods. The simulation study shows that the index provides valid assessment of the local sensitivity of the GAM estimates to nonignorable missingness. We then illustrate the method using a rheumatoid arthritis clinical trial data set.

11.
Cho Paik M 《Biometrics》2004,60(2):306-314
Matched case-control data analysis is often challenged by a missing covariate problem, the mishandling of which could cause bias or inefficiency. Satten and Carroll (2000, Biometrics 56, 384-388) and other authors have proposed methods to handle missing covariates when the probability of missingness depends on the observed data, i.e., when data are missing at random. In this article, we propose a conditional likelihood method to handle the case when the probability of missingness depends on the unobserved covariate, i.e., when data are nonignorably missing. When the missing covariate is binary, the proposed method can be implemented using standard software. Using the Northern Manhattan Stroke Study data, we illustrate the method and discuss how sensitivity analysis can be conducted.

12.
We present a Bayesian approach to analyze matched "case-control" data with multiple disease states. The probability of disease development is described by a multinomial logistic regression model. The exposure distribution depends on the disease state and could vary across strata. In such a model, the number of stratum effect parameters grows in direct proportion to the sample size leading to inconsistent MLEs for the parameters of interest even when one uses a retrospective conditional likelihood. We adopt a semiparametric Bayesian framework instead, assuming a Dirichlet process prior with a mixing normal distribution on the distribution of the stratum effects. We also account for possible missingness in the exposure variable in our model. The actual estimation is carried out through a Markov chain Monte Carlo numerical integration scheme. The proposed methodology is illustrated through simulation and an example of a matched study on low birth weight of newborns (Hosmer, D. A. and Lemeshow, S., 2000, Applied Logistic Regression) with two possible disease groups matched with a control group.

13.
Data with missing covariate values but fully observed binary outcomes are an important subset of the missing data challenge. Common approaches are complete case analysis (CCA) and multiple imputation (MI). While CCA relies on missing completely at random (MCAR), MI usually relies on a missing at random (MAR) assumption to produce unbiased results. For MI involving logistic regression models, it is also important to consider several missing not at random (MNAR) conditions under which CCA is asymptotically unbiased and, as we show, MI is also valid in some cases. We use a data application and simulation study to compare the performance of several machine learning and parametric MI methods under a fully conditional specification framework (MI-FCS). Our simulation includes five scenarios involving MCAR, MAR, and MNAR under predictable and nonpredictable conditions, where “predictable” indicates missingness is not associated with the outcome. We build on previous results in the literature to show MI and CCA can both produce unbiased results under more conditions than some analysts may realize. When both approaches were valid, we found that MI-FCS was at least as good as CCA in terms of estimated bias and coverage, and was superior when missingness involved a categorical covariate. We also demonstrate how MNAR sensitivity analysis can build confidence that unbiased results were obtained, including under MNAR-predictable, when CCA and MI are both valid. Since the missingness mechanism cannot be identified from observed data, investigators should compare results from MI and CCA when both are plausibly valid, followed by MNAR sensitivity analysis.
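The claim that CCA can remain asymptotically unbiased under "MNAR-predictable" missingness can be checked with a small simulation. The sketch below is not taken from the article: the data-generating model, the coefficient values, and the function names `fit_logistic` and `simulate_cca` are all assumptions chosen for illustration. A covariate is made missing with a probability that depends on its own (unobserved) value but not on the outcome, and the complete-case logistic regression is compared with the full-data fit.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (X includes the intercept column)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def simulate_cca(n=50_000, beta0=-0.5, beta1=1.0, seed=0):
    """Hypothetical setup: missingness of x depends on x itself, not on y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    p_y = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))
    y = rng.binomial(1, p_y).astype(float)

    # MNAR but "predictable": the missingness probability is a function of x only.
    p_miss = 1.0 / (1.0 + np.exp(-(-0.5 + 1.0 * x)))
    observed = rng.random(n) >= p_miss

    X = np.column_stack([np.ones(n), x])
    full = fit_logistic(X, y)                       # infeasible benchmark
    cca = fit_logistic(X[observed], y[observed])    # complete cases only
    return full, cca, observed.mean()


if __name__ == "__main__":
    full, cca, frac = simulate_cca()
    print("fraction observed:", round(float(frac), 3))
    print("full-data estimates:", full)
    print("complete-case estimates:", cca)
```

With these assumed settings, the complete-case estimates should land close to the true values (-0.5, 1.0) even though a large share of records is dropped; making `p_miss` depend on `y` as well is a simple way to see the bias appear.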

14.
Habitats in the Wadden Sea, a world heritage area, are affected by land subsidence resulting from natural gas extraction and by sea level rise. Here we describe a method to monitor changes in habitat types by producing sequential maps based on point information, followed by mapping using a multinomial logit regression model with abiotic variables, of which maps are available, as predictors. In a 70 ha study area a total of 904 vegetation samples were collected in seven sampling rounds with an interval of 2–3 years. Half of the vegetation plots were permanent, violating the assumption of independent data in multinomial logistic regression. This paper shows how this dependency can be accounted for by adding a random effect to the multinomial logit (MNL) model, which thus becomes a mixed multinomial logit (MMNL) model. In principle all regression coefficients can be taken as random, but in this study only the intercepts are treated as location-specific random variables (random intercepts model). With six habitat types we have five intercepts, so that the number of extra model parameters becomes 15: 5 variances and 10 covariances. The likelihood ratio test showed that the MMNL model fitted significantly better than the MNL model with the same fixed effects. McFadden's R² for the MMNL model was 0.467, versus 0.395 for the MNL model. The estimated coefficients of the MMNL and MNL models were comparable; those of altitude, the most important predictor, differed most. The MMNL model accounts for pseudo-replication at the permanent plots, which explains the larger standard errors of the MMNL coefficients. The habitat type at a given location-year combination was predicted by the habitat type with the largest predicted probability. The series of maps shows local trends in habitat types most likely driven by sea-level rise, soil subsidence, and a restoration project. We conclude that in environmental modeling of categorical variables using panel data, dependency of repeated observations at permanent plots should be accounted for. This will affect the estimated probabilities of the categories, and even more strongly the standard errors of the regression coefficients.
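The quantities quoted in this abstract follow standard definitions, written out here as a reading aid. The notation ($q$, $\ell$, $\Sigma$) is assumed for illustration and is not taken from the article.

```latex
% Random-intercepts covariance: with K = 6 habitat types there are q = K - 1 = 5
% intercepts, and a symmetric q x q covariance matrix \Sigma has
\[
  \frac{q(q+1)}{2} = \frac{5 \cdot 6}{2} = 15
  \quad\text{free parameters: } 5 \text{ variances and } \binom{5}{2} = 10 \text{ covariances.}
\]
% McFadden's pseudo-R^2 compares the maximized log-likelihood of the fitted
% model with that of the intercept-only (null) model:
\[
  R^2_{\mathrm{McF}} = 1 - \frac{\ell(\text{fitted model})}{\ell(\text{null model})}.
\]
% Likelihood ratio statistic for MMNL against MNL with the same fixed effects:
\[
  \Lambda = 2\,\bigl\{\ell_{\mathrm{MMNL}} - \ell_{\mathrm{MNL}}\bigr\}.
\]
```

Because the MNL model corresponds to variances on the boundary of the parameter space, the reference distribution for $\Lambda$ is not a plain $\chi^2_{15}$ in general; the abstract does not say which reference distribution the authors used.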

15.
Summary A class of nonignorable models is presented for handling nonmonotone missingness in categorical longitudinal responses. This class of models includes the traditional selection models and shared parameter models. This allows us to perform a broader than usual sensitivity analysis. In particular, instead of considering variations to a chosen nonignorable model, we study sensitivity between different missing data frameworks. An appealing feature of the developed class is that parameters with a marginal interpretation are obtained, while algebraically simple models are considered. Specifically, marginalized mixed‐effects models (Heagerty, 1999, Biometrics 55, 688–698) are used for the longitudinal process that model separately the marginal mean and the correlation structure. For the correlation structure, random effects are introduced and their distribution is modeled either parametrically or non‐parametrically to avoid potential misspecifications.

16.
Summary. In this article, we study the estimation of the mean response and regression coefficient in semiparametric regression problems when the response variable is subject to nonrandom missingness. When the missingness is independent of the response conditional on high-dimensional auxiliary information, the parametric approach may misspecify the relationship between covariates and response while the nonparametric approach is infeasible because of the curse of dimensionality. To overcome this, we study a model-based approach to condense the auxiliary information and estimate the parameters of interest nonparametrically on the condensed covariate space. Our estimators possess the double robustness property, i.e., they are consistent whenever the model for the response given auxiliary covariates or the model for the missingness given auxiliary covariates is correct. We conduct a number of simulations to compare the numerical performance of our estimators with that of other existing estimators in the current missing data literature, including the propensity score approach and the inverse probability weighted estimating equation. A set of real data is used to illustrate our approach.

17.
Informative missingness of parental genotype data occurs when the genotype of a parent influences the probability of the parent's genotype data being observed. Informative missingness can occur in a number of plausible ways and can affect both the validity and power of procedures that assume the data are missing at random (MAR). We propose a bootstrap calibration of MAR procedures to account for informative missingness and apply our methodology to refine the approach implemented in the TRANSMIT program. We illustrate this approach by applying it to data on hypertensive probands and their parents who participated in the Framingham Heart Study.

18.
Albert PS  Follmann DA  Wang SA  Suh EB 《Biometrics》2002,58(3):631-642
Longitudinal clinical trials often collect long sequences of binary data. Our application is a recent clinical trial in opiate addicts that examined the effect of a new treatment on repeated binary urine tests to assess opiate use over an extended follow-up. The dataset had two sources of missingness: dropout and intermittent missing observations. The primary endpoint of the study was comparing the marginal probability of a positive urine test over follow-up across treatment arms. We present a latent autoregressive model for longitudinal binary data subject to informative missingness. In this model, a Gaussian autoregressive process is shared between the binary response and missing-data processes, thereby inducing informative missingness. Our approach extends the work of others who have developed models that link the various processes through a shared random effect but do not allow for autocorrelation. We discuss parameter estimation using Monte Carlo EM and demonstrate through simulations that incorporating within-subject autocorrelation through a latent autoregressive process can be very important when longitudinal binary data are subject to informative missingness. We illustrate our new methodology using the opiate clinical trial data.

19.
A statistical model for jointly analysing the spatial variation of incidences of three (or more) diseases, with common and uncommon risk factors, is introduced. Deaths for different diseases are described by a logit model for multinomial responses (multinomial logit or polytomous logit model). For each area and confounding strata population (i.e. age-class, sex, race) the probabilities of death for each cause (the response probabilities) are estimated. A specific disease, the one having a common risk factor only, acts as the baseline category. The log odds are decomposed additively into shared (common to the diseases other than the reference disease) and specific structured spatial variability terms, unstructured unshared spatial terms and confounder terms (such as age, race and sex) to adjust the crude observed data for their effects. Disease-specific spatially structured effects are estimated; these are considered as latent variables denoting disease-specific risk factors. The model is presented with reference to a specific application. We considered the mortality data (from 1990 to 1994) relative to oral cavity, larynx and lung cancers in 13 age groups of males, in the 287 municipalities of the Region of Tuscany (Italy). All these pathologies share smoking as a common risk factor; furthermore, two of them (oral cavity and larynx cancer) share alcohol consumption as a risk factor. All studies suggest that smoking and alcohol consumption are the major known risk factors for oral cavity and larynx cancers; nevertheless, in this paper, we investigate the possibility of other different risk factors for these diseases, or even the presence of an interaction effect (between smoking and alcohol risk factors) but with different spatial patterns for oral and larynx cancer. For each municipality and age-class the probabilities of death for each cause (the response probabilities) are estimated. Lung cancer acts as the baseline category. The log odds are decomposed additively into shared (common to oral cavity and larynx diseases) and specific structured spatial variability terms, unstructured unshared spatial terms and an age-group term. It turns out that oral cavity and larynx cancer have different spatial patterns for residual risk factors which are not the typical ones such as smoking habits and alcohol consumption. But, possibly, these patterns are due to different spatial interactions between smoking habits and alcohol consumption for the first and the second disease.

20.
For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15, 719–730) and Xie and Zhang (2017, Int J Biostat 13, 1–20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.
