20 similar articles found (search time: 586 ms)
We propose a method to estimate the regression coefficients in a competing risks model where the cause-specific hazard for the cause of interest is related to covariates through a proportional hazards relationship and when the cause of failure is missing for some individuals. We use multiple imputation procedures to impute the missing cause of failure, where the probability that a missing cause is the cause of interest may depend on auxiliary covariates, and combine the maximum partial likelihood estimators computed from several imputed data sets into an estimator that is consistent and asymptotically normal. A consistent estimator for the asymptotic variance is also derived. Simulation results suggest the relevance of the theory in finite samples. Results are also illustrated with data from a breast cancer study.
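
The combining step described above can be illustrated with the standard Rubin's rules for pooling estimates across imputed data sets. This is only a minimal sketch under stated assumptions: the function name `pool_rubin` and its inputs are hypothetical, and the abstract indicates that the authors derive their own asymptotic variance estimator, which may differ from the generic pooling rule shown here.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar coefficient across m imputed data sets (Rubin's rules).

    estimates: per-imputation point estimates (hypothetical inputs)
    variances: per-imputation squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of equal length")
    pooled = mean(estimates)          # average of the m point estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```

The between-imputation term is inflated by (1 + 1/m) to account for using a finite number of imputations.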

Lee SM  Gee MJ  Hsieh SH 《Biometrics》2011,67(3):788-798
Summary: We consider the estimation problem of a proportional odds model with missing covariates. Based on the validation and nonvalidation data sets, we propose a joint conditional method that is an extension of Wang et al. (2002, Statistica Sinica 12, 555–574). The proposed method is semiparametric since it requires neither an additional model for the missingness mechanism nor the specification of the conditional distribution of missing covariates given observed variables. Under the assumption that the observed covariates and the surrogate variable are categorical, we derive the large-sample property. The simulation studies show that, in various situations, the joint conditional method is more efficient than the conditional estimation method and the weighted method. We also use a real data set from a survey of cable TV satisfaction to illustrate the approaches.

Wang CY  Huang WT 《Biometrics》2000,56(1):98-105
We consider estimation in logistic regression where some covariate variables may be missing at random. Satten and Kupper (1993, Journal of the American Statistical Association 88, 200-208) proposed estimating odds ratio parameters using methods based on the probability of exposure. By approximating a partial likelihood, we extend their idea and propose a method that estimates the cumulant-generating function of the missing covariate given observed covariates and surrogates in the controls. Our proposed method first estimates some lower order cumulants of the conditional distribution of the unobserved data and then solves a resulting estimating equation for the logistic regression parameter. A simple version of the proposed method is to replace a missing covariate by the summation of its conditional mean and conditional variance given observed data in the controls. We note that one important property of the proposed method is that it remains applicable when the validation is only on controls, a setting in which a class of inverse selection probability weighted semiparametric estimators cannot be applied because the selection probabilities for cases are zero. The proposed estimator performs well unless the relative risk parameters are large, even though it is technically inconsistent. Small-sample simulations are conducted. We illustrate the method by an example of real data analysis.

Satten GA  Carroll RJ 《Biometrics》2000,56(2):384-388
We consider methods for analyzing categorical regression models when some covariates (Z) are completely observed but other covariates (X) are missing for some subjects. When data on X are missing at random (i.e., when the probability that X is observed does not depend on the value of X itself), we present a likelihood approach for the observed data that allows the same nuisance parameters to be eliminated in a conditional analysis as when data are complete. An example of a matched case-control study is used to demonstrate our approach.

Interval-censored failure-time data arise when subjects miss prescheduled visits at which the failure is to be assessed. The resulting intervals in which the failure is known to have occurred are overlapping. Most approaches to the analysis of these data assume that the visit-compliance process is ignorable with respect to likelihood analysis of the failure-time distribution. While this assumption offers considerable simplification, it is not always plausible. Here we test for dependence between the failure- and visit-compliance processes, applicable to studies in which data collection continues after the occurrence of the failure. We do not make any of the assumptions made by previous authors about the joint distribution of the visit-compliance process, a covariate process, and the failure time. Instead, we consider conditional models of the true failure history given the current visit compliance at each visit time, allowing for correlation across visit times. Because failure status is not known at some visit times due to missed visits, only models of the observed failure history given current visit compliance are estimable. We describe how the parameters from these models can be used to test for a negative association and how bounds on unestimable parameters provided by the observed data are needed additionally to infer a positive association. We illustrate the method with data from an AIDS study and we investigate the power of the test through a simulation study.

Methods in the literature for missing covariate data in survival models have relied on the missing at random (MAR) assumption to render regression parameters identifiable. MAR means that missingness can depend on the observed exit time and on whether that exit is a failure or a censoring event. By considering ways in which missingness of covariate X could depend on the true but possibly censored failure time T and the true censoring time C, we attempt to identify missingness mechanisms which would yield MAR data. We find that, under various reasonable assumptions about how missingness might depend on T and/or C, additional strong assumptions are needed to obtain MAR. We conclude that MAR is difficult to justify in practical applications. One exception arises when missingness is independent of T, and C is independent of the value of the missing X. As alternatives to MAR, we propose two new missingness assumptions. In one, the missingness depends on T but not on C; in the other, the situation is reversed. For each, we show that the failure time model is identifiable. When missingness is independent of T, we show that the naive complete record analysis will yield a consistent estimator of the failure time distribution. When missingness is independent of C, we develop a complete record likelihood function and a corresponding estimator for parametric failure time models. We propose analyses to evaluate the plausibility of either assumption in a particular data set, and illustrate the ideas using data from the literature on this problem.

We consider longitudinal studies in which the outcome observed over time is binary and the covariates of interest are categorical. With no missing responses or covariates, one specifies a multinomial model for the responses given the covariates and uses maximum likelihood to estimate the parameters. Unfortunately, incomplete data in the responses and covariates are a common occurrence in longitudinal studies. Here we assume the missing data are missing at random (Rubin, 1976, Biometrika 63, 581-592). Since all of the missing data (responses and covariates) are categorical, a useful technique for obtaining maximum likelihood parameter estimates is the EM algorithm by the method of weights proposed in Ibrahim (1990, Journal of the American Statistical Association 85, 765-769). In using the EM algorithm with missing responses and covariates, one specifies the joint distribution of the responses and covariates. Here we consider the parameters of the covariate distribution as a nuisance. In data sets where the percentage of missing data is high, the estimates of the nuisance parameters can lead to highly unstable estimates of the parameters of interest. We propose a conditional model for the covariate distribution that has several modeling advantages for the EM algorithm and provides a reduction in the number of nuisance parameters, thus providing more stable estimates in finite samples.

We consider studies of cohorts of individuals after a critical event, such as an injury, with the following characteristics. First, the studies are designed to measure "input" variables, which describe the period before the critical event, and to characterize the distribution of the input variables in the cohort. Second, the studies are designed to measure "output" variables, primarily mortality after the critical event, and to characterize the predictive (conditional) distribution of mortality given the input variables in the cohort. Such studies often possess the complication that the input data are missing for those who die shortly after the critical event because the data collection takes place after the event. Standard methods of dealing with the missing inputs, such as imputation or weighting methods based on an assumption of ignorable missingness, are known to be generally invalid when the missingness of inputs is nonignorable, that is, when the distribution of the inputs is different between those who die and those who live. To address this issue, we propose a novel design that obtains and uses information on an additional key variable: a treatment or externally controlled variable which, if set at its "effective" level, could have prevented the death of those who died. We show that the new design can be used to draw valid inferences for the marginal distribution of inputs in the entire cohort, and for the conditional distribution of mortality given the inputs, also in the entire cohort, even under nonignorable missingness. The crucial framework that we use is principal stratification based on the potential outcomes, here mortality under both levels of treatment. We also show, using illustrative preliminary injury data, that our approach can reveal results that are more reasonable than the results of standard methods, in relatively dramatic ways. Thus, our approach suggests that the routine collection of data on variables that could be used as possible treatments in such studies of inputs and mortality should become common.

Pan W  Zeng D 《Biometrics》2011,67(3):996-1006
We study the estimation of mean medical cost when censoring is dependent and a large amount of auxiliary information is present. Under the missing at random assumption, we propose semiparametric working models to obtain low-dimensional summarized scores. An estimator for the mean total cost can be derived nonparametrically conditional on the summarized scores. We show that when either the two working models for the cost-survival process or the model for the censoring distribution is correct, the estimator is consistent and asymptotically normal. Small-sample performance of the proposed method is evaluated via simulation studies. Finally, our approach is applied to analyze a real data set in health economics.

Cho Paik M 《Biometrics》2004,60(2):306-314
Matched case-control data analysis is often challenged by a missing covariate problem, the mishandling of which could cause bias or inefficiency. Satten and Carroll (2000, Biometrics 56, 384-388) and other authors have proposed methods to handle missing covariates when the probability of missingness depends on the observed data, i.e., when data are missing at random. In this article, we propose a conditional likelihood method to handle the case when the probability of missingness depends on the unobserved covariate, i.e., when data are nonignorably missing. When the missing covariate is binary, the proposed method can be implemented using standard software. Using the Northern Manhattan Stroke Study data, we illustrate the method and discuss how sensitivity analysis can be conducted.

Horton NJ  Laird NM 《Biometrics》2001,57(1):34-42
This article presents a new method for maximum likelihood estimation of logistic regression models with incomplete covariate data where auxiliary information is available. This auxiliary information is extraneous to the regression model of interest but predictive of the covariate with missing data. Ibrahim (1990, Journal of the American Statistical Association 85, 765-769) provides a general method for estimating generalized linear regression models with missing covariates using the EM algorithm that is easily implemented when there is no auxiliary data. Vach (1997, Statistics in Medicine 16, 57-72) describes how the method can be extended when the outcome and auxiliary data are conditionally independent given the covariates in the model. The method allows the incorporation of auxiliary data without making the conditional independence assumption. We suggest tests of conditional independence and compare the performance of several estimators in an example concerning mental health service utilization in children. Using an artificial dataset, we compare the performance of several estimators when auxiliary data are available.

This paper develops methodology for estimation of the effect of a binary time-varying covariate on failure times when the change time of the covariate is interval censored. The motivating example is a study of cytomegalovirus (CMV) disease in patients with human immunodeficiency virus (HIV) disease. We are interested in determining whether CMV shedding predicts an increased hazard for developing active CMV disease. Since a clinical screening test is needed to detect CMV shedding, the time that shedding begins is only known to lie in an interval bounded by the patient's last negative and first positive tests. In a Cox proportional hazards model with a time-varying covariate for CMV shedding, the partial likelihood depends on the covariate status of every individual in the risk set at each failure time. Due to interval censoring, this is not always known. To solve this problem, we use a Monte Carlo EM algorithm with a Gibbs sampler embedded in the E-step. We generate multiple completed data sets by drawing imputed exact shedding times based on the joint likelihood of the shedding times and event times under the Cox model. The method is evaluated using a simulation study and is applied to the data set described above.

Toledano AY  Gatsonis C 《Biometrics》1999,55(2):488-496
We propose methods for regression analysis of repeatedly measured ordinal categorical data when there is nonmonotone missingness in these responses and when a key covariate is missing depending on observables. The methods use ordinal regression models in conjunction with generalized estimating equations (GEEs). We extend the GEE methodology to accommodate arbitrary patterns of missingness in the responses when this missingness is independent of the unobserved responses. We further extend the methodology to provide correction for possible bias when missingness in knowledge of a key covariate may depend on observables. The approach is illustrated with the analysis of data from a study in diagnostic oncology in which multiple correlated receiver operating characteristic curves are estimated and corrected for possible verification bias when the true disease status is missing depending on observables.

Summary: Randomized experiments are the gold standard for evaluating proposed treatments. The intent to treat estimand measures the effect of treatment assignment, but not the effect of treatment if subjects take treatments to which they are not assigned. The desire to estimate the efficacy of the treatment in this case has been the impetus for a substantial literature on compliance over the last 15 years. In papers dealing with this issue, it is typically assumed there are different types of subjects, for example, those who will follow treatment assignment (compliers), and those who will always take a particular treatment irrespective of treatment assignment. The estimands of primary interest are the complier proportion and the complier average treatment effect (CACE). To estimate CACE, researchers have used various methods, for example, instrumental variables and parametric mixture models, treating compliers as a single class. However, it is often unreasonable to believe all compliers will be affected. This article therefore treats compliers as a mixture of two types, some belonging to a zero-effect class, others to an effect class. Second, in most experiments, some subjects drop out or simply do not report the value of the outcome variable, and the failure to take into account missing data can lead to biased estimates of treatment effects. Recent work on compliance in randomized experiments has addressed this issue by assuming missing data are missing at random or latently ignorable. We extend this work to the case where compliers are a mixture of types and also examine alternative types of nonignorable missing data assumptions.

Chen B  Zhou XH 《Biometrics》2011,67(3):830-842
Longitudinal studies often feature incomplete response and covariate data. Likelihood-based methods such as the expectation-maximization algorithm give consistent estimators for model parameters when data are missing at random (MAR) provided that the response model and the missing covariate model are correctly specified; however, we do not need to specify the missing data mechanism. An alternative method is the weighted estimating equation, which gives consistent estimators if the missing data and response models are correctly specified; however, we do not need to specify the distribution of the covariates that have missing values. In this article, we develop a doubly robust estimation method for longitudinal data with missing response and missing covariate when data are MAR. This method is appealing in that it can provide consistent estimators if either the missing data model or the missing covariate model is correctly specified. Simulation studies demonstrate that this method performs well in a variety of situations.
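
The idea of double robustness can be illustrated in a much simpler setting than the longitudinal one treated in the article: estimating a mean when some outcomes are missing at random. The sketch below is an augmented inverse probability weighted (AIPW) mean, not the authors' estimator; the function name `aipw_mean` and all of its arguments are assumptions made for illustration. The estimate is consistent if either the supplied observation probabilities or the supplied outcome predictions are correct.

```python
def aipw_mean(y, observed, prob_observed, predicted):
    """Augmented inverse-probability-weighted estimate of E[Y].

    y:             outcomes; entries for unobserved units are ignored (may be None)
    observed:      1 if the outcome was observed, 0 otherwise
    prob_observed: modelled probability that each outcome is observed (must be > 0)
    predicted:     modelled outcome prediction for every unit
    All four sequences are hypothetical inputs of equal length.
    """
    n = len(y)
    if n == 0 or not (n == len(observed) == len(prob_observed) == len(predicted)):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for yi, ri, pi, mi in zip(y, observed, prob_observed, predicted):
        if pi <= 0:
            raise ValueError("observation probabilities must be positive")
        if ri:
            # Prediction plus an inverse-probability-weighted residual correction.
            total += mi + (yi - mi) / pi
        else:
            # Unobserved outcome: only the model prediction contributes.
            total += mi
    return total / n


if __name__ == "__main__":
    print(aipw_mean([1.0, None, 3.0, None], [1, 0, 1, 0], [0.5] * 4, [2.0] * 4))
```

If the outcome predictions are correct, the weighted residuals average to zero; if the observation probabilities are correct, the residual correction removes the bias of the predictions.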

Cook RJ  Ng ET  Meade MO 《Biometrics》2000,56(4):1109-1117
We describe a method for making inferences about the joint operating characteristics of multiple diagnostic tests applied longitudinally and in the absence of a definitive reference test. Log-linear models are adopted for the classification distributions conditional on the latent state, where inclusion of appropriate interaction terms accommodates conditional dependencies among the tests. A marginal likelihood is constructed by marginalizing over a latent two-state Markov process. Specific latent processes we consider include a first-order Markov model, a second-order Markov model, and a time-nonhomogeneous Markov model, although the method is described in full generality. Adaptations to handle missing data are described. Model diagnostics are considered based on the bootstrap distribution of conditional residuals. The methods are illustrated by application to a study of diffuse bilateral infiltrates among patients in intensive care wards in which the objective was to assess aspects of validity and clinical agreement.

Multiple imputation (MI) has emerged in the last two decades as a frequently used approach in dealing with incomplete data. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages that are capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing-data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple-imputation-based estimates for various population quantities under different rounding rules from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.
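
The pragmatic step described above, rounding continuous imputed values to the nearest observed category, might be sketched as follows for a binary variable. The function name `round_binary` and the `threshold` parameter are hypothetical; this is only the simplest fixed-cutoff rule, and the abstract itself cautions that rules relying on marginal characteristics alone can perform poorly.

```python
def round_binary(imputed, threshold=0.5):
    """Map continuous imputed values to {0, 1} using a fixed cutoff.

    imputed:   values drawn from a normal imputation model (hypothetical input)
    threshold: cutoff at or above which a value is rounded to 1 (assumed default 0.5)
    """
    return [1 if value >= threshold else 0 for value in imputed]


if __name__ == "__main__":
    # Values outside [0, 1] can occur under a normal imputation model.
    print(round_binary([-0.2, 0.3, 0.5, 0.8, 1.4]))
```

Exposing the cutoff as a parameter makes it possible to compare alternative rounding rules on the same imputed values.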

GOETGHEBEUR  ELS; RYAN  LOUISE 《Biometrika》1995,82(4):821-833
We propose a method to analyse competing risks survival data when failure types are missing for some individuals. Our approach is based on a standard proportional hazards structure for each of the failure types, and involves the solution to estimating equations. We present consistent and asymptotically normal estimators of the regression coefficients and related score tests. An appealing feature is that individuals with known failure types make the same contributions as they would to a standard proportional hazards analysis. Contributions of individuals with unknown failure types are weighted according to the probability that they failed from the cause of interest. Efficiency and robustness are discussed. Results are illustrated with data from a breast cancer trial.

Analysis of failure time data with dependent interval censoring   Total citations: 1, self-citations: 0, citations by others: 1
This article develops a method for the analysis of screening data for which the chance of being screened is dependent on the event of interest (informative censoring). Because not all subjects make all screening visits, the data on the failure of interest are interval censored. We propose a model that will properly adjust for the dependence to obtain an unbiased estimate of the nonparametric failure time function, and we provide an extension for applying the method for estimation of the regression parameters from a (discrete time) proportional hazards regression model. The method is applied to a data set from an observational study of cytomegalovirus shedding in a population of HIV-infected subjects who participated in a trial conducted by the AIDS Clinical Trials Group.

In the context of parentage assignment using genomic markers, key issues are genotyping errors and an absence of parent genotypes because of sampling, traceability or genotyping problems. Most likelihood-based parentage assignment software programs require a priori estimates of genotyping errors and the proportion of missing parents to set up meaningful assignment decision rules. We present here the R package APIS, which can assign offspring to their parents without any prior information other than the offspring and parental genotypes, and a user-defined, acceptable error rate among assigned offspring. Assignment decision rules use the distributions of average Mendelian transmission probabilities, which enable estimates of the proportion of offspring with missing parental genotypes. APIS has been compared to other software (CERVUS, VITASSIGN) on a real European seabass (Dicentrarchus labrax) single nucleotide polymorphism data set. The type I error rate (false positives) was lower with APIS than with other software, especially when parental genotypes were missing, but the true positive rate was also lower, except when the theoretical exclusion power reached 0.99999. In general, APIS provided assignments that satisfied the user-set acceptable error rate of 1% or 5%, even when tested on simulated data with high genotyping error rates (1% or 3%) and up to 50% missing sires. Because it uses the observed distribution of Mendelian transmission probabilities, APIS is best suited to assigning parentage when numerous offspring (>200) are genotyped. We have demonstrated that APIS is an easy-to-use and reliable software package for parentage assignment, even when up to 50% of sires are missing.
