Similar Articles
Found 20 similar articles (search time: 31 ms)
1.
There is a growing interest in the analysis of survival data with a cured proportion, particularly in tumor recurrence studies. Biologically, it is reasonable to assume that the recurrence time is mainly affected by the overall health condition of the patient, which depends on covariates such as age, sex, or type of treatment received. We propose a semiparametric frailty-Cox cure model that quantifies the overall health condition of the patient by a covariate-dependent frailty with a discrete mass at zero to characterize the cured patients and a positive continuous part to characterize the heterogeneous health conditions among the uncured patients. A multiple imputation estimation method is proposed for the right-censored case and is further extended to accommodate interval-censored data. Simulation studies show that the performance of the proposed method is highly satisfactory. For illustration, the model is fitted to a set of right-censored melanoma incidence data and a set of interval-censored breast cosmesis data. Our analysis suggests that patients receiving radiotherapy with adjuvant chemotherapy have a significantly higher probability of breast retraction, but also a lower hazard rate of breast retraction among those patients who will eventually experience the event and have similar health conditions. This interpretation is very different from that based on models without a cure component, namely that radiotherapy with adjuvant chemotherapy significantly increases the risk of breast retraction.
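The cure-model idea in the abstract above can be illustrated with a far simpler cousin of the proposed frailty-Cox cure model: a two-component mixture cure likelihood with an exponential latency distribution. This is a minimal sketch for intuition only, not the authors' method; the function name and the Exponential(lam) latency are our assumptions.

```python
import math

def mixture_cure_loglik(pi, lam, time, event):
    """Log-likelihood of a basic mixture cure model: with probability pi a
    subject is cured and never experiences the event; uncured subjects have
    Exponential(lam) failure times. event[i] = 1 if the failure was observed,
    0 if the observation was right-censored at time[i]."""
    ll = 0.0
    for t, d in zip(time, event):
        s_uncured = math.exp(-lam * t)  # survival function of an uncured subject
        if d:
            # an observed failure can only come from an uncured subject
            ll += math.log((1.0 - pi) * lam * s_uncured)
        else:
            # a censored subject is either cured, or uncured but still event-free
            ll += math.log(pi + (1.0 - pi) * s_uncured)
    return ll
```

Maximizing this over (pi, lam) gives the simplest cure-fraction estimate; the paper's model replaces the exponential latency with a Cox model and the cure indicator with a frailty having a point mass at zero.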

2.
Summary: Accurately assessing a patient's risk of a given event is essential in making informed treatment decisions. One approach is to stratify patients into two or more distinct risk groups with respect to a specific outcome using both clinical and demographic variables. Outcomes may be categorical or continuous in nature; important examples in cancer studies include level of toxicity and time to recurrence. Recursive partitioning methods are ideal for building such risk groups. Two such methods are Classification and Regression Trees (CART) and a more recent competitor known as the partitioning Deletion/Substitution/Addition (partDSA) algorithm, both of which utilize loss functions (e.g., squared error for a continuous outcome) as the basis for building, selecting, and assessing predictors, but differ in the manner in which regression trees are constructed. Recently, we have shown that partDSA often outperforms CART in so-called "full data" settings (e.g., uncensored outcomes). However, when confronted with censored outcome data, the loss functions used by both procedures must be modified. There have been several attempts to adapt CART for right-censored data. This article describes two such extensions for partDSA that make use of observed-data loss functions constructed using inverse probability of censoring weights. Such loss functions are consistent estimates of their uncensored counterparts provided that the corresponding censoring model is correctly specified. The relative performance of these new methods is evaluated via simulation studies and illustrated through an analysis of clinical trial data on brain cancer patients. The implementation of partDSA for uncensored and right-censored outcomes is publicly available in the R package partDSA.
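The observed-data loss functions mentioned above weight uncensored observations by the inverse of the estimated censoring survival function. A minimal sketch of an IPCW squared-error loss, assuming the censoring distribution is estimated by Kaplan-Meier with the censoring indicator flipped (function names are ours, not from the partDSA package):

```python
import numpy as np

def censoring_km(time, event):
    """Kaplan-Meier estimate of the censoring survival function G, obtained
    by treating censorings (event == 0) as the events of interest."""
    order = np.argsort(time, kind="stable")
    t = np.asarray(time, float)[order]
    d = 1 - np.asarray(event, int)[order]       # 1 marks a censoring event
    at_risk = len(t) - np.arange(len(t))
    return t, np.cumprod(1.0 - d / at_risk)     # product-limit estimator

def ipcw_squared_error(time, event, pred):
    """Observed-data squared-error loss with inverse probability of censoring
    weights: uncensored subjects get weight 1/G(T_i-), censored subjects
    contribute nothing."""
    t_sorted, g = censoring_km(time, event)
    total = 0.0
    for ti, di, pi in zip(time, event, pred):
        if di == 1:
            k = np.searchsorted(t_sorted, ti) - 1   # G evaluated just before ti
            w = 1.0 / g[k] if k >= 0 else 1.0
            total += w * (ti - pi) ** 2
    return total / len(time)
```

With no censoring, all weights are 1 and the loss reduces to the ordinary mean squared error, which is the consistency property the abstract refers to.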

3.
Summary: In a typical randomized clinical trial, a continuous variable of interest (e.g., bone density) is measured at baseline and at fixed postbaseline time points. The resulting longitudinal data, often incomplete due to dropouts and other reasons, are commonly analyzed using parametric likelihood-based methods that assume multivariate normality of the response vector. If the normality assumption is deemed untenable, then semiparametric methods such as (weighted) generalized estimating equations are considered. We propose an alternate approach in which the missing data problem is tackled using multiple imputation, and each imputed dataset is analyzed using robust regression (M-estimation; Huber, 1973, Annals of Statistics 1, 799-821) to protect against potential non-normality and outliers in the original or imputed dataset. The robust analysis results from each imputed dataset are combined for overall estimation and inference using either the simple method of Rubin (1987, Multiple Imputation for Nonresponse in Surveys, New York: Wiley) or the more complex but potentially more accurate method of Robins and Wang (2000, Biometrika 87, 113-124). We use simulations to show that our proposed approach performs at least as well as the standard methods under normality, but is notably better under both elliptically symmetric and asymmetric non-normal distributions. A clinical trial example is used for illustration.
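Rubin's (1987) combining rules referenced above have a simple closed form: average the point estimates, and combine within- and between-imputation variances. A minimal sketch (the function name is an assumption):

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Combine point estimates and their variances from m imputed datasets
    using Rubin's (1987) rules. Returns the pooled estimate, the total
    variance, and Rubin's degrees of freedom for a t reference distribution."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()             # pooled point estimate
    ubar = variances.mean()             # within-imputation variance
    b = estimates.var(ddof=1)           # between-imputation variance
    t = ubar + (1 + 1 / m) * b          # total variance
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2 if b > 0 else float("inf")
    return qbar, t, df
```

The Robins and Wang (2000) alternative mentioned in the abstract replaces this simple variance combination with a sandwich-type estimator and is not sketched here.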

4.
We develop an approach, based on multiple imputation, to using auxiliary variables to recover information from censored observations in survival analysis. We apply the approach to data from an AIDS clinical trial comparing ZDV and placebo, in which CD4 count is the time-dependent auxiliary variable. To facilitate imputation, a joint model is developed for the data, which includes a hierarchical change-point model for CD4 counts and a time-dependent proportional hazards model for the time to AIDS. Markov chain Monte Carlo methods are used to multiply impute event times for censored cases. The augmented data are then analyzed and the results combined using standard multiple-imputation techniques. A comparison of our multiple-imputation approach to simply analyzing the observed data indicates that multiple imputation leads to a small change in the estimated effect of ZDV and smaller estimated standard errors. A sensitivity analysis suggests that the qualitative findings are reproducible under a variety of imputation models. A simulation study indicates that improved efficiency over standard analyses and partial corrections for dependent censoring can result. An issue that arises with our approach, however, is whether the analysis of primary interest and the imputation model are compatible.

5.
Sternberg MR, Satten GA. Biometrics 1999;55(2):514-522
Chain-of-events data are longitudinal observations on a succession of events that can only occur in a prescribed order. One goal in an analysis of this type of data is to determine the distribution of times between the successive events. This is difficult when individuals are observed periodically rather than continuously because the event times are then interval censored. Chain-of-events data may also be subject to truncation when individuals can only be observed if a certain event in the chain (e.g., the final event) has occurred. We provide a nonparametric approach to estimate the distributions of times between successive events in discrete time for data such as these under the semi-Markov assumption that the times between events are independent. This method uses a self-consistency algorithm that extends Turnbull's algorithm (1976, Journal of the Royal Statistical Society, Series B 38, 290-295). The quantities required to carry out the algorithm can be calculated recursively for improved computational efficiency. Two examples using data from studies involving HIV disease are used to illustrate our methods.
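Turnbull-type self-consistency, which the algorithm above extends, can be sketched as an EM iteration that redistributes probability mass over candidate support points until the mass vector stops changing. A simplified version for a single interval-censored sample (not the chain-of-events extension described in the paper):

```python
import numpy as np

def turnbull(intervals, support, tol=1e-8, max_iter=5000):
    """Self-consistency (EM) iteration for the NPMLE of a distribution from
    interval-censored data: observation i is known only to lie in [L_i, R_i].
    `support` lists candidate mass points; exact observations have L_i == R_i."""
    support = np.asarray(support, dtype=float)
    # alpha[i, j] = 1 if support point j lies inside observation i's interval
    alpha = np.array([[l <= s <= r for s in support] for l, r in intervals], float)
    p = np.full(len(support), 1.0 / len(support))   # start from uniform mass
    for _ in range(max_iter):
        denom = alpha @ p                            # P(interval i) under current p
        # E-step/M-step combined: each observation splits its unit mass
        # across the support points inside its interval, proportionally to p
        p_new = (alpha / denom[:, None] * p).mean(axis=0)
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

With exact (uncensored) observations the iteration reproduces the empirical distribution, which is the sanity check used below.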

6.
In cohort studies the outcome is often time to a particular event, and subjects are followed at regular intervals. Periodic visits may also monitor a secondary irreversible event influencing the event of primary interest, and a significant proportion of subjects develop the secondary event over the period of follow-up. The status of the secondary event serves as a time-varying covariate, but is recorded only at the times of the scheduled visits, generating incomplete time-varying covariates. While information on a typical time-varying covariate is missing for the entire follow-up period except at the visit times, the status of the secondary event is unavailable only between visits at which the status has changed, and is thus interval-censored. One may view the interval-censored covariate of secondary event status as a missing time-varying covariate, yet the missingness is partial, since partial information is provided throughout the follow-up period. The current practice of using the latest observed status produces biased estimators, and existing missing-covariate techniques cannot accommodate the special feature of missingness due to interval censoring. To handle interval-censored covariates in the Cox proportional hazards model, we propose an available-data estimator and a doubly robust-type estimator, as well as the maximum likelihood estimator via an EM algorithm, and present their asymptotic properties. We also present practical approaches that are valid. We demonstrate the proposed methods using our motivating example from the Northern Manhattan Study.

7.
This paper deals with a Cox proportional hazards regression model in which some covariates of interest are randomly right-censored. While methods for censored outcomes have become ubiquitous in the literature, methods for censored covariates have thus far received little attention and, for the most part, have dealt with the issue of limit of detection. For randomly censored covariates, an often-used method is the inefficient complete-case analysis (CCA), which consists of deleting censored observations from the data analysis. When censoring is not completely independent, the CCA leads to biased and spurious results. Methods for missing covariate data, including those for type I and type II covariate censoring as well as limit of detection, do not readily apply due to the fundamentally different nature of randomly censored covariates. We develop a novel method for censored covariates using a conditional mean imputation based on either Kaplan-Meier estimates or a Cox proportional hazards model to estimate the effects of these covariates on a time-to-event outcome. We evaluate the performance of the proposed method through simulation studies and show that it provides good bias reduction and statistical efficiency. Finally, we illustrate the method using data from the Framingham Heart Study to assess the relationship between offspring and parental age of onset of cardiovascular events.
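The Kaplan-Meier-based conditional mean imputation described above replaces a censored covariate value c with an estimate of E[X | X > c]. A minimal single-imputation sketch using the KM point masses (the paper's actual estimator and tie handling may differ in detail):

```python
import numpy as np

def km_point_masses(x, delta):
    """Kaplan-Meier point masses placed at the uncensored values of x
    (delta = 1 observed, 0 right-censored)."""
    order = np.argsort(x, kind="stable")
    xs = np.asarray(x, float)[order]
    ds = np.asarray(delta, int)[order]
    n = len(xs)
    surv = 1.0
    times, masses = [], []
    for i in range(n):
        new_surv = surv * (1.0 - ds[i] / (n - i))   # product-limit step
        if ds[i] == 1:
            times.append(xs[i])
            masses.append(surv - new_surv)          # mass dropped at this value
        surv = new_surv
    return np.array(times), np.array(masses)

def impute_censored_covariate(x, delta):
    """Conditional mean single imputation: each censored covariate value c is
    replaced by the KM estimate of E[X | X > c]; observed values are kept."""
    times, masses = km_point_masses(x, delta)
    out = np.asarray(x, float).copy()
    for i, (xi, di) in enumerate(zip(x, delta)):
        beyond = times > xi
        if di == 0 and masses[beyond].sum() > 0:
            out[i] = np.average(times[beyond], weights=masses[beyond])
    return out
```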

8.
Summary: Often a binary variable is generated by dichotomizing an underlying continuous variable measured at a specific time point according to a prespecified threshold value. In the event that the underlying continuous measurements are from a longitudinal study, one can use the repeated-measures model to impute missing data on responder status resulting from subject dropout and apply the logistic regression model to the observed or otherwise imputed responder status. Standard Bayesian multiple imputation techniques (Rubin, 1987, Multiple Imputation for Nonresponse in Surveys), which draw the parameters for the imputation model from the posterior distribution and construct the variance of parameter estimates for the analysis model as a combination of within- and between-imputation variances, are found to be conservative. The frequentist multiple imputation approach, which fixes the parameters of the imputation model at the maximum likelihood estimates and constructs the variance of parameter estimates for the analysis model using the results of Robins and Wang (2000, Biometrika 87, 113-124), is shown to be more efficient. We propose to apply the Kenward and Roger (1997, Biometrics 53, 983-997) degrees of freedom to account for the uncertainty associated with variance-covariance parameter estimates for the repeated-measures model.

9.
Interval-censored recurrent event data arise when the event of interest is not readily observed but the cumulative event count can be recorded at periodic assessment times. In some settings, chronic disease processes may resolve, and individuals will cease to be at risk of events at the time of disease resolution. We develop an expectation-maximization algorithm for fitting a dynamic mover-stayer model to interval-censored recurrent event data under a Markov model with a piecewise-constant baseline rate function given a latent process. The model is motivated by settings in which the event times and the resolution time of the disease process are unobserved. The likelihood and algorithm are shown to yield estimators with small empirical bias in simulation studies. Data are analyzed on the cumulative number of damaged joints in patients with psoriatic arthritis where individuals experience disease remission.

10.
We present a method to fit a mixed effects Cox model with interval-censored data. Our proposal is based on a multiple imputation approach that uses the truncated Weibull distribution to replace the interval-censored data with imputed survival times and then uses established mixed effects Cox methods for right-censored data. Interval-censored data were encountered in a database corresponding to a compilation of retrospective data from eight analytical treatment interruption (ATI) studies in 158 human immunodeficiency virus (HIV) positive, combination antiretroviral treatment (cART) suppressed individuals. The main variable of interest is the time to viral rebound, defined as the increase of serum viral load (VL) to detectable levels in a patient with previously undetectable VL as a consequence of the interruption of cART. Another aspect of interest in the analysis is the fact that the data come from different studies conducted on different grounds and that we have several assessments on the same patient. In order to handle this extra variability, we frame the problem in a mixed effects Cox model with a random intercept per subject as well as correlated random intercept and slope for pre-cART VL per study. Our procedure has been implemented in R using two packages, truncdist and coxme, and can be applied to any data set that presents both interval-censored survival times and a grouped data structure that could be treated as a random effect in a regression model. The properties of the parameter estimators obtained with our proposed method are addressed through a simulation study.
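The truncated Weibull imputation step can be sketched by inverse-CDF sampling restricted to each observation's censoring interval, which is a simplified stand-in for what the truncdist package provides in R; the function and parameter names here are ours:

```python
import numpy as np

def rtrunc_weibull(rng, shape, scale, low, high, size=1):
    """Draw imputed survival times from a Weibull(shape, scale) distribution
    truncated to the interval (low, high], by inverting the Weibull CDF on
    the corresponding probability interval."""
    cdf = lambda t: 1.0 - np.exp(-(t / scale) ** shape)   # Weibull CDF
    # sample uniformly between the CDF values at the interval endpoints
    u = rng.uniform(cdf(low), cdf(high), size=size)
    # inverse Weibull CDF maps the probabilities back to times in (low, high]
    return scale * (-np.log(1.0 - u)) ** (1.0 / shape)
```

Each interval-censored observation would be imputed this way in every imputed dataset, after which a standard right-censored mixed effects Cox fit is applied.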

11.
Multiple imputation (MI) is used to handle missing at random (MAR) data. Despite warnings from statisticians, continuous variables are often recoded into binary variables. With MI it is important that the imputation and analysis models are compatible; variables should be imputed in the same form in which they appear in the analysis model. For such a derived binary variable, more accurate imputations may be obtained by imputing the underlying continuous variable. We conducted a simulation study to explore how best to impute a binary variable that was created from an underlying continuous variable. We generated a completely observed continuous outcome associated with an incomplete binary covariate that is a categorized version of an underlying continuous covariate, and an auxiliary variable associated with the underlying continuous covariate. We simulated data with several sample sizes, and set 25% and 50% of the data in the covariate to MAR dependent on the outcome and the auxiliary variable. We compared the performance of five imputation methods: (a) imputation of the binary variable using logistic regression; (b) imputation of the continuous variable using linear regression, then categorizing into the binary variable; (c, d) imputation of both the continuous and binary variables using fully conditional specification (FCS) and multivariate normal imputation; (e) substantive-model-compatible (SMC) FCS. Bias and standard errors were large when only the continuous variable was imputed. The other methods performed adequately. Imputation of both the binary and continuous variables using FCS often encountered mathematical difficulties. We recommend the SMC-FCS method, as it performed best in our simulation studies.

12.
Multiple imputation (MI) has emerged in the last two decades as a frequently used approach in dealing with incomplete data. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings that include a mix of continuous and discrete variables, the lack of flexible models for the joint distribution of different types of variables can make the specification of the imputation model a daunting task. The widespread availability of software packages that are capable of carrying out MI under the assumption of joint multivariate normality allows applied researchers to address this complication pragmatically by treating the discrete variables as continuous for imputation purposes and subsequently rounding the imputed values to the nearest observed category. In this article, we compare several rounding rules for binary variables based on simulated longitudinal data sets that have been used to illustrate other missing-data techniques. Using a combination of conditional and marginal data generation mechanisms and imputation models, we study the statistical properties of multiple-imputation-based estimates for various population quantities under different rounding rules from bias and coverage standpoints. We conclude that a good rule should be driven by borrowing information from other variables in the system rather than relying on the marginal characteristics and should be relatively insensitive to imputation model specifications that may potentially be incompatible with the observed data. We also urge researchers to consider the applied context and specific nature of the problem, to avoid uncritical and possibly inappropriate use of rounding in imputation models.

13.
Field ornithologists have used traditional culture-based techniques to determine the presence and abundance of microbes on surfaces such as eggshells, but culture-independent PCR-based methods have recently been introduced. We compared the traditional culture-based and the real-time PCR-based methods for detecting and quantifying Escherichia coli on the eggshells of Eurasian Magpies (Pica pica). PCR estimates of bacterial abundance were ~10 times higher than culture-based estimates, and the culture-based technique failed to detect bacteria at lower densities. When both methods detected bacteria, bacterial densities determined by the two methods were positively correlated, indicating that both methods can be used to study factors affecting bacterial densities. The difference between the two methods is consistent with the generally acknowledged higher sensitivity of the PCR method, but the extent of the difference in our study (10×) may have been influenced by both a PCR-based overestimation and a culture-based underestimation of bacterial densities. Our results also illustrate that bacterial counts may sometimes produce left-censored data (i.e., we did not detect E. coli in 62% of our samples using the culture-based method). Specific statistical methods have been developed for analyzing left-censored data but, to our knowledge, have not been used by ornithologists. In future studies, investigators studying bacterial loads should report the possible degree of left censoring and should justify their choice of statistical methods from the broad set available, including those explicitly designed for censored data.
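As one example of the statistical methods designed for left-censored data, a likelihood-based analysis replaces each non-detect with the probability of falling below the detection limit rather than substituting an arbitrary value. A minimal sketch assuming a normal model on some suitable scale (the abstract does not prescribe this particular choice):

```python
import math

def left_censored_normal_loglik(mu, sigma, values, censored):
    """Log-likelihood for normally distributed data with left-censoring:
    a non-detect at detection limit d contributes log Phi((d - mu) / sigma),
    i.e. the probability of lying below the limit; a fully observed value
    contributes the usual normal log-density."""
    ll = 0.0
    for v, c in zip(values, censored):
        z = (v - mu) / sigma
        if c:
            # censored: we only know the true value is below v
            ll += math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))
        else:
            # observed: normal log-density
            ll += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return ll
```

Maximizing this log-likelihood over (mu, sigma) yields estimates that use the non-detects properly instead of dropping them or substituting half the detection limit.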

14.
Sufficient dimension reduction (SDR), which effectively reduces the predictor dimension in regression, has become popular in high-dimensional data analysis. In the presence of censoring, however, most existing SDR methods break down. In this article, we propose a new algorithm to perform SDR with censored responses based on the quantile-slicing scheme recently proposed by Kim et al. First, we estimate the conditional quantile function of the true survival time via censored kernel quantile regression (Shin et al.) and then slice the data based on the estimated censored regression quantiles instead of the responses. Both simulated and real data analyses demonstrate the promising performance of the proposed method.

15.
Kim YJ. Biometrics 2006;62(2):458-464
In doubly censored failure time data, the survival time of interest is defined as the elapsed time between an initial event and a subsequent event, and the occurrences of both events cannot be observed exactly. Instead, only right- or interval-censored observations on the occurrence times are available. For the analysis of such data, a number of methods have been proposed under the assumption that the survival time of interest is independent of the occurrence time of the initial event. This article investigates a different situation where the independence may not be true, with the focus on regression analysis of doubly censored data. Cox frailty models are applied to describe the effects of covariates and an EM algorithm is developed for estimation. Simulation studies are performed to investigate finite sample properties of the proposed method and an illustrative example from an acquired immune deficiency syndrome (AIDS) cohort study is provided.

16.
Large observational databases derived from disease registries and retrospective cohort studies have proven very useful for the study of health services utilization. However, the use of large databases may introduce computational difficulties, particularly when the event of interest is recurrent. In such settings, grouping the recurrent event data into prespecified intervals leads to a flexible event rate model and a data reduction that remedies the computational issues. We propose a possibly stratified marginal proportional rates model with a piecewise-constant baseline event rate for recurrent event data. Both the absence and the presence of a terminal event are considered. Large-sample distributions are derived for the proposed estimators. Simulation studies are conducted under various data configurations, including settings in which the model is misspecified. Guidelines for interval selection are provided and assessed using numerical studies. We then show that the proposed procedures can be carried out using standard statistical software (e.g., SAS, R). An application based on national hospitalization data for end-stage renal disease patients is provided.
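With a piecewise-constant baseline rate, the maximum likelihood rate in each prespecified interval is simply the number of events divided by the person-time at risk in that interval, which is what makes the grouping-based data reduction computationally attractive. A minimal sketch without covariates, strata, or a terminal event:

```python
def piecewise_constant_rates(follow_up, event_times, cutpoints):
    """Piecewise-constant recurrent-event rate estimates: for each interval
    [a, b) between successive cutpoints, rate = (events in [a, b)) /
    (person-time at risk in [a, b)). follow_up[i] is subject i's follow-up
    length; event_times[i] lists subject i's recurrent event times."""
    edges = [float(c) for c in cutpoints]
    rates = []
    for a, b in zip(edges[:-1], edges[1:]):
        # person-time contributed to [a, b) by each subject still at risk
        persontime = sum(min(f, b) - a for f in follow_up if f > a)
        events = sum(1 for subj in event_times for t in subj if a <= t < b)
        rates.append(events / persontime if persontime > 0 else float("nan"))
    return rates
```

The full model in the paper adds covariates through a proportional rates specification; this sketch shows only the baseline-rate building block.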

17.
Dunson DB, Dinse GE. Biometrics 2002;58(1):79-88
Multivariate current status data consist of indicators of whether each of several events has occurred by the time of a single examination. Our interest focuses on inferences about the joint distribution of the event times. Conventional methods for the analysis of multiple event-time data cannot be used because all of the event times are censored and censoring may be informative. Within a given subject, we account for correlated event times through a subject-specific latent variable, conditional upon which the various events are assumed to occur independently. We also assume that each event contributes independently to the hazard of censoring. Nonparametric step functions are used to characterize the baseline distributions of the different event times and of the examination times. Covariate and subject-specific effects are incorporated through generalized linear models. A Markov chain Monte Carlo algorithm is described for estimation of the posterior distributions of the unknowns. The methods are illustrated through application to multiple tumor-site data from an animal carcinogenicity study.

18.
In cluster randomized trials (CRTs), identifiable clusters rather than individuals are randomized to study groups. Resulting data often consist of a small number of clusters with correlated observations within a treatment group. Missing data often present a problem in the analysis of such trials, and multiple imputation (MI) has been used to create complete data sets, enabling subsequent analysis with well-established analysis methods for CRTs. We discuss strategies for accounting for clustering when multiply imputing a missing continuous outcome, focusing on estimation of the variance of group means as used in an adjusted t-test or ANOVA. These analysis procedures are congenial to (can be derived from) a mixed effects imputation model; however, this imputation procedure is not yet available in commercial statistical software. An alternative approach that is readily available and has been used in recent studies is to include fixed effects for cluster, but the impact of using this convenient method has not been studied. We show that under this imputation model the MI variance estimator is positively biased and that smaller intraclass correlations (ICCs) lead to larger overestimation of the MI variance. Analytical expressions for the bias of the variance estimator are derived in the case of data missing completely at random, and cases in which data are missing at random are illustrated through simulation. Finally, various imputation methods are applied to data from the Detroit Middle School Asthma Project, a recent school-based CRT, and differences in inference are compared.

19.
This paper discusses two-sample comparison in the case of interval-censored failure time data. For this problem, one common approach is to employ nonparametric test procedures, which usually give p-values but not a direct or exact quantitative measure of the survival or treatment difference of interest. In particular, these procedures cannot provide a hazard ratio estimate, which is commonly used to measure the difference between two treatments or samples. For interval-censored data, a few nonparametric test procedures have been developed, but no procedure seems to exist for hazard ratio estimation. Accordingly, we present two procedures for nonparametric estimation of the hazard ratio of two samples in interval-censored data situations. They are generalizations of the corresponding procedures for right-censored failure time data. An extensive simulation study is conducted to evaluate the performance of the two procedures and indicates that they work reasonably well in practice. For illustration, they are applied to a set of interval-censored data arising from a breast cancer study.
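For the right-censored procedures that the proposed methods generalize, a crude nonparametric hazard-ratio estimate can be formed from the observed (O) and expected (E) event counts of the log-rank calculation, HR ≈ (O1/E1)/(O0/E0). A minimal two-group sketch for right-censored data only (not the interval-censored extension of the paper):

```python
import numpy as np

def oe_hazard_ratio(time, event, group):
    """O/E hazard-ratio estimate for two groups (coded 0 and 1) with
    right-censored data: at each distinct event time, the expected number
    of events per group is allocated proportionally to the at-risk counts."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    group = np.asarray(group, int)
    O = np.zeros(2)
    E = np.zeros(2)
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        dead = (time == t) & (event == 1)
        n_g = np.array([(at_risk & (group == g)).sum() for g in (0, 1)], float)
        O += [float((dead & (group == g)).sum()) for g in (0, 1)]
        E += dead.sum() * n_g / n_g.sum()    # log-rank expected counts
    return (O[1] / E[1]) / (O[0] / E[0])
```

When the two groups have identical event experience, O equals E in both groups and the estimate is 1.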

20.

Background

In modern biomedical research of complex diseases, a large number of demographic and clinical variables, herein called phenomic data, are often collected, and missing values (MVs) are inevitable in the data collection process. Since many downstream statistical and bioinformatics methods require a complete data matrix, imputation is a common and practical solution. In high-throughput experiments such as microarray experiments, continuous intensities are measured, and many mature missing value imputation methods have been developed and widely applied. Large phenomic data, however, contain continuous, nominal, binary, and ordinal data types, which preclude the application of most of these methods. Though several methods have been developed in the past few years, no complete guideline has been proposed for phenomic missing data imputation.

Results

In this paper, we investigated existing imputation methods for phenomic data, proposed a self-training selection (STS) scheme to select the best imputation method, and provide a practical guideline for general applications. We introduced a novel concept of "imputability measure" (IM) to identify missing values that are fundamentally inadequate to impute. In addition, we developed four variations of K-nearest-neighbor (KNN) methods and compared them with two existing methods, multivariate imputation by chained equations (MICE) and missForest. The four variations are imputation by variables (KNN-V), by subjects (KNN-S), their weighted hybrid (KNN-H), and an adaptively weighted hybrid (KNN-A). We performed simulations and applied the different imputation methods and the STS scheme to three lung disease phenomic datasets to evaluate the methods. An R package "phenomeImpute" is made publicly available.

Conclusions

Simulations and applications to real datasets showed that MICE often did not perform well; KNN-A, KNN-H, and random forest were among the top performers, although no method performed best universally. Imputing missing values with low imputability measures greatly increased imputation errors and could potentially deteriorate downstream analyses. The STS scheme was accurate in selecting the optimal method by evaluating methods in a second layer of missingness simulation. All source files for the simulation and the real data analyses are available on the author's publication website.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0346-6) contains supplementary material, which is available to authorized users.
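The KNN-by-subjects (KNN-S) idea above can be sketched as follows: a missing entry is filled with the mean of that variable over the nearest donor subjects, with distances computed on mutually observed variables. This is our simplified reading of the approach for continuous variables only, not the phenomeImpute implementation:

```python
import numpy as np

def knn_impute_subjects(X, k=2):
    """KNN-by-subjects imputation sketch: a missing entry X[i, j] (NaN) is
    replaced by the mean of column j over the k nearest subjects (by mean
    squared difference on the columns both subjects observe) that have
    column j observed."""
    X = np.asarray(X, float)
    out = X.copy()
    n = len(X)
    for i in range(n):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        dists = []
        for h in range(n):
            if h == i:
                continue
            both = ~np.isnan(X[i]) & ~np.isnan(X[h])   # mutually observed columns
            if both.any():
                dists.append((float(np.mean((X[i, both] - X[h, both]) ** 2)), h))
        dists.sort()                                    # nearest subjects first
        for j in np.where(miss)[0]:
            donors = [X[h, j] for _, h in dists if not np.isnan(X[h, j])][:k]
            if donors:
                out[i, j] = float(np.mean(donors))
    return out
```

The KNN-V variant would transpose the roles of subjects and variables, and the hybrids (KNN-H, KNN-A) would weight the two imputed values; mixed data types need a different distance, which this sketch does not handle.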
