When modeling competing risks (CR) survival data, several techniques have been proposed in both the statistical and machine learning literature. State-of-the-art methods have extended classical approaches with more flexible assumptions that can improve predictive performance and accommodate high-dimensional data and missing values, among other features. Despite this, modern approaches have not been widely adopted in applied settings. This article aims to aid the uptake of such methods by providing a condensed compendium of CR survival methods with a unified notation and interpretation across approaches. We highlight available software and, where possible, demonstrate its usage via reproducible R vignettes. Moreover, we discuss two major concerns that can affect benchmark studies in this context: the choice of performance metrics and reproducibility.
Screening mammography aims to identify breast cancer early and secondarily measures breast density to classify women as at higher or lower than average risk for future breast cancer in the general population. Despite the strong association of individual mammography features with breast cancer risk, the statistical literature on mammogram imaging data is limited. While functional principal component analysis (FPCA) has been studied in the literature for extracting image-based features, it is conducted independently of the time-to-event response variable. With a view to building a prognostic model for precision prevention, we present a set of flexible methods, supervised FPCA (sFPCA) and functional partial least squares (FPLS), to extract image-based features associated with the failure time while accommodating the added complication of right censoring. Throughout the article, we demonstrate that each method may be favored over the other under different clinical setups. The proposed methods are applied to the motivating data set from the Joanne Knight Breast Health cohort at Siteman Cancer Center. Our approaches not only achieve the best prediction performance compared with the benchmark model, but also reveal different risk patterns within the mammograms.
To assess prognostic risk for individuals in precision health research, risk prediction models are increasingly used, in which statistical models estimate the risk of future outcomes based on clinical and nonclinical characteristics. The predictive accuracy of a risk score must be assessed before it can be used in routine clinical decision making. Receiver operating characteristic curves, precision–recall curves, and their corresponding areas under the curve are commonly used metrics to evaluate the discriminatory ability of a continuous risk score. Among these, precision–recall curves have been shown to be more informative when the biomarker distribution is unbalanced between classes, as is common with rare events; however, all but one of the existing methods were proposed for classic uncensored data. This paper therefore proposes a novel nonparametric estimation approach for the time-dependent precision–recall curve and its associated area under the curve for right-censored data. A simulation study shows the better finite-sample properties of the proposed estimator over the existing method, and real-world data from a primary biliary cirrhosis trial demonstrate the practical applicability of the proposed estimator.
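As a point of reference, the classic uncensored precision–recall computation that the proposed estimator generalizes can be sketched as follows. This is an illustration only, not the paper's censoring-adjusted estimator; the scores, event times, and horizon are made-up values.

```python
# Precision and recall of a continuous risk score for the binary event
# "failure by time t", assuming fully observed (uncensored) event times.
# The paper's contribution is handling right censoring, which this
# simple sketch deliberately ignores.

def precision_recall(scores, event_times, t, threshold):
    """Subjects with event_time <= t are cases; score >= threshold flags positive."""
    tp = fp = fn = 0
    for s, T in zip(scores, event_times):
        case = T <= t
        flagged = s >= threshold
        if flagged and case:
            tp += 1
        elif flagged and not case:
            fp += 1
        elif case:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Sweep thresholds over the observed scores to trace the PR curve.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
times = [1.0, 2.0, 6.0, 7.0, 9.0]
curve = [precision_recall(scores, times, t=5.0, threshold=c)
         for c in sorted(set(scores), reverse=True)]
```

Sweeping the threshold from high to low moves along the curve from high precision toward full recall; the area under this curve is the scalar summary whose time-dependent, censoring-adjusted analogue the paper estimates.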
A competing risk approach was used to evaluate the influence of several factors on culling risk for 587 Duroc sows. Three different analyses were performed according to whether sow failure was due to death during productive life (DE) or to one of two causes for voluntary culling: low productivity (LP) and low fertility (LF). Sow survival was analyzed by the Cox model. Year at first farrowing (batch effect) significantly affected sow survival in all three analyses (P < 0.05 for DE and P < 0.001 for LP and LF), whereas farm of origin accounted for relevant variation in the LP and LF analyses. LP culling increased with backfat thickness of more than 19 mm at the end of the growth period (P < 0.05), bad teat condition (P < 0.05) and a reduced number of piglets born alive (P < 0.001). For the LF competing risk analysis, culling increased with age at first farrowing (P < 0.1). Special emphasis was placed on the influence of leg and teat conformation on sow survivability, although neither affected sow failure due to DE (P > 0.1). The overall leg-conformation score significantly influenced sow longevity in the LP (P < 0.001) and LF competing risk analyses (P < 0.001), showing a higher hazard ratio (HR) for poorly conformed sows (1.013 and 4.366, respectively) than for well-conformed sows (0.342 and 0.246, respectively). Survival decreased with the presence of abnormal hoof growth in the LP and LF analyses (HR = 3.372 and 6.002, respectively; P < 0.001) and bumps or injuries to legs (HR = 4.172 and 5.839, respectively; P < 0.01). Plantigradism reduced sow survival in the LP analysis (P < 0.05), while sickle-hooked leg (P < 0.05) impaired sow survival in the fertility-specific analysis. Estimates of heritability for longevity related to LP culling ranged from 0.008 to 0.024 depending on the estimation procedure, whereas heritability values increased to between 0.017 and 0.083 in the LF analysis.
These analyses highlighted substantial discrepancies in the sources of variation and genetic background of sow longevity depending on the cause of failure. The estimated heritabilities suggested that direct genetic improvement of sow longevity is feasible, although only small genetic progress is to be expected.
We propose to combine the benefits of flexible parametric survival modeling and regularization to improve risk prediction modeling in the context of time-to-event data. To this end, we introduce ridge, lasso, elastic net, and group lasso penalties for both log hazard and log cumulative hazard models. The log (cumulative) hazard in these models is represented by a flexible function of time that may depend on the covariates (i.e., covariate effects may be time-varying). We show that the optimization problem for the proposed models can be formulated as a convex optimization problem and provide a user-friendly R implementation for model fitting and penalty parameter selection based on cross-validation. Simulation study results show the advantage of regularization in terms of increased out-of-sample prediction accuracy and improved calibration and discrimination of predicted survival probabilities, especially when the sample size is relatively small with respect to model complexity. An applied example illustrates the proposed methods. In summary, our work provides both a foundation for and an easily accessible implementation of regularized parametric survival modeling and suggests that it improves out-of-sample prediction performance.
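The idea of penalizing a parametric log-hazard model can be illustrated with a deliberately minimal case: a constant-hazard (exponential) model with a ridge penalty, fitted by gradient ascent. This is a sketch under simplifying assumptions, not the authors' flexible R implementation (which allows time-varying effects and several penalties); the toy data are invented.

```python
# Ridge-penalized exponential survival model.
# log hazard_i = beta . x_i; penalized log-likelihood:
#   sum_i [ d_i * (beta . x_i) - t_i * exp(beta . x_i) ] - lam * ||beta||^2
# where t_i is follow-up time and d_i the event indicator.
import math

def fit_ridge_exponential(X, times, events, lam=0.1, lr=0.01, steps=2000):
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        # derivative of the ridge penalty
        grad = [-2.0 * lam * b for b in beta]
        for x, t, d in zip(X, times, events):
            eta = sum(b * xi for b, xi in zip(beta, x))
            w = d - t * math.exp(eta)  # score contribution of subject i
            for j in range(p):
                grad[j] += w * x[j]
        beta = [b + lr * g for b, g in zip(beta, grad)]  # ascent step
    return beta

# Toy data: intercept column plus one binary covariate; the covariate
# group has twice the event rate, so its coefficient is shrunk toward,
# but stays below, log(2).
X = [[1, 0], [1, 0], [1, 1], [1, 1]]
times = [2.0, 2.0, 1.0, 1.0]
events = [1, 1, 1, 1]
beta = fit_ridge_exponential(X, times, events, lam=0.1)
```

Because the exponential log-likelihood is concave and the ridge penalty strictly concave, the penalized objective has a unique maximum, which is the convexity property the paper exploits for its more general models.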
Osteosarcoma (OS) is the most common primary solid malignant bone tumor, and its metastasis is a prominent cause of high mortality in patients. In this study, a prognostic risk signature was constructed based on metastasis-associated genes. Four microarray datasets with clinical information were downloaded from Gene Expression Omnibus, and 256 metastasis-associated genes were identified using the limma package. Further, a protein-protein interaction network was constructed, and survival analysis was performed using data from the Therapeutically Applicable Research to Generate Effective Treatments data matrix, identifying 19 genes correlated with prognosis. Six genes were selected by least absolute shrinkage and selection operator regression for multivariate Cox analysis. Finally, a three-gene (MYC, CPE, and LY86) risk signature was constructed, and datasets GSE21257 and GSE16091 were used to validate the prediction efficiency of the signature. The survival times of the low- and high-risk groups were significantly different in both the training set and the validation set. Additionally, gene set enrichment analysis revealed that the genes in the signature may affect the cell cycle, gap junctions, and interleukin-6 production. Therefore, the three-gene survival risk signature could potentially predict the prognosis of patients with OS. Further, proteins encoded by CPE and LY86 may provide novel insights into the prediction of OS prognosis and therapeutic targets.
To help provide evidence for prognosis prediction and personalized targeted therapy for patients with head and neck squamous cell carcinoma (HNSCC), we investigated prognosis-specific methylation-driven genes in HNSCC. Survival time data, RNA sequencing data, and methylation data for HNSCC patients were downloaded from The Cancer Genome Atlas. The MethylMix R package based on the β mixture model was utilized to screen genes with different methylation statuses in tumor tissues and adjacent normal tissues, and a total of 182 HNSCC-related methylation-driven genes were then identified. A survival prediction scoring model based on multivariate Cox analysis was developed to screen the genes related to the prognosis of HNSCC, and a linear risk model of the methylation status of six genes (INA, LINC01354, TSPYL4, MAGEB2, EPHX3, and ZNF134) was constructed. The prognostic values of the six genes were further independently explored by survival analysis combined with methylation and gene expression analyses. The 5-year survival rate in the high-risk group of patients in the test set was 30.4% (95% CI: 22.7%-40.8%) and that in the low-risk group of patients was 65.5% (95% CI: 56.1%-76.5%). The area under the receiver operating characteristic curve for the model was 0.723, which further verified the specificity and sensitivity of the model. In addition, subsequent combined survival analysis revealed that all six genes could be used as independent prognostic markers and thus might be potential drug targets. The innovative method provides new insight into the molecular mechanism and prognosis of HNSCC.
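A linear risk model of this kind scores each patient as a coefficient-weighted sum over the driver genes and then dichotomizes at a cutoff such as the median. The sketch below illustrates the mechanics only; the coefficients and methylation values are placeholders, not the fitted values from the paper.

```python
# Hypothetical linear methylation risk score over the six driver genes,
# followed by a median split into high- and low-risk groups.
# All numbers below are invented for illustration.

def risk_score(methylation, coefs):
    """Weighted sum of per-gene methylation values (Cox-style linear predictor)."""
    return sum(coefs[g] * methylation[g] for g in coefs)

# Placeholder coefficients (sign indicates direction of risk, not real fits).
coefs = {"INA": 0.8, "LINC01354": -0.5, "TSPYL4": 0.3,
         "MAGEB2": 0.6, "EPHX3": -0.4, "ZNF134": 0.2}

patients = [
    {"INA": 0.9, "LINC01354": 0.1, "TSPYL4": 0.7,
     "MAGEB2": 0.8, "EPHX3": 0.2, "ZNF134": 0.5},
    {"INA": 0.2, "LINC01354": 0.8, "TSPYL4": 0.3,
     "MAGEB2": 0.1, "EPHX3": 0.9, "ZNF134": 0.4},
]
scores = [risk_score(p, coefs) for p in patients]
median = sorted(scores)[len(scores) // 2]  # simple median split
groups = ["high" if s >= median else "low" for s in scores]
```

The resulting high/low grouping is what the Kaplan–Meier comparison and the reported 5-year survival rates are based on.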
Objective: The objective of this study is to evaluate the relevance of Lp-PLA2 to risk prediction among coronary heart disease (CHD) patients.
Methods: Lp-PLA2 activity was measured in 2538 CHD patients included in the Bezafibrate Infarction Prevention (BIP) study.
Results: Adjusting for patient characteristics and traditional risk factors, 1 standard deviation of Lp-PLA2 was associated with a hazard ratio (HR) of 1.12 (95% confidence interval (CI): 1.00–1.25) for mortality and 1.03 (0.93–1.14) for cardiovascular events. Lp-PLA2 did not significantly improve model discrimination or calibration, nor did it result in noteworthy reclassification.
Conclusions: Our results do not support an added value of Lp-PLA2 for predicting cardiovascular events or mortality among CHD patients beyond traditional risk factors.
We argue that the term “relative risk” should not be used as a synonym for “hazard ratio” and encourage the use of the probabilistic index as an alternative effect measure for Cox regression. The probabilistic index is the probability that the event time of an exposed or treated subject exceeds the event time of an unexposed or untreated subject, conditional on the other covariates. It arises as a well-known and simple transformation of the hazard ratio and nicely reveals its interpretational limitations. We demonstrate how the probabilistic index can be obtained using the R package Publish.
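The transformation in question is simple: under proportional hazards with continuous event times, the probabilistic index P(T_exposed > T_unexposed) equals 1/(1 + HR), since with S1 = S0^HR the integral of S1 against the density of T_unexposed reduces to the integral of u^HR over (0, 1). The sketch below is a plain illustration of this identity, not the Publish package's implementation.

```python
# Probabilistic index as a transformation of the hazard ratio, valid
# under proportional hazards with continuous event times:
#   P(T_exposed > T_unexposed | covariates) = 1 / (1 + HR)

def probabilistic_index(hazard_ratio):
    return 1.0 / (1.0 + hazard_ratio)

# HR = 1 (no effect) maps to 0.5; HR = 2 maps to 1/3, i.e. the exposed
# subject outlives the unexposed one only a third of the time.
```

The nonlinearity of the map is exactly the interpretational point: a seemingly large HR of 2 corresponds to a probabilistic index of only 1/3, much less dramatic than "twice the risk" suggests.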
We are interested in the estimation of average treatment effects based on right-censored data of an observational study. We focus on causal inference of differences between t-year absolute event risks in a situation with competing risks. We derive doubly robust estimation equations and implement estimators for the nuisance parameters based on working regression models for the outcome, censoring, and treatment distribution conditional on auxiliary baseline covariates. We use the functional delta method to show that these estimators are regular asymptotically linear estimators and estimate their variances based on estimates of their influence functions. In empirical studies, we assess the robustness of the estimators and the coverage of confidence intervals. The methods are further illustrated using data from a Danish registry study.
In clinical research and practice, landmark models are commonly used to predict the risk of an adverse future event, using patients' longitudinal biomarker data as predictors. However, these data are often observable only at intermittent visits, making their measurement times irregularly spaced and unsynchronized across different subjects. This poses challenges to conducting dynamic prediction at any post-baseline time. A simple solution is the last-value-carried-forward method, but this may bias the risk model estimation and prediction. Another option is to jointly model the longitudinal and survival processes with a shared random effects model. However, when dealing with multiple biomarkers, this approach often results in high-dimensional integrals without a closed-form solution, and thus the computational burden limits its software development and practical use. In this article, we propose to process the longitudinal data by functional principal component analysis techniques, and then use the processed information as predictors in a class of flexible linear transformation models to predict the distribution of the residual time to event occurrence. The measurement schemes for multiple biomarkers are allowed to differ within subject and across subjects. Dynamic prediction can be performed in a real-time fashion. The advantages of our proposed method are demonstrated by simulation studies. We apply our approach to the African American Study of Kidney Disease and Hypertension, predicting patients' risk of kidney failure or death by using four important longitudinal biomarkers for renal function.
In clinical trials with time‐to‐event outcomes, it is of interest to predict when a prespecified number of events will be reached. An interim analysis is conducted to estimate the underlying survival function. When another correlated time‐to‐event endpoint is available, both outcome variables can be used to improve estimation efficiency. In this paper, we propose to use the convolution of two time‐to‐event variables to estimate the survival function of interest. Propositions and examples are provided based on exponential models that accommodate possible change points. We further propose a new estimation equation for the expected time that exploits the relationship between the two endpoints. Simulations and the analysis of real data show that the proposed methods with bivariate information yield significant improvement in prediction over the univariate method.
An accelerated failure time (AFT) model assuming a log-linear relationship between failure time and a set of covariates can be either parametric or semiparametric, depending on the distributional assumption for the error term. Both classes of AFT models have been popular in the analysis of censored failure time data. The semiparametric AFT model is more flexible and robust to departures from the distributional assumption than its parametric counterpart. However, the semiparametric AFT model is subject to producing biased results when estimating any quantities involving an intercept. Estimating an intercept requires a separate procedure; moreover, consistent estimation of the intercept requires stringent conditions. Thus, essential quantities such as mean failure times might not be reliably estimated using semiparametric AFT models, whereas this can be done naturally in the framework of parametric AFT models. Meanwhile, parametric AFT models can be severely impaired by misspecification. To overcome this, we propose a new type of AFT model using a nonparametric Gaussian-scale mixture distribution. We also provide feasible algorithms to estimate the parameters and the mixing distribution. The finite sample properties of the proposed estimators are investigated via an extensive simulation study. The proposed estimators are illustrated using a real dataset.
Family‐based and genome‐wide association studies (GWAS) of alcohol dependence (AD) have reported numerous associated variants. The clinical validity of these variants for predicting AD compared with family history information has not been reported. Using the Collaborative Study on the Genetics of Alcoholism (COGA) and the Study of Addiction: Genes and Environment (SAGE) GWAS samples, we examined the aggregate impact of multiple single nucleotide polymorphisms (SNPs) on risk prediction. We created genetic sum scores by adding risk alleles associated in discovery samples, and then tested the scores for their ability to discriminate between cases and controls in validation samples. Genetic sum scores were assessed separately for SNPs associated with AD in candidate gene studies and SNPs from GWAS analyses that met varying P‐value thresholds. Candidate gene sum scores did not exhibit significant predictive accuracy. Family history was a better classifier of case‐control status, with a significant area under the receiver operating characteristic curve (AUC) of 0.686 in COGA and 0.614 in SAGE. SNPs that met less stringent P‐value thresholds of 0.01–0.50 in GWAS analyses yielded significant AUC estimates, ranging from mean estimates of 0.549 for SNPs with P < 0.01 to 0.565 for SNPs with P < 0.50. This study suggests that SNPs currently have limited clinical utility, but there is potential for enhanced predictive ability with better understanding of the large number of variants that might contribute to risk.
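A genetic sum score of this kind is simply a count of risk alleles over the SNPs that passed a threshold, and its discriminative ability is the rank-based AUC (the Mann–Whitney probability that a random case outscores a random control). The sketch below shows the mechanics on invented genotypes; it is not the COGA/SAGE analysis pipeline.

```python
# Genetic sum score: total risk-allele count (0/1/2 per SNP) over the
# selected SNPs, scored for case-control discrimination with the
# rank-based AUC. All genotype data below are invented.

def sum_score(genotypes):
    """genotypes: list of risk-allele counts (0, 1, or 2), one per SNP."""
    return sum(genotypes)

def auc(case_scores, control_scores):
    """Mann-Whitney AUC: P(random case score > random control score), ties count half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

cases = [sum_score(g) for g in [[2, 1, 1], [1, 1, 2], [2, 2, 1]]]
controls = [sum_score(g) for g in [[0, 1, 0], [1, 0, 1], [0, 0, 1]]]
```

An AUC of 0.5 means the score is no better than chance, which is the benchmark against which the reported estimates of 0.549 to 0.565 should be read.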