A framework for feature extraction from hospital medical data with applications in risk prediction
Authors: Truyen Tran, Wei Luo, Dinh Phung, Sunil Gupta, Santu Rana, Richard Lee Kennedy, Ann Larkins, Svetha Venkatesh
Affiliation: Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, VIC 3220, Australia; Department of Computing, Curtin University, Perth, WA, Australia; School of Medicine, Deakin University, Geelong, VIC, Australia; Barwon Health, Geelong, VIC, Australia
Abstract:

Background

Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform that automatically extracts features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type and of the risk prediction task. We compare the auto-extracted features with baselines generated from the Elixhauser comorbidities.

Results

Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task and compared with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods.

For unplanned readmissions, the auto-extracted feature set, built from socio-demographic information and medical records, outperformed baselines derived from socio-demographic information and Elixhauser comorbidities in all 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction the AUCs are: COPD, baseline 0.60 (95% CI: 0.57, 0.63) versus auto-extracted 0.67 (0.64, 0.70); diabetes, baseline 0.60 (0.58, 0.63) versus auto-extracted 0.67 (0.64, 0.69); mental disorders, baseline 0.57 (0.54, 0.60) versus auto-extracted 0.69 (0.64, 0.70); pneumonia, baseline 0.61 (0.59, 0.63) versus auto-extracted 0.70 (0.67, 0.72).
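
The idea of applying filters over an event sequence at several temporal scales can be illustrated with a minimal sketch. The abstract does not specify the actual filters, so the look-back windows, the event tuple layout, the example codes and the function name `extract_features` below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from datetime import date

# Hypothetical look-back windows, in days before the prediction point.
# Each window is half-open: start <= age < end.
WINDOWS = [(0, 30), (30, 90), (90, 180), (180, 365)]


def extract_features(events, prediction_date, windows=WINDOWS):
    """Count events per (type, code) inside each look-back window.

    events: iterable of (event_date, event_type, code) tuples
            (an assumed layout, not the paper's entity schema).
    Returns a dict mapping a feature name to an event count.
    """
    features = Counter()
    for event_date, event_type, code in events:
        age = (prediction_date - event_date).days
        if age < 0:
            continue  # skip events recorded after the prediction point
        for start, end in windows:
            if start <= age < end:
                features[f"{event_type}:{code}@{start}-{end}d"] += 1
    return dict(features)


# Illustrative records only; the codes are made-up examples.
events = [
    (date(2012, 1, 10), "diagnosis", "E11"),
    (date(2012, 5, 20), "diagnosis", "E11"),
    (date(2012, 6, 1), "admission", "emergency"),
    (date(2012, 7, 1), "diagnosis", "J44"),  # after the prediction point
]
features = extract_features(events, prediction_date=date(2012, 6, 15))
print(features)
```

Because the same code appears as a separate feature in each window, a regularized linear model can weight recent and distant history differently, which is one plausible reading of "diversity in temporal scales".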

Conclusions

We demonstrated the advantages of automatically extracting standard features from complex medical records in a disease- and task-agnostic manner. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0425-8) contains supplementary material, which is available to authorized users.
Keywords: Feature extraction, Risk prediction, Hospital data