Similar Articles
 20 similar articles found (search time: 15 ms)
1.
Objectives: The present study aimed to develop a random forest (RF) based prediction model for hyperuricemia (HUA) and compare its performance with a conventional logistic regression (LR) model. Methods: This cross-sectional study recruited 91,690 participants (14,032 with HUA, 77,658 without HUA). We constructed an RF-based prediction model in the training sets and evaluated it in the validation sets. Performance of the RF model was compared with the LR model by receiver operating characteristic (ROC) curve analysis. Results: The sensitivity and specificity of the RF models were 0.702 and 0.650 in males, and 0.767 and 0.721 in females. The positive predictive value (PPV) and negative predictive value (NPV) were 0.372 and 0.881 in males, and 0.159 and 0.978 in females. The AUC of the RF models was 0.739 (0.728–0.750) in males and 0.818 (0.799–0.837) in females; the AUC of the LR models was 0.730 (0.718–0.741) in males and 0.815 (0.795–0.835) in females. The predictive power of RF was slightly higher than that of LR, but the difference was statistically significant only in males (DeLong tests, P=0.0015 for males, P=0.5415 for females). Conclusion: Compared with LR, the RF model's good performance in predicting HUA status and its tolerance of feature associations and interactions show great potential for further application. A prospective cohort is necessary for predicting the development of HUA. People with high-risk factors should be encouraged to control them actively to reduce the probability of developing HUA.
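The RF-versus-LR comparison described above can be sketched in a few lines of scikit-learn. Everything below (features, sample sizes, class balance) is synthetic and illustrative, not the study's data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an imbalanced screening dataset: 10 risk factors.
X, y = make_classification(n_samples=5000, n_features=10, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Compare discrimination of the two models on the validation set.
auc_rf = roc_auc_score(y_va, rf.predict_proba(X_va)[:, 1])
auc_lr = roc_auc_score(y_va, lr.predict_proba(X_va)[:, 1])

# Sensitivity/specificity at the default 0.5 threshold for the RF model.
tn, fp, fn, tp = confusion_matrix(y_va, rf.predict(X_va)).ravel()
sensitivity, specificity = tp / (tp + fn), tn / (tn + fp)
print(f"AUC RF={auc_rf:.3f}  LR={auc_lr:.3f}  "
      f"sens={sensitivity:.3f}  spec={specificity:.3f}")
```

A formal significance test of the AUC difference (the DeLong test used in the abstract) is not in scikit-learn and would need a separate implementation or package.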

2.

Background

This study aimed to develop prediction models for cardiovascular autonomic (CA) dysfunction in the general population using artificial neural network (ANN) and multivariable logistic regression (LR) analyses, and to compare the prediction models built with the two approaches.

Materials and Methods

We analyzed a previous dataset based on a Chinese population sample consisting of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN and LR analysis, and were tested in the validation set. Performances of these prediction models were then compared.

Results

Univariate analysis indicated that 14 risk factors showed a statistically significant association with the prevalence of CA dysfunction (P<0.05). The mean area under the receiver-operating curve was 0.758 (95% CI 0.724–0.793) for the LR analysis and 0.762 (95% CI 0.732–0.793) for the ANN analysis, and a noninferiority result was found (P<0.001). Similar results were found when comparing sensitivity, specificity, and predictive values between the LR and ANN prediction models.

Conclusion

The prediction models for CA dysfunction were developed using ANN and LR. Both ANN and LR are effective tools for developing prediction models based on our dataset.
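Metrics such as the sensitivity, specificity, PPV, and NPV compared in studies like these all derive from the 2×2 confusion matrix. A minimal helper, with made-up counts rather than any study's actual results:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # fraction of predicted positives that are true
        "npv": tn / (tn + fn),           # fraction of predicted negatives that are true
    }

# Illustrative counts only.
m = diagnostic_metrics(tp=70, fp=118, fn=30, tn=282)
print(m)  # sensitivity 0.70, specificity 0.705
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on the outcome prevalence, which is why they differ so sharply between male and female subgroups in the first abstract.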

3.

Background

Chronic kidney disease (CKD) is common, and associated with increased risk of cardiovascular disease and end-stage renal disease, which are potentially preventable through early identification and treatment of individuals at risk. Although risk factors for occurrence and progression of CKD have been identified, their utility for CKD risk stratification through prediction models remains unclear. We critically assessed risk models to predict CKD and its progression, and evaluated their suitability for clinical use.

Methods and Findings

We systematically searched MEDLINE and Embase (1 January 1980 to 20 June 2012). Dual review was conducted to identify studies that reported on the development, validation, or impact assessment of a model constructed to predict the occurrence/presence of CKD or progression to advanced stages. Data were extracted on study characteristics, risk predictors, discrimination, calibration, and reclassification performance of models, as well as validation and impact analyses. We included 26 publications reporting on 30 CKD occurrence prediction risk scores and 17 CKD progression prediction risk scores. The vast majority of CKD risk models had acceptable-to-good discriminatory performance (area under the receiver operating characteristic curve>0.70) in the derivation sample. Calibration was less commonly assessed, but overall was found to be acceptable. Only eight CKD occurrence and five CKD progression risk models have been externally validated, displaying modest-to-acceptable discrimination. Whether novel biomarkers of CKD (circulatory or genetic) can improve prediction largely remains unclear, and impact studies of CKD prediction models have not yet been conducted. Limitations of risk models include the lack of ethnic diversity in derivation samples, and the scarcity of validation studies. The review is limited by the lack of an agreed-on system for rating prediction models, and the difficulty of assessing publication bias.

Conclusions

The development and clinical application of renal risk scores is in its infancy; however, the discriminatory performance of existing tools is acceptable. The effect of using these models in practice is still to be explored.

4.
For medical decision making and patient information, predictions of future status variables play an important role. Risk prediction models can be derived with many different statistical approaches. To compare them, measures of predictive performance are derived from ROC methodology and from probability forecasting theory. These tools can be applied to assess single markers, multivariable regression models and complex model selection algorithms. This article provides a systematic review of the modern way of assessing risk prediction models. Particular attention is paid to proper benchmarks and resampling techniques that are important for the interpretation of measured performance. All methods are illustrated with data from a clinical study in head and neck cancer patients.
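The two ideas the abstract stresses, out-of-sample evaluation via resampling and comparison against a proper benchmark, can be sketched as follows. The benchmark here is the null model that predicts the overall event rate for everyone; data and model choices are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, roc_auc_score
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, n_features=8, random_state=1)

# Out-of-sample predicted probabilities via 5-fold cross-validation,
# avoiding the optimism of evaluating on the training data.
p = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                      cv=5, method="predict_proba")[:, 1]

brier_model = brier_score_loss(y, p)
# Benchmark: a null model that always predicts the overall event rate.
brier_null = brier_score_loss(y, np.full_like(p, y.mean()))
auc = roc_auc_score(y, p)
print(f"Brier model={brier_model:.3f}  null={brier_null:.3f}  AUC={auc:.3f}")
```

A model is only worth reporting if its Brier score beats the null benchmark; the AUC complements this with a pure discrimination measure.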

5.
Classical paper-and-pencil risk assessment questionnaires are often accompanied by online versions to reach a wider population. This study focuses on the loss, especially in risk estimation performance, inflicted by directly transforming paper questionnaires into online risk calculators while ignoring the more complex and accurate calculations that online calculators could perform. We empirically compare the risk estimation performance of four major diabetes risk calculators and two more advanced predictive models. National Health and Nutrition Examination Survey (NHANES) data from 1999–2012 were used to evaluate the performance of detecting diabetes and pre-diabetes. The American Diabetes Association risk test achieved the best predictive performance in the category of classical paper-and-pencil tests, with an Area Under the ROC Curve (AUC) of 0.699 for undiagnosed diabetes (0.662 for pre-diabetes) and 47% of persons selected for screening (47% for pre-diabetes). Our results demonstrate a significant difference in performance, with the additional benefit of fewer persons selected for screening, when statistical methods are used. The best overall AUC in diabetes risk prediction was obtained using logistic regression, with an AUC of 0.775 (0.734) and an average of 34% (48%) of persons selected for screening. However, generalized boosted regression models might be a better option from an economic point of view, as the proportion selected for screening, 30% (47%), is significantly lower for diabetes risk assessment than with logistic regression (p < 0.001), with a significantly higher AUC (p < 0.001) of 0.774 (0.740) for the pre-diabetes group. Our results demonstrate a serious lack of predictive performance in four major online diabetes risk calculators.
Therefore, one should take great care and consider optimizing online versions of questionnaires that were primarily developed as classical paper questionnaires.
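The performance loss the abstract describes comes from collapsing continuous risk factors into coarse points. The sketch below shows a schematic paper-style score; the items and cut-offs are invented for illustration and are not the actual ADA risk test:

```python
def paper_score(age, bmi, family_history, hypertension):
    """Schematic paper-style point score (items and thresholds are
    illustrative only, not any published questionnaire)."""
    pts = 0
    pts += 2 if age >= 45 else (1 if age >= 35 else 0)
    pts += 2 if bmi >= 30 else (1 if bmi >= 25 else 0)
    pts += 1 if family_history else 0
    pts += 1 if hypertension else 0
    return pts  # screen the patient if pts exceeds some threshold

# Collapsing age and BMI into coarse point bands is exactly where
# risk-estimation performance is lost relative to a statistical model
# that uses the same inputs on their original continuous scale.
print(paper_score(age=50, bmi=31, family_history=True, hypertension=False))  # 5
```

An online calculator could instead evaluate a full logistic regression on the raw inputs at no extra cost to the user, which is the missed opportunity the study quantifies.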

6.
Critical velocity (CV) represents, theoretically, the highest velocity that can be sustained without fatigue. The aim of this study was to compare CV computed from 5 mathematical models in order to determine which CV estimate is best correlated with 1-hour performance and which model provides the most accurate prediction of performance. Twelve trained middle- and long-distance male runners (29 ± 5 years) performed 3 randomly ordered constant-duration tests (6, 9, and 12 minutes), a maximal running velocity test for the estimation of CV, and a 1-hour track test (actual performance). Two linear, 2 nonlinear, and 1 exponential mathematical models were used to estimate CV and to predict the highest velocity that could be sustained during 1 hour (predicted performance). Although all CV estimates were correlated with performance (0.80 < r < 0.93, p < 0.01), CV estimated from the exponential model was more closely associated with performance than all other models (r = 0.93; p < 0.01). Analysis of the bias ± 95% confidence interval between actual and predicted performance revealed that none of the models provided an accurate prediction of the 1-hour performance velocity. In conclusion, the estimation of CV allows middle- and long-distance runners to be ranked with regard to their ability to perform well in long-distance running. However, no model provides a prediction of performance accurate enough to be used as a reference by coaches or athletes.

7.

Background

Cardiovascular disease and its risk factors have consistently been associated with poor cognitive function and incident dementia. Whether cardiovascular disease prediction models, developed to predict an individual's risk of future cardiovascular disease or stroke, are also informative for predicting risk of cognitive decline and dementia is not known.

Objective

The objective of this systematic review was to compare cohort studies examining the association between cardiovascular disease risk models and longitudinal changes in cognitive function or risk of incident cognitive impairment or dementia.

Materials and Methods

Medline, PsycINFO, and Embase were searched from inception to March 28, 2014. From 3,413 records initially screened, 21 were included.

Results

The association between numerous different cardiovascular disease risk models and cognitive outcomes has been tested, including Framingham and non-Framingham risk models. Five studies examined dementia as an outcome; fourteen studies examined cognitive decline or incident cognitive impairment as an outcome; and two studies examined both dementia and cognitive changes as outcomes. In all studies, higher cardiovascular disease risk scores were associated with cognitive changes or risk of dementia. Only four studies reported model prognostic performance indices, such as Area Under the Curve (AUC), for predicting incident dementia or cognitive impairment and these studies all examined non-Framingham Risk models (AUC range: 0.74 to 0.78).

Conclusions

Cardiovascular risk prediction models are associated with cognitive changes over time and risk of dementia. Such models are easily obtainable in clinical and research settings and may be useful for identifying individuals at high risk of future cognitive decline and dementia.

8.
A computational system for the prediction and classification of human G-protein coupled receptors (GPCRs) has been developed based on the support vector machine (SVM) method and protein sequence information. The feature vectors used to develop the SVM prediction models consist of statistically significant features selected from the single amino acid, dipeptide, and tripeptide compositions of protein sequences. Furthermore, the difference in length distribution between GPCRs and non-GPCRs has also been exploited to improve prediction performance. Testing results with annotated human protein sequences demonstrate that this system achieves good performance for both prediction and classification of human GPCRs.
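The composition features described above are straightforward to compute: a sequence maps to the fraction of each amino acid (20 features) and each dipeptide (400 features). A minimal sketch (the tripeptide case extends the same pattern to 8000 features):

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Fraction of each of the 20 amino acids in a protein sequence."""
    counts = Counter(seq)
    return [counts.get(a, 0) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each of the 400 possible dipeptides."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs.get(a + b, 0) / total for a in AMINO_ACIDS for b in AMINO_ACIDS]

# Toy sequence; a real pipeline would feed such vectors to an SVM.
vec = aa_composition("MKTAYIAKQR") + dipeptide_composition("MKTAYIAKQR")
print(len(vec))  # 20 + 400 = 420 features per sequence
```

Feature selection then keeps only the statistically significant components, as the abstract notes, before SVM training.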

9.
Various attempts have been made to predict individual disease risk based on genotype data from genome-wide association studies (GWAS). However, most studies have investigated only one or two classification algorithms and feature encoding schemes. In this study, we applied seven different classification algorithms to GWAS case-control data sets for seven different diseases to create models for disease risk prediction. Further, we used three different encoding schemes for the genotypes of single nucleotide polymorphisms (SNPs) and investigated their influence on the predictive performance of these models. Our study suggests that an additive encoding of the SNP data should be the preferred encoding scheme, as it proved to yield the best predictive performance for all algorithms and data sets. Furthermore, our results showed that the differences between most state-of-the-art classification algorithms are not statistically significant. Consequently, we recommend preferring algorithms with simple models, such as the linear support vector machine (SVM), as they allow better subsequent interpretation without a significant loss of accuracy.
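The three standard SNP encoding schemes compared in such studies map a biallelic genotype (0, 1, or 2 copies of the minor allele) to a numeric feature in different ways; a small sketch:

```python
def encode_genotype(genotype, scheme="additive"):
    """Encode a biallelic SNP genotype given as the minor-allele count (0/1/2)."""
    if scheme == "additive":      # dosage of the minor allele: 0, 1, 2
        return genotype
    if scheme == "dominant":      # carries at least one minor allele
        return int(genotype >= 1)
    if scheme == "recessive":     # homozygous for the minor allele
        return int(genotype == 2)
    raise ValueError(f"unknown scheme: {scheme}")

# One sample's genotypes at four SNPs under the dominant encoding.
row = [encode_genotype(g, "dominant") for g in [0, 1, 2, 1]]
print(row)  # [0, 1, 1, 1]
```

The additive encoding preserves the allele-dosage ordering, which is consistent with the abstract's finding that it yields the best predictive performance across algorithms.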

10.
11.
Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean–variance dependency, or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean–variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
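One common rank-based transformation of the kind favored here is the rank-based inverse normal transform, which replaces each value by the normal quantile of its rank and thereby removes skewness and extreme values entirely. A sketch (the Blom offset c = 3/8 is one conventional choice, not necessarily the paper's):

```python
import numpy as np
from scipy import stats

def rank_inverse_normal(x, c=3.0 / 8):
    """Blom-type rank-based inverse normal transformation."""
    ranks = stats.rankdata(x)
    return stats.norm.ppf((ranks - c) / (len(x) + 1 - 2 * c))

# Heavily skewed counts with extreme values, as typical for RNA-Seq covariates.
counts = np.array([0, 1, 2, 3, 5, 8, 13, 500, 10000])
z = rank_inverse_normal(counts)
print(np.round(z, 2))
```

After the transform, every covariate has the same (approximately standard normal) marginal distribution, which removes the preferential selection of high-variance genes that the abstract identifies as problematic.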

12.
Marker-based prediction of hybrid performance facilitates the identification of untested single-cross hybrids with superior yield performance. Our objectives were to (1) determine the haplotype block structure of experimental germplasm from a hybrid maize breeding program, (2) develop models for hybrid performance prediction based on haplotype blocks, and (3) compare hybrid performance prediction based on haplotype blocks with other approaches, based on single AFLP markers or general combining ability (GCA), under a validation scenario relevant for practical breeding. In total, 270 hybrids were evaluated for grain yield in four Dent × Flint factorial mating experiments. Their parental inbred lines were genotyped with 20 AFLP primer–enzyme combinations. Adjacent marker loci were combined into haplotype blocks. Hybrid performance was predicted on the basis of single marker loci and haplotype blocks. Prediction based on variable haplotype block length resulted in an improved prediction of hybrid performance compared with the use of single AFLP markers. Estimates of prediction efficiency (R²) ranged from 0.305 to 0.889 for marker-based prediction and from 0.465 to 0.898 for GCA-based prediction. For inter-group hybrids with predominance of general over specific combining ability, the hybrid prediction from GCA effects was efficient in identifying promising hybrids. Considering the advantage of haplotype block approaches over single marker approaches for the prediction of inter-group hybrids, we see a high potential to substantially improve the efficiency of hybrid breeding programs. Tobias A. Schrag and Hans Peter Maurer contributed equally to this work.
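The GCA-based prediction mentioned above amounts to an additive main-effects decomposition of the hybrid yield table: each cross is predicted as the grand mean plus the general combining abilities of its two parents, and the residual is the specific combining ability (SCA). A small numpy sketch with invented yields:

```python
import numpy as np

# Hybrid yields (rows: Dent parent lines, cols: Flint parent lines); toy values.
Y = np.array([[8.1, 8.4, 7.9],
              [7.6, 8.0, 7.4],
              [8.5, 8.9, 8.2]])

mu = Y.mean()
gca_dent = Y.mean(axis=1) - mu    # general combining ability of each Dent line
gca_flint = Y.mean(axis=0) - mu   # general combining ability of each Flint line

# GCA-based prediction of each cross: y_hat[i, j] = mu + gca_i + gca_j
Y_hat = mu + gca_dent[:, None] + gca_flint[None, :]
# The residual Y - Y_hat is the specific combining ability (SCA) of each cross.
print(np.round(Y - Y_hat, 3))
```

When GCA dominates SCA, as for the inter-group hybrids in the study, this additive prediction already identifies the promising untested crosses.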

13.
Clinical prediction models play a key role in risk stratification, therapy assignment and many other fields of medical decision making. Before they can enter clinical practice, their usefulness has to be demonstrated using systematic validation. Methods to assess their predictive performance have been proposed for continuous, binary, and time-to-event outcomes, but the literature on validation methods for discrete time-to-event models with competing risks is sparse. The present paper tries to fill this gap and proposes new methodology to quantify discrimination, calibration, and prediction error (PE) for discrete time-to-event outcomes in the presence of competing risks. In our case study, the goal was to predict the risk of ventilator-associated pneumonia (VAP) attributed to Pseudomonas aeruginosa in intensive care units (ICUs). Competing events are extubation, death, and VAP due to other bacteria. The aim of this application is to validate complex prediction models developed in previous work on more recently available validation data.
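In the discrete-time competing-risks setting, the quantity to be predicted is a cumulative incidence function (CIF): the probability of the event of interest by time t, accounting for the fact that competing events remove patients from risk. A sketch of that accumulation with made-up hazards:

```python
# Discrete-time competing risks: convert cause-specific hazards into a
# cumulative incidence function (CIF). All hazard values are illustrative.
h_vap = [0.02, 0.03, 0.04]   # hazard of the event of interest at days 1..3
h_comp = [0.10, 0.08, 0.07]  # combined hazard of the competing events

surv = 1.0   # probability of being event-free at the start of the day
cif = 0.0
for hv, hc in zip(h_vap, h_comp):
    cif += surv * hv           # must be event-free up to t to have the event at t
    surv *= (1 - hv - hc)      # any event (either cause) ends the episode
print(round(cif, 4))
```

Naively treating competing events as censoring would overstate this probability, which is why validation measures for such models must be competing-risks aware.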

14.
Perception relies on the response of populations of neurons in sensory cortex. How the response profile of a neuronal population gives rise to perception and perceptual discrimination has been conceptualized in various ways. Here we suggest that neuronal population responses represent information about our environment explicitly as Fisher information (FI), which is a local measure of the variance estimate of the sensory input. We show how this sensory information can be read out and combined to infer from the available information profile which stimulus value is perceived during a fine discrimination task. In particular, we propose that the perceived stimulus corresponds to the stimulus value that leads to the same information for each of the alternative directions, and compare the model prediction to standard models considered in the literature (population vector, maximum likelihood, maximum-a-posteriori Bayesian inference). The models are applied to human performance in a motion discrimination task that induces perceptual misjudgements of a target direction of motion by task irrelevant motion in the spatial surround of the target stimulus (motion repulsion). By using the neurophysiological insight that surround motion suppresses neuronal responses to the target motion in the center, all models predicted the pattern of perceptual misjudgements. The variation of discrimination thresholds (error on the perceived value) was also explained through the changes of the total FI content with varying surround motion directions. The proposed FI decoding scheme incorporates recent neurophysiological evidence from macaque visual cortex showing that perceptual decisions do not rely on the most active neurons, but rather on the most informative neuronal responses. We statistically compare the prediction capability of the FI decoding approach and the standard decoding models. 
Notably, all models reproduced the variation of the perceived stimulus values for different surrounds, but with different neuronal tuning characteristics underlying perception. Compared to the FI approach, the prediction power of the standard models was based on neurons with far wider tuning widths and stronger surround suppression. Our study demonstrates that perceptual misjudgements can be based on neuronal populations that explicitly encode the available sensory information, and provides testable neurophysiological predictions on the neuronal tuning characteristics underlying human perceptual decisions.
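For a population of independent Poisson-spiking neurons with tuning curves f_i(s), the Fisher information at stimulus s is FI(s) = Σ_i f_i′(s)² / f_i(s), and the Cramér–Rao bound ties discrimination thresholds to 1/√FI. A sketch under those standard assumptions, with Gaussian tuning and illustrative parameters (not the paper's fitted values):

```python
import numpy as np

def fisher_information(s, preferred, r_max=30.0, sigma=20.0):
    """Population Fisher information at stimulus s, assuming independent
    Poisson neurons with Gaussian tuning curves f_i(s)."""
    f = r_max * np.exp(-(s - preferred) ** 2 / (2 * sigma ** 2))
    fprime = f * (preferred - s) / sigma ** 2     # derivative of each tuning curve
    return np.sum(fprime ** 2 / f)                # FI(s) = sum f'^2 / f

preferred = np.linspace(-180, 180, 73)   # preferred directions (deg), 5 deg spacing
fi = fisher_information(0.0, preferred)
# Cramer-Rao: the discrimination threshold scales as 1 / sqrt(FI).
print(f"FI = {fi:.3f}, threshold scale = {1 / np.sqrt(fi):.3f}")
```

Surround suppression lowers f and f′ for the affected neurons, reducing the total FI, which is how the model links surround motion to larger discrimination thresholds.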

15.
16.
Cardiovascular disease (CVD) risk in India is currently assessed using the World Health Organization/International Society of Hypertension (WHO/ISH) risk prediction charts, since no population-specific models exist. The WHO/ISH risk prediction charts have two versions: one with total cholesterol as a predictor (the high information (HI) model) and one without (the low information (LI) model). However, information on the WHO/ISH risk prediction charts, including guidance on which version to use and when, as well as the relative performance of the LI and HI models, is limited. This article aims, firstly, to quantify the relative performance of the LI and HI WHO/ISH risk prediction charts (for WHO South-East Asia Region D) using data from rural India. Secondly, we propose a pre-screening (simplified) point-of-care (POC) test to identify patients who are likely to benefit from a total cholesterol (TC) test, and subsequently when the LI model is preferable to the HI model. Analysis was performed using cross-sectional data from rural Andhra Pradesh collected in 2005 with recorded blood cholesterol measurements (N = 1066). CVD risk was computed using both the LI and HI models, and high-risk individuals who needed treatment (THR) were subsequently identified based on clinical guidelines. Model development for the POC assessment of a TC test was performed with three machine learning techniques: Support Vector Machine (SVM), Regularised Logistic Regression (RLR), and Random Forests (RF), along with a feature selection process. Disagreement in CVD risk predicted by the LI and HI WHO/ISH models was 14.5% (n = 155; p<0.01) overall and comprised 36 clinically relevant THR patients (31% of patients identified as THR using either model). Using two patient-specific parameters (age and systolic blood pressure), our POC assessment can pre-determine the benefit of TC testing and choose the appropriate risk model (out-of-sample AUCs: RF 0.85, SVM 0.84, RLR 0.82; maximum sensitivity 98%).
The identification of patients benefitting from a TC test for CVD risk stratification can aid resource-allocation planning and save costs for large-scale screening programmes.

17.
In many medical applications, interpretable models with high prediction performance are sought. Often, those models are required to handle semistructured data like tabular and image data. We show how to apply deep transformation models (DTMs) for distributional regression that fulfill these requirements. DTMs allow the data analyst to specify (deep) neural networks for different input modalities making them applicable to various research questions. Like statistical models, DTMs can provide interpretable effect estimates while achieving the state-of-the-art prediction performance of deep neural networks. In addition, the construction of ensembles of DTMs that retain model structure and interpretability allows quantifying epistemic and aleatoric uncertainty. In this study, we compare several DTMs, including baseline-adjusted models, trained on a semistructured data set of 407 stroke patients with the aim to predict ordinal functional outcome three months after stroke. We follow statistical principles of model-building to achieve an adequate trade-off between interpretability and flexibility while assessing the relative importance of the involved data modalities. We evaluate the models for an ordinal and dichotomized version of the outcome as used in clinical practice. We show that both tabular clinical and brain imaging data are useful for functional outcome prediction, whereas models based on tabular data only outperform those based on imaging data only. There is no substantial evidence for improved prediction when combining both data modalities. Overall, we highlight that DTMs provide a powerful, interpretable approach to analyzing semistructured data and that they have the potential to support clinical decision-making.

18.
Motivated by a clinical prediction problem, a simulation study was performed to compare different approaches for building risk prediction models. Robust prediction models for hospital survival in patients with acute heart failure were to be derived from three highly correlated blood parameters measured up to four times, with predictive ability having explicit priority over interpretability. Methods that relied only on the original predictors were compared with methods using an expanded predictor space including transformations and interactions. Predictors were simulated as transformations and combinations of multivariate normal variables which were fitted to the partly skewed and bimodally distributed original data in such a way that the simulated data mimicked the original covariate structure. Different penalized versions of logistic regression as well as random forests and generalized additive models were investigated using classical logistic regression as a benchmark. Their performance was assessed based on measures of predictive accuracy, model discrimination, and model calibration. Three different scenarios using different subsets of the original data with different numbers of observations and events per variable were investigated. In the investigated setting, where a risk prediction model should be based on a small set of highly correlated and interconnected predictors, Elastic Net and also Ridge logistic regression showed good performance compared to their competitors, while other methods did not lead to substantial improvements or even performed worse than standard logistic regression. Our work demonstrates how simulation studies that mimic relevant features of a specific data set can support the choice of a good modeling strategy.
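Ridge and Elastic Net logistic regression are both available in scikit-learn; a minimal sketch on simulated data that mimics the setting above, with three highly correlated predictors (the correlation structure and outcome model here are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
# Three highly correlated "blood parameters" driven by one latent factor.
z = rng.normal(size=n)
X = np.column_stack([z + 0.3 * rng.normal(size=n) for _ in range(3)])
y = (z + rng.normal(scale=0.8, size=n) > 0).astype(int)

# Ridge (L2) and Elastic Net penalized logistic regression.
ridge = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
enet = LogisticRegression(penalty="elasticnet", l1_ratio=0.5, C=1.0,
                          solver="saga", max_iter=5000).fit(X, y)

# Penalization shrinks and stabilises the coefficients of the correlated
# predictors instead of letting them inflate with opposite signs.
print(np.round(ridge.coef_, 3), np.round(enet.coef_, 3))
```

The coefficient stabilisation under strong collinearity is exactly why these penalized variants outperformed unpenalized logistic regression in the simulation.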

19.
We describe a supervised prediction method for the diagnosis of acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes in flow cytometry measurements from a single patient and gives a confidence score of the patient being AML-positive. Our solution is based on a regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data driven and no prior biological knowledge is used. The described solution scored a 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model performance and further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression based model, we also present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model uses only a small fraction of the flow cytometry measurements, making our solution highly economical.
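The simplified approach, averaging each marker's intensity over a patient's cells and feeding the per-patient means to a logistic regression, can be sketched as follows. The cell counts, marker shifts, and cohort sizes are synthetic toy values, not the challenge data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def patient_features(n_cells, n_markers, shift):
    """Aggregate single-cell flow cytometry readings into one feature
    vector per patient by simple per-marker averaging."""
    cells = rng.normal(loc=shift, size=(n_cells, n_markers))
    return cells.mean(axis=0)

# 30 healthy patients (no shift) and 30 AML-positive (shifted markers); toy data.
X = np.array([patient_features(1000, 5, 0.0) for _ in range(30)] +
             [patient_features(1000, 5, 0.5) for _ in range(30)])
y = np.array([0] * 30 + [1] * 30)

# Regularized logistic regression on the per-patient mean intensities.
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)
print(clf.score(X, y))
```

Averaging over thousands of cells makes the per-patient features very low-variance, which is why such a simple summary can match far more elaborate aggregation schemes.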

20.
The higher heating value (HHV) is an important property defining the energy content of biomass fuels. A number of predominantly linear correlations based on proximate and/or ultimate analyses have been proposed for predicting the HHV of biomass fuels. A scrutiny of the relationships between the constituents of the proximate and ultimate analyses and the corresponding HHVs suggests that not all relationships are linear, and thus nonlinear models may be more appropriate. Accordingly, a novel artificial intelligence (AI) formalism, namely genetic programming (GP), has been employed for the first time to develop two biomass HHV prediction models, using the constituents of the proximate and ultimate analyses, respectively, as model inputs. The prediction and generalization performance of these models was compared rigorously with corresponding multilayer perceptron (MLP) neural network based models, as well as currently available high-performing linear and nonlinear HHV models. This comparison reveals that the HHV prediction performance of the GP and MLP models is consistently better than that of their existing linear and/or nonlinear counterparts. Specifically, the GP- and MLP-based models exhibit excellent overall prediction accuracy and generalization performance, with high (>0.95) magnitudes of the coefficient of correlation and low (<4.5%) magnitudes of mean absolute percentage error with respect to the experimental and model-predicted HHVs. It is also found that the proximate analysis-based GP model outperformed all the existing high-performing linear biomass HHV prediction models. Among the ultimate analysis-based HHV models, the MLP model exhibited the best prediction accuracy and generalization performance compared with the existing linear and nonlinear models. Owing to their excellent performance, the AI-based models introduced in this paper have the potential to replace the existing biomass HHV prediction models.
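As an example of the linear, ultimate-analysis-based baselines such models are compared against, the widely cited Channiwala–Parikh correlation predicts HHV from elemental composition; the coefficients below are my best recollection of that published formula and should be checked against the original before any serious use:

```python
def hhv_channiwala_parikh(C, H, O, N=0.0, S=0.0, ash=0.0):
    """Linear HHV correlation attributed to Channiwala & Parikh (MJ/kg),
    with ultimate-analysis inputs in wt% on a dry basis. Coefficients as
    commonly quoted; verify against the original source."""
    return (0.3491 * C + 1.1783 * H + 0.1005 * S
            - 0.1034 * O - 0.0151 * N - 0.0211 * ash)

# Typical wood composition (wt%, illustrative): C=50, H=6, O=43, N=0.2, ash=1.
print(f"{hhv_channiwala_parikh(50, 6, 43, N=0.2, ash=1):.2f} MJ/kg")
```

The GP and MLP models in the abstract replace this fixed linear form with learned, possibly nonlinear functions of the same inputs.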
