Stroke Treatment Prediction Using Features Selection Methods and Machine Learning Classifiers |
| |
Affiliation: | 1. STICODE Departement, RIADI laboratory of National School of Computer Sciences, Manouba, Tunisia; 2. MATHSTIC Departement, ITI laboratory of IMT Atlantique, Brest, France; 3. Intradys, Brest, France |
| |
Abstract: | Objectives: Feature selection is an important task that helps alleviate various machine learning and data mining issues. Its main objectives are to build simpler, more understandable classifier models and to improve data mining and processing performance. This study therefore performs a comparative evaluation of the Chi-square method, the recursive feature elimination method, and the tree-based method (using Random Forest), applied to three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier, and decision tree classifier), in order to select the most relevant features from a large set of attributes. It also determines which pair (i.e., feature selection method and machine learning method) provides the best performance. Materials and methods: The paper first gives an overview of the most common feature selection techniques: the Chi-square method, the Recursive Feature Elimination (RFE) method, and the tree-based method (using Random Forest). It then compares the improvement these feature selection methods bring to the three common machine learning methods (K-Nearest Neighbor, naïve Bayesian classifier, and decision tree classifier). The evaluation uses micro-F1, accuracy, and root mean square error on the stroke disease data set. Results: The proposed approach, which pairs the tree-based method using Random Forest (TBM-RF) with the decision tree classifier (DTC), achieves an accuracy higher than 85% and an F1-score higher than 88%, outperforming KNN and NB combined with the Chi-square, RFE, and TBM-RF methods. Conclusion: This study shows that pairing the tree-based method using Random Forest (TBM-RF) with the decision tree classifier successfully and efficiently helps to find the most relevant features and to predict and classify patients suffering from stroke. |
| |
Keywords: | Stroke disease; Feature selection; Data mining; Decision tree classifier; Naive Bayes; K-nearest neighbor; Recursive feature elimination; Tree-based model; Chi-square |
This article is indexed in ScienceDirect and other databases. |
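The abstract names the Chi-square method as one of the feature selection techniques and lists micro-F1, accuracy, and root mean square error as the evaluation measures. The following is a minimal sketch of those pieces in plain Python, not the authors' implementation: the function names, the toy columns `feature_a` and `feature_b`, and the toy labels and predictions are all hypothetical, and the chi-square score shown is the standard Pearson statistic for a categorical feature against the class label.

```python
from collections import Counter
from math import sqrt


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            score += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return score


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from TP, FP and FN counts pooled over all classes."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1
            elif t == c:
                fn += 1
    return 2 * tp / (2 * tp + fp + fn)


def rmse(y_true, y_pred):
    """Root mean square error between numeric labels and predictions."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical toy data (not from the stroke data set).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feature_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

# Rank features by chi-square score, highest first.
scores = {name: chi_square_score(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['feature_a', 'feature_b']

# Hypothetical classifier output, scored with the three measures.
predictions = [1, 1, 1, 0, 0, 0, 0, 1]
print(accuracy(labels, predictions))  # 0.75
print(micro_f1(labels, predictions))  # 0.75
print(rmse(labels, predictions))      # 0.5
```

In a comparison like the one described, a ranking of this kind would be used to keep the top-scoring features before training each classifier, and the three measures would then be computed on held-out predictions for every selector and classifier pair.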
|