Leveraging Explainability with K-Fold Feature Selection
Figueiredo, M. A. T.
Leveraging Explainability with K-Fold Feature Selection, Proc. INSTICC International Conf. on Pattern Recognition Applications and Methods - ICPRAM, Lisboa, Portugal, pp. 458-465, February 2023.
Digital Object Identifier: 10.5220/0011744400003411
Learning with high-dimensional (HD) data poses many challenges, since a large number of features often brings redundant and irrelevant features, which may degrade the performance of machine learning (ML) methods. When learning with HD data, one often resorts to feature selection (FS) to avoid the curse of dimensionality. FS may improve the results, but on its own it does not lead to explainability, in the sense of identifying the small subset of core features that most influence the prediction of the ML model, which can still be seen as a black box. In this paper, we propose k-fold feature selection (KFFS), an FS approach that sheds some light on that black box by combining the k-fold data partition procedure with one generic unsupervised or supervised FS filter. KFFS finds small and decisive subsets of features for a classification task, at the expense of increased computation time. On HD data, these subsets have dimensionality small enough to be analyzed by human experts (e.g., a medical doctor in a cancer detection problem). KFFS also provides classification models with a lower error rate and fewer features than those obtained with the individual supervised FS filter alone.
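The general idea of pairing a k-fold partition with an FS filter can be illustrated with a minimal sketch. The abstract does not say how the per-fold selections are combined, so the voting rule below (keep a feature only if the filter selects it in at least `min_votes` folds) is an assumption, not necessarily the rule used in the paper. The function names `relevance_filter` and `kffs`, the Fisher-like score, and the parameters `m`, `min_votes`, and `seed` are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def relevance_filter(X, y, m):
    """Hypothetical supervised filter: keep the m features with the
    highest ratio of between-class to within-class scatter."""
    n_features = len(X[0])
    classes = sorted(set(y))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall = mean(column)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        scores.append(between / (within + 1e-12))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:m]


def kffs(X, y, k=5, m=3, min_votes=None, seed=0):
    """Sketch: run the filter on each of the k training splits and keep
    the features chosen in at least min_votes splits (assumed rule)."""
    if min_votes is None:
        min_votes = k  # strictest setting: chosen in every fold
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint index sets
    votes = Counter()
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        votes.update(relevance_filter(X_train, y_train, m))
    return sorted(j for j, count in votes.items() if count >= min_votes)


# Synthetic data (made up for this sketch): 40 instances, 10 features,
# of which only features 0 and 3 depend on the class label.
rng = random.Random(42)
y_demo = [i % 2 for i in range(40)]
X_demo = [
    [10 * label + rng.gauss(0, 1) if j in (0, 3) else rng.gauss(0, 1)
     for j in range(10)]
    for label in y_demo
]
selected = kffs(X_demo, y_demo, k=5, m=2)
print("features kept across all folds:", selected)
```

Running the filter k times instead of once is where the increased computation time mentioned in the abstract comes from. In this sketch, raising `min_votes` toward `k` shrinks the selected subset, since only features that the filter picks consistently across the partitions survive.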