Explaining Cancer Detection on Microarray Data: a Machine Learning Approach
Figueiredo, M. A. T.
Explaining Cancer Detection on Microarray Data: a Machine Learning Approach, Proc. Inforum - Simpósio de Informática, Guarda, Portugal, September 2022.
Cancer is one of the main causes of death in many countries. Much research has been devoted to diagnosing different types of cancer from gene expression data, by analyzing whether certain genes are expressed. This provides some explainability in cancer detection, since the presence or absence of the relevant features (gene expressions) can be related to the specific disease. However, DNA microarray data is difficult for human clinical staff to analyze, since it is composed of high-dimensional feature vectors with relevant, irrelevant, and redundant features. Moreover, beyond classifying the data, it is also important to identify the most relevant genes for the classification task, so that the classification results can be interpreted by humans. In this paper, we propose a machine learning-based approach to identify the sets of relevant genes for cancer detection on a microarray dataset, while discarding the irrelevant and redundant ones. We resort to a feature selection procedure, a classifier, and the leave-one-out evaluation technique to identify and examine the most relevant features. The number of folds in which a feature is chosen indicates its relevance and importance for cancer detection. We find a different number of relevant features for each microarray dataset considered in the experimental evaluation.
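
The counting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the feature selection method or the classifier, so everything specific here is an assumption made for illustration: a Fisher-ratio filter that keeps the top k features, a nearest-centroid classifier, and synthetic data standing in for a microarray dataset. The function names fisher_scores and loo_selection_counts are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Fisher ratio per feature for a binary problem (labels 0/1).
    Assumption: the paper's actual relevance measure may differ."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12  # avoid division by zero
    return num / den

def loo_selection_counts(X, y, k):
    """Leave-one-out loop: in each fold, select the top-k features on the
    training samples only, classify the held-out sample, and count how
    often each feature is selected."""
    n, d = X.shape
    counts = np.zeros(d, dtype=int)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i          # hold out sample i
        Xtr, ytr = X[train], y[train]
        top = np.argsort(fisher_scores(Xtr, ytr))[::-1][:k]
        counts[top] += 1
        # Nearest-centroid classifier on the selected features (assumption).
        c0 = Xtr[ytr == 0][:, top].mean(axis=0)
        c1 = Xtr[ytr == 1][:, top].mean(axis=0)
        x = X[i, top]
        pred = int(np.linalg.norm(x - c1) < np.linalg.norm(x - c0))
        correct += int(pred == y[i])
    return counts, correct / n

# Synthetic stand-in for a microarray dataset: 40 samples, 200 "genes",
# of which only the first 5 carry class information.
rng = np.random.default_rng(0)
n, d, informative = 40, 200, 5
y = np.array([0, 1] * (n // 2))
X = rng.normal(size=(n, d))
X[:, :informative] += 3.0 * y[:, None]

k = 10
counts, accuracy = loo_selection_counts(X, y, k=k)
stable = np.flatnonzero(counts == n)  # features chosen in every fold
```

Features whose count equals the number of folds are the most stable candidates for relevant genes, while features selected in only a few folds are more likely to be noise. Selecting features inside each fold, rather than once on the full dataset, keeps the held-out sample from influencing which features are chosen.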