
Modeling and Predicting Machine Learning Performance

Corona, J. ; Teixeira, R. ; Antunes, M. ; Aguiar, R.

Modeling and Predicting Machine Learning Performance, Proc. IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece, November 2025.



 

Abstract
Modern Machine Learning (ML) models pose challenges such as long development cycles, high costs, and difficult model selection. Understanding of how dataset properties influence performance, in a model-agnostic and interpretable way, remains limited. This work analyzes the relationship between dataset characteristics and model behavior across diverse algorithms, introducing symbolic regression models that transparently estimate accuracy from dataset features and simple model identifiers. Results show that features such as noisiness, redundancy, and class distribution consistently affect performance. A symbolic model using only dataset features achieved modest accuracy (R² = 0.361), while adding model identifiers improved predictions (R² = 0.534). These findings provide interpretable insights into the data–model relationship, supporting preprocessing decisions (e.g., feature selection, class balancing) and informing meta-learning strategies for algorithm recommendation.
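
As an illustration of how an R² score like those reported above can be computed for a closed-form predictor, the following minimal Python sketch evaluates a hand-written expression against observed accuracies. Everything in it is an assumption made for illustration: the formula, its coefficients, the three meta-feature names, and the synthetic data are hypothetical and are not taken from the paper.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def candidate_formula(noisiness, redundancy, imbalance):
    # Hypothetical symbolic expression mapping dataset meta-features
    # (each scaled to [0, 1]) to an estimated accuracy.
    # The coefficients are invented, not the paper's.
    return 0.9 - 0.4 * noisiness - 0.1 * redundancy - 0.2 * imbalance


# Synthetic meta-dataset: one row of meta-features per (imaginary) dataset.
rng = random.Random(0)
rows = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]

# "Observed" accuracies: the formula plus Gaussian noise standing in for
# everything the meta-features do not explain.
observed = [candidate_formula(*row) + rng.gauss(0, 0.1) for row in rows]
predicted = [candidate_formula(*row) for row in rows]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.3f}")
```

Under these assumptions, the score falls below 1 because the noise term is not captured by the expression. This is the sense in which an R² of 0.361 or 0.534 measures the share of variance in accuracy that the symbolic model explains.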