Efficient Feature Selection Filters for High-Dimensional Data

Ferreira, A. ; Figueiredo, M. A. T.

Pattern Recognition Letters Vol. 33, Nº 13, pp. 1794 - 1804, June, 2012.

ISSN (print): 0167-8655
ISSN (online): 1872-7344

Journal Impact Factor: 1.551 (in 2014)

Digital Object Identifier: 10.1016/j.patrec.2012.05.019

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets.

In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria that are applicable to supervised, semi-supervised, and unsupervised learning. They can also act as pre-processors for computationally intensive methods, letting those methods focus their attention on smaller subsets of promising features. Experimental results on datasets with up to 100,000 features show the time efficiency of our methods, which attain lower generalization error than state-of-the-art techniques while being dramatically simpler and faster.
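
To make the idea of a low-complexity relevance/redundancy filter concrete, here is a minimal unsupervised sketch. The abstract does not name the criteria it uses, so everything specific here is an assumption made for illustration: per-feature variance as the relevance score, absolute cosine similarity as the redundancy measure, and the hypothetical names `relevance_redundancy_filter`, `max_features`, and `max_similarity`. It should not be read as the paper's actual algorithm.

```python
import math


def variance(column):
    """Population variance of one feature (assumed relevance score)."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)


def abs_cosine(a, b):
    """Absolute cosine similarity between two features (assumed redundancy score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return abs(dot) / (norm_a * norm_b)


def relevance_redundancy_filter(X, max_features, max_similarity=0.8):
    """Return indices of selected features; X is a list of rows (instances)."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]

    # Step 1: rank features by decreasing relevance.
    relevance = [variance(col) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: relevance[j], reverse=True)

    # Step 2: walk the ranking, keeping a feature only if it is not too
    # similar to the most recently kept one. Comparing against a single
    # feature keeps the number of similarity computations linear.
    kept = []
    for j in ranked:
        if len(kept) == max_features:
            break
        if not kept or abs_cosine(columns[j], columns[kept[-1]]) < max_similarity:
            kept.append(j)
    return kept
```

In this sketch each candidate is compared only with the last kept feature, so the cost grows linearly with the number of features rather than quadratically, which is the kind of trade-off that keeps a filter usable at very high dimension. A supervised variant would swap the variance score for a class-dependent relevance measure.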