Unsupervised Feature Selection for Sparse Data
Figueiredo, M. A. T.
Unsupervised Feature Selection for Sparse Data, Proc. European Symp. on Artificial Neural Networks - ESANN, Bruges, Belgium, Vol. 1, pp. 339-344, April 2011.
Feature selection is a well-known problem in machine learning and pattern recognition. Many high-dimensional datasets are sparse, that is, many of their feature values are zero. In some cases, we do not know the class labels of some (or even all) of the patterns in the dataset, which leads to semi-supervised or unsupervised learning problems.
For instance, in text classification with the bag-of-words (BoW) representation, there is usually a large number of features, many of which may be irrelevant (or even detrimental) to the categorization task. In this paper, we propose an efficient unsupervised feature selection technique for sparse data, suitable for both standard floating-point and binary features.
The experimental results on standard datasets show that the proposed method yields efficient feature selection, reducing the number of features while simultaneously improving the classification accuracy.
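The abstract does not specify the relevance criterion the proposed technique uses, so the sketch below is not the paper's method. It illustrates the general idea of an unsupervised filter on sparse data, assuming plain per-feature variance as a stand-in score. Patterns are stored as dictionaries of their nonzero entries, so the cost is proportional to the number of nonzeros, and the same score applies to floating-point and binary features (for a binary feature present in a fraction p of the patterns, the variance is p(1 - p)). The names `SparseRow`, `feature_variances`, and `select_features` are hypothetical.

```python
from typing import Dict, List

# Hypothetical sparse format: each pattern maps feature_index -> nonzero value;
# any feature index absent from the dict is an implicit zero.
SparseRow = Dict[int, float]


def feature_variances(rows: List[SparseRow], n_features: int) -> List[float]:
    """Per-feature variance over all patterns, computed from nonzeros only.

    Zeros add nothing to either running sum, so we never touch them.
    Stand-in relevance score (an assumption, not the paper's criterion).
    """
    n = len(rows)
    sums = [0.0] * n_features
    sq_sums = [0.0] * n_features
    for row in rows:
        for j, v in row.items():
            sums[j] += v
            sq_sums[j] += v * v
    # Var[x] = E[x^2] - E[x]^2, expectations taken over all n patterns.
    return [sq_sums[j] / n - (sums[j] / n) ** 2 for j in range(n_features)]


def select_features(rows: List[SparseRow], n_features: int, k: int) -> List[int]:
    """Keep the k highest-variance features; no class labels are used."""
    scores = feature_variances(rows, n_features)
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy binary BoW data: feature 0 occurs in every document, so it has zero
# variance and carries no information for telling documents apart.
docs = [{0: 1, 1: 1}, {0: 1, 2: 1}, {0: 1, 1: 1}, {0: 1, 3: 1}]
print(feature_variances(docs, 4))   # [0.0, 0.25, 0.1875, 0.1875]
print(select_features(docs, 4, 3))  # [1, 2, 3]
```

A variance filter is only one simple choice; it discards constant (and near-constant) features, which are common in sparse BoW data, but it can favor features with large raw scales, so real-valued features may need normalization first.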