An Experiment on Feature Selection for Sparse Data
Figueiredo, M. A. T.
An Experiment on Feature Selection for Sparse Data, Proc. Portuguese Conf. on Pattern Recognition - RecPad, Aveiro, Portugal, October 2009.
The problem of feature selection arises when dealing
with datasets that have a large number of features. An example
is text classification, based on the bag-of-words model,
where the feature vectors are typically very sparse (i.e.,
most features are zero). In this work, we investigate the
use of simple statistical criteria combined with compressed
sensing to perform feature selection. For a given dataset
with sparse features, compressed sensing yields a smaller
set of features that, in principle, preserves the relevant information.
Our experimental results on standard sparse
datasets from UCI and Reuters show a large reduction in the
number of features without degrading (and sometimes improving) classification accuracy.
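
The abstract does not say which statistical criteria or which measurement matrix were used, so the following is only a minimal sketch of one possible reading of the idea, not the paper's method. It assumes a Gaussian random measurement matrix for the compressed-sensing step and uses per-feature variance as the "simple statistical criterion"; the function names (gaussian_matrix, project, top_variance_features), the toy documents, and the sizes are all hypothetical.

```python
import random

def gaussian_matrix(m, n, seed=0):
    """Assumed measurement matrix: m x n i.i.d. Gaussian entries, scaled by 1/sqrt(m)."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]

def project(sparse_x, matrix):
    """Map a sparse vector (dict: feature index -> value) to m dense measurements.

    Only the nonzero entries are visited, so the cost scales with the
    number of nonzeros rather than with the original dimension n.
    """
    return [sum(row[j] * v for j, v in sparse_x.items()) for row in matrix]

def top_variance_features(rows, k):
    """Assumed criterion: keep the k columns with the largest sample variance."""
    count = len(rows)
    width = len(rows[0])
    variances = []
    for j in range(width):
        column = [r[j] for r in rows]
        mean = sum(column) / count
        variances.append(sum((c - mean) ** 2 for c in column) / count)
    ranked = sorted(range(width), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])

# Toy bag-of-words vectors over a 10-word vocabulary (most entries are zero).
docs = [{0: 1.0, 7: 2.0}, {3: 1.0}, {7: 1.0, 9: 3.0}]

A = gaussian_matrix(m=4, n=10, seed=0)      # 10 features -> 4 measurements
Z = [project(d, A) for d in docs]           # compressed representation
kept = top_variance_features(Z, k=2)        # indices of the retained measurements
reduced = [[z[j] for j in kept] for z in Z] # final low-dimensional features
```

In this sketch the projection comes first and the variance ranking second; the opposite order (rank the original sparse features, then project the survivors) is equally compatible with the abstract. The reduced vectors would then be passed to any standard classifier.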