Text Classification: A Compressed Learning Approach
Figueiredo, M. A. T.
Text Classification: A Compressed Learning Approach, Proc. Portuguese Conf. on Pattern Recognition - RecPad, Vila Real, Portugal, October 2010.
In text classification based on the bag-of-words (BoW)
or similar representations, we usually have a large number
of features, many of which are irrelevant (or even detrimental)
for classification tasks. Recent results show that
compressed learning (CL), i.e., learning in a domain of reduced
dimensionality obtained by random projections (RP),
is possible, and theoretical bounds on the test-set error rate
have been derived. In this work, we assess the performance
of CL based on RP of BoW representations for text classification.
Our experimental results show that CL significantly
reduces the number of features while simultaneously
improving the classification accuracy. Rather than the mild
decrease in accuracy upper-bounded by the theory, we actually
find an increase in accuracy. Our approach is also
suited for unsupervised or semi-supervised learning, without
any modification, since it does not use the class labels.
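
As a rough illustration of the idea described above, the following minimal sketch builds BoW count vectors for a few toy documents and maps them to a lower-dimensional space with a random projection. The abstract does not specify the type of projection matrix, the reduced dimension, or the classifier, so the Gaussian matrix, the value of k, the toy documents, and the helper names bag_of_words and random_projection are all assumptions made for illustration only.

```python
import numpy as np


def bag_of_words(docs):
    """Build a dense term-count matrix (one row per document) and its vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    index = {tok: j for j, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for tok in doc.lower().split():
            X[i, index[tok]] += 1.0
    return X, vocab


def random_projection(X, k, seed=0):
    """Project the rows of X from d to k dimensions.

    Assumption: a Gaussian matrix with i.i.d. N(0, 1/k) entries, a common
    choice that preserves squared norms in expectation. No class labels
    are used anywhere in this step.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(d, k))
    return X @ R


# Toy corpus (hypothetical, for illustration only).
docs = [
    "the match ended in a late goal",
    "the team won the final match",
    "the market fell after the rate cut",
    "the bank raised the interest rate",
]

X, vocab = bag_of_words(docs)   # original BoW features, one column per term
Z = random_projection(X, k=4)   # compressed features, 4 columns

print("original dimension:", X.shape[1])
print("reduced dimension:", Z.shape[1])
```

Any standard classifier could then be trained on Z instead of X. Because the projection never looks at class labels, the same Z can be passed unchanged to a clustering or semi-supervised method.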