An Information Theoretic Approach to Text Sentiment Analysis
Antão, D. P. C.; Figueiredo, M. A. T.
An Information Theoretic Approach to Text Sentiment Analysis, Proc. International Conf. on Pattern Recognition Applications and Methods - ICPRAM, Barcelona, Spain, January 2013.
Abstract
Most approaches to text sentiment analysis rely on human-generated lexicon-based feature selection methods, supervised vector-based learning methods, and other solutions that seek to capture sentiment information. To yield acceptable accuracy, most of these methods require a complex preprocessing stage and careful feature engineering. This paper introduces a coding-theoretic sentiment analysis method that dispenses with text preprocessing and explicit feature engineering, yet still achieves state-of-the-art accuracy. By applying the Ziv-Merhav method to estimate the relative entropy (Kullback-Leibler divergence) and the cross parsing length from pairs of sequences of text symbols, we obtain information-theoretic measures that make very few assumptions about the models assumed to have generated the sequences. Using these measures, we follow a dissimilarity space approach, in which we apply a standard support vector machine classifier. Experimental evaluation of the proposed approach on a text sentiment analysis problem (specifically, sentiment polarity classification of movie reviews) shows that it outperforms the previous state of the art, despite being much simpler than the competing methods.
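To make the measures named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes texts are treated as plain character strings, uses a naive substring search for the cross parsing, and uses the commonly cited form of the Ziv-Merhav estimate, (1/n)[c(z|x) log2 n - c(z) log2 c(z)], where c(z|x) is the cross parsing length and c(z) is the LZ78 self-parsing phrase count. The function names and the choice of prototype texts are hypothetical.

```python
import math


def cross_parse_length(z: str, x: str) -> int:
    """Count the phrases obtained by greedily parsing z into the
    longest substrings that also occur somewhere in x."""
    count = 0
    i = 0
    n = len(z)
    while i < n:
        length = 0
        # Extend the phrase while z[i:i+length+1] still occurs in x.
        while i + length < n and z[i:i + length + 1] in x:
            length += 1
        # A symbol that never occurs in x forms a phrase of length 1.
        i += max(length, 1)
        count += 1
    return count


def lz78_phrase_count(z: str) -> int:
    """Count the phrases of the incremental (LZ78) self-parsing of z."""
    phrases = set()
    current = ""
    count = 0
    for ch in z:
        current += ch
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:  # trailing, possibly repeated, phrase
        count += 1
    return count


def zm_divergence(z: str, x: str) -> float:
    """Ziv-Merhav-style relative entropy estimate between z and x.
    Assumes z is non-empty."""
    n = len(z)
    c_zx = cross_parse_length(z, x)
    c_z = lz78_phrase_count(z)
    return (c_zx * math.log2(n) - c_z * math.log2(c_z)) / n


def dissimilarity_vector(text: str, prototypes: list[str]) -> list[int]:
    """Represent a text by its cross parsing length against each
    prototype text (a hypothetical choice of dissimilarity space)."""
    return [cross_parse_length(text, p) for p in prototypes]
```

A text that shares long substrings with a prototype parses into few phrases against it, so a small value signals similarity. The resulting vectors could then be passed to any standard support vector machine implementation; that step is omitted here.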