Automatic Email Foldering with Supervised Learning - Addressing the Class Imbalance Problem
Automatic Email Foldering with Supervised Learning - Addressing the Class Imbalance Problem, Proc. Conf. on Electronics, Telecommunications and Computers - CETC, Lisboa, Portugal, Vol. --, pp. -- - --, November, 2011.
Automatic organization of email messages into folders is both an open problem and a challenge for machine learning techniques. Besides the effect of email overload, there are growing difficulties caused by the semantics each user applies when filing messages. One such difficulty is the highly unequal distribution of messages across folders: some folders can hold thousands of documents while others might not reach a dozen. This problem is known as class imbalance. This paper addresses automatic organization of email messages into folders, focusing on how to deal with the class imbalance problem in order to improve the classification results. We present a simple and efficient solution for this problem. The experimental results on a subset of the Enron Corpus and on a private email data set show the adequacy of the proposed techniques for dealing with the imbalance problem.
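The abstract does not describe the proposed solution, so the sketch below is not the authors' method. It only illustrates the class imbalance problem, and one generic remedy, inverse-frequency class weighting, which is swapped in purely as an example. The folder names and counts are hypothetical, chosen to mirror the "thousands versus a dozen" situation on a small scale.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest folder divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Rare folders get large weights, so a weighted learner pays as much
    total attention to a small folder as to a large one.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {folder: n_samples / (n_classes * count)
            for folder, count in counts.items()}


# Hypothetical folder labels, invented for illustration only.
labels = ["inbox"] * 90 + ["projects"] * 8 + ["personal"] * 2

ratio = imbalance_ratio(labels)
weights = balanced_class_weights(labels)
print(f"imbalance ratio: {ratio:.1f}")
for folder, weight in sorted(weights.items()):
    print(f"{folder}: weight {weight:.3f}")
```

With weights of this form, each folder contributes the same total weight to training regardless of how many messages it holds. Whether this or a resampling strategy is closer to what the paper proposes cannot be determined from the abstract alone.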