Premature Overspecialization in Emotion Recognition Systems

Assunção, G.; Perdigão, F.; Menezes, P.

Premature Overspecialization in Emotion Recognition Systems, Proc. AES Audio Forensics, Porto, Portugal, June 2019.

Emotion recognition from speech, that is, identifying the emotional states expressed in vocal utterances, is an inherent ability that humans apply in their daily interactions. Though the topic is highly researched, automatic systems have yet to match human performance levels, which may be due to the overspecialization of most automatic recognition systems or to their inability to adapt to non-emotional human conversational traits. Given that these traits may contain information pertinent to a speech-based recognition system, generalization should be emphasized in the early stages of emotional feature extraction. To support this, an application of the VGGVox speaker recognition model has been evaluated for emotional feature extraction. Results on state-of-the-art classifiers were comparable to those of other recent speech emotion recognition techniques.