
Importance of speaker specific speech features for emotion recognition

Assunção, G. ; Menezes, P. ; Perdigão, F.

Importance of speaker specific speech features for emotion recognition, Proc. Experiment Conference, Funchal, Portugal, June 2019.

Digital Object Identifier: 10.1109/EXPAT.2019.8876534


Abstract
The recognition of emotions is an inherent human ability that has long intrigued researchers, primarily because of the possibility of emulating it successfully and integrating it into independent systems. Speech, being a mixture of utterances that convey a state of mind, is a suitable candidate from which emotionality can be inferred, owing to its many feature variations. Humans themselves corroborate this, since they rely on this modality to extract emotional cues. Another important aspect is communicational register adaptation: the ability to discern different emotions in different speakers. Indeed, the same emotional utterance may be interpreted differently when produced by two different people, meaning that emotion-specific information is present in a speaker's personal register. As a demo, we propose a real-time automatic system for recognizing emotions from speech. It uses the well-established VGG-like convolutional neural network speaker recognition model VGGVox, trained on over 100,000 utterances from the VoxCeleb1 speaker recognition dataset, to extract emotional features, which are then fed to state-of-the-art classifiers for accurate recognition of emotional states. The positive supporting results obtained so far have been compelling enough to spark interest in the technique.
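The pipeline described above has two stages: a network pretrained for speaker recognition acts as a fixed feature extractor, and a separate classifier maps its features to emotional states. The following is only a minimal sketch of that two-stage structure, assuming pure Python with no model weights. It does not reproduce VGGVox: `extract_embedding` is a hypothetical stand-in that computes a few hand-crafted statistics, and `NearestCentroidClassifier` is a simple placeholder swapped in for the unspecified state-of-the-art classifiers. The `synth` helper and the "calm"/"angry" labels are likewise hypothetical toy data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def extract_embedding(waveform: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a pretrained speaker-recognition network
    such as VGGVox. The real model maps a spectrogram to a learned
    embedding; here we only compute a few hand-crafted statistics so the
    pipeline runs without any model weights."""
    n = len(waveform)
    mean = sum(waveform) / n
    var = sum((x - mean) ** 2 for x in waveform) / n
    energy = sum(x * x for x in waveform) / n
    # Fraction of adjacent sample pairs whose sign differs (zero-crossing rate).
    zcr = sum(1 for a, b in zip(waveform, waveform[1:]) if (a >= 0) != (b >= 0)) / (n - 1)
    return [mean, math.sqrt(var), energy, zcr]


class NearestCentroidClassifier:
    """Placeholder classifier: assigns an embedding to the emotion whose
    mean training embedding is closest in Euclidean distance."""

    def __init__(self) -> None:
        self.centroids: Dict[str, Vector] = {}

    def fit(self, embeddings: List[Vector], labels: List[str]) -> "NearestCentroidClassifier":
        sums: Dict[str, Vector] = {}
        counts: Dict[str, int] = defaultdict(int)
        for emb, label in zip(embeddings, labels):
            if label not in sums:
                sums[label] = [0.0] * len(emb)
            # Accumulate per-label component sums.
            sums[label] = [s + e for s, e in zip(sums[label], emb)]
            counts[label] += 1
        # Centroid = component-wise mean of each label's embeddings.
        self.centroids = {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}
        return self

    def predict(self, embedding: Vector) -> str:
        return min(self.centroids, key=lambda lab: math.dist(embedding, self.centroids[lab]))


def synth(amplitude: float, freq: float, n: int = 400) -> List[float]:
    # Hypothetical toy "utterance": a sinusoid; louder and faster stands in for high arousal.
    return [amplitude * math.sin(2 * math.pi * freq * t / n) for t in range(n)]


# Stage 1: extract embeddings; stage 2: train the classifier on them.
train = [synth(0.1, 3), synth(0.15, 4), synth(0.9, 40), synth(1.0, 45)]
labels = ["calm", "calm", "angry", "angry"]
clf = NearestCentroidClassifier().fit([extract_embedding(w) for w in train], labels)

print(clf.predict(extract_embedding(synth(0.12, 5))))   # calm
print(clf.predict(extract_embedding(synth(0.95, 42))))  # angry
```

Keeping the extractor frozen and training only the downstream classifier mirrors the idea of reusing a speaker-recognition model for emotion recognition; in a real system the stand-in extractor would be replaced by the pretrained network's embeddings.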