Disfluency Detection Based on Prosodic Features for University Lectures

Medeiros, R. B. ; Moniz, G. S. ; Batista, M. M. ; Trancoso, I. ; Nunes, LMMN

Disfluency Detection Based on Prosodic Features for University Lectures, Proc. Annual Conf. of the International Speech Communication Association - Interspeech, Lyon, France, August 2013.

This paper focuses on the identification of disfluent sequences
and their distinct structural regions, based on acoustic and
prosodic features. The reported experiments use a corpus
of university lectures in European Portuguese, roughly
32 hours long, with a relatively high percentage of disfluencies (7.6%).
The features automatically extracted from the corpus
proved to discriminate between the structural regions of a
disfluency. Several machine learning methods have
been applied, but the best results were achieved using Classification
and Regression Trees (CART). The set of features
which was most informative for cross-region identification encompasses
word duration ratios, word confidence score, silent
ratios, and pitch and energy slopes. Features such as the number
of phones and syllables per word proved to be more useful
for the identification of the interregnum, whereas energy slopes
were most suited for identifying the interruption point.

Index Terms: prosodic features, automatic disfluency detection,
corpus of university lectures, machine learning.
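
The abstract names pitch and energy slopes and word duration ratios as informative features, and CART as the best-performing classifier. The following is a minimal sketch of those ideas, not the authors' implementation: the function names, the least-squares definition of slope, the neighbour-based duration ratio, and the toy values are all assumptions made for illustration. It computes two features and then searches for the single threshold that minimizes weighted Gini impurity, which is the basic splitting step a CART classifier repeats recursively.

```python
from typing import Sequence, Tuple


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of a contour (e.g. energy) against frame index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def duration_ratio(current: float, neighbour: float) -> float:
    """Duration of the current word relative to an adjacent word (assumed definition)."""
    return current / neighbour if neighbour > 0 else 0.0


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(xs: Sequence[float], ys: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best binary split x <= threshold."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: energy slopes for six words, labelled as lying at an
# interruption point ("ip") or elsewhere ("other"). These are not corpus values.
energy_slopes = [-2.0, -1.5, -1.0, 0.5, 1.0, 1.5]
labels = ["ip", "ip", "ip", "other", "other", "other"]

threshold, impurity = best_split(energy_slopes, labels)
print(f"split at energy slope <= {threshold}, weighted Gini = {impurity}")
```

A full CART model applies this search over every feature and recurses on each resulting subset; an off-the-shelf decision tree implementation would normally be used in place of this hand-written split.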