Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures
Medeiros, R. B.; Batista, M. M.; Moniz, G. S.; Trancoso, I.
Comparing different machine learning approaches for disfluency structure detection in a corpus of university lectures, Proc. Speech and Language Technology in Education - SLaTE, Porto, Portugal, Vol. 1, pp. 259-269, June 2013.
Digital Object Identifier: 10.4230/OASIcs.xxx.yyy.p
This paper presents a number of experiments focusing on the performance of different machine learning methods in identifying disfluencies and their distinct structural regions in speech data. The reported experiments are based on audio segmentation and prosodic features calculated from a corpus of university lectures in European Portuguese, containing about 24 hours of speech and about 7.5% of disfluencies. The set of features automatically extracted from the force-aligned corpus proved to be discriminative of the regions that make up a disfluency. Several machine learning methods were applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), and Multilayer Perceptron. Since the aim of the task is a discriminative identification of the structural disfluent regions, CARTs outperform the other methods due to their well-informed selection of the main features for each region. This work shows that, using fully automatic prosodic features and CARTs, disfluency structural regions can be reliably identified. The best results achieved using CARTs correspond to 83.6% precision, 32.5% recall, and 46.8% F-measure. All structural regions are identified, but the best results concern the detection of the interregnum, followed by the detection of the interruption point.
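The reported F-measure can be checked against the reported precision and recall. The sketch below assumes the F-measure is the balanced F1 score (the harmonic mean of precision and recall, beta = 1), which the abstract does not state explicitly; the function name `f_measure` and the `beta` parameter are illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Assumption: beta = 1 (balanced F1); the abstract does not name the variant.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is detected
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Best CART results quoted in the abstract, as fractions.
best_f = f_measure(0.836, 0.325)
print(f"{best_f:.1%}")  # prints 46.8%
```

Under that assumption the three figures are mutually consistent. The large gap between precision and recall means the classifier rarely mislabels a region it flags, while missing roughly two thirds of the disfluent regions.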