Speaker Diarization Using Gaussian Mixture Turns and Segment Matching
Speaker Diarization Using Gaussian Mixture Turns and Segment Matching, Proc. GTM, RTTH and SIG-IL VI Jornadas en Tecnología del Habla and II Iberian SLTech Workshop - FALA, Vigo, Spain, November 2010.
Speaker diarization aims to detect “who spoke when” in long audio recordings. It is an important task in the processing of broadcast news audio, where it simplifies the selection and indexing of audio segments. This paper proposes an unsupervised speaker diarization scheme that combines a Gaussian Mixture Model used as a Universal Background Model, the Bayesian Information Criterion, and fingerprint detection. A decoder that outputs a mixture sequence is run with a high penalty on mixture transitions. Homogeneous segments then tend to produce sequences containing a single mixture, so speaker turns can be detected at the mixture transitions. Results are reported for broadcast news from the Catalan 3/24 TV channel.
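The turn-detection idea can be illustrated with a minimal sketch. It is not the paper's decoder: it assumes that each frame already has a log-likelihood under every mixture component, and it uses a toy one-dimensional model with two unit-variance components. The function names, the penalty values and the toy data are all hypothetical, and the Bayesian Information Criterion and fingerprint-matching stages are left out.

```python
import numpy as np


def decode_mixture_sequence(frame_loglik, penalty):
    """Viterbi decoding over mixture indices with a switching penalty.

    frame_loglik: (T, M) array, log-likelihood of each frame under each mixture.
    penalty: non-negative cost paid whenever the path changes mixture
             (an assumed, hand-tuned value, not taken from the paper).
    """
    T, M = frame_loglik.shape
    score = frame_loglik[0].copy()
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        # The best way to switch into any mixture is from the current best one.
        best_prev = int(np.argmax(score))
        switch = score[best_prev] - penalty
        take_switch = switch > score
        back[t] = np.where(take_switch, best_prev, np.arange(M))
        score = np.where(take_switch, switch, score) + frame_loglik[t]
    # Backtrace from the best final mixture.
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def turn_boundaries(path):
    """Frame indices at which the decoded mixture changes."""
    return (np.flatnonzero(np.diff(path)) + 1).tolist()


# Toy data: five frames near mean 0 (one outlier at 4), then five frames at 5.
frames = np.array([0, 0, 4, 0, 0, 5, 5, 5, 5, 5], dtype=float)
means = np.array([0.0, 5.0])  # assumed unit-variance components
loglik = -0.5 * (frames[:, None] - means[None, :]) ** 2

print(turn_boundaries(decode_mixture_sequence(loglik, penalty=0.0)))   # [2, 3, 5]
print(turn_boundaries(decode_mixture_sequence(loglik, penalty=20.0)))  # [5]
```

With no penalty the outlier frame produces two spurious transitions. With a high penalty the decoded sequence stays on one mixture within each homogeneous stretch, and only the genuine change at frame 5 remains as a candidate turn.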