Generating a pronunciation dictionary for European Portuguese using a joint-sequence model with embedded stress assignment

Veiga, A. ; Candeias, S. ; Perdigão, F.

Journal of the Brazilian Computer Society Vol. 19, Nº 2, pp. 127 - 134, September, 2012.

Digital Object Identifier: 10.1007/s13173-012-0088-0

This paper addresses the problem of grapheme-to-phoneme conversion, with the aim of creating a pronunciation dictionary from a vocabulary of the most frequent words in European Portuguese. It describes a system that takes a mixed approach, founded on a stochastic model with embedded rules for stressed vowel assignment. The implemented model can generate pronunciations for unrestricted words; in addition, a dictionary of the 40k most frequent words was constructed and corrected interactively. The dictionary includes homographs with multiple pronunciations. The vocabulary was defined using the CETEMPúblico corpus. The model and dictionary are publicly available.
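As an illustration of the workflow summarized in the abstract, the following is a minimal Python sketch of two steps: selecting the most frequent word forms from a text corpus, and storing one or more pronunciations per word so that homographs can be represented. It is not the authors' implementation. The function names, the tokenization rule, and the `toy_g2p` stand-in are assumptions made for this example, and the pronunciation strings are placeholders rather than real transcriptions. In practice, `g2p` would be replaced by a trained joint-sequence model with stress assignment.

```python
import re
from collections import Counter

# Assumed tokenization: runs of Unicode letters, optionally joined by hyphens.
WORD_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def top_k_vocabulary(lines, k):
    """Return the k most frequent lowercase word forms found in `lines`."""
    counts = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return [word for word, _ in counts.most_common(k)]


def build_dictionary(vocabulary, g2p):
    """Map each word to a list of distinct pronunciations.

    `g2p` is any callable returning an iterable of pronunciation strings
    for a word; a homograph simply yields more than one entry.
    """
    lexicon = {}
    for word in vocabulary:
        prons = []
        for pron in g2p(word):
            if pron not in prons:
                prons.append(pron)
        lexicon[word] = prons
    return lexicon


# Hypothetical stand-in for a real grapheme-to-phoneme model.
# The strings below are placeholders, not phonetic transcriptions.
HOMOGRAPH_PLACEHOLDERS = {"sede": ["sede#1", "sede#2"]}


def toy_g2p(word):
    return HOMOGRAPH_PLACEHOLDERS.get(word, [word.upper()])


corpus = ["a sede da sede", "a casa"]
vocab = top_k_vocabulary(corpus, 2)
lexicon = build_dictionary(vocab, toy_g2p)
```

With a real corpus, `lines` could be an open file object and `k` set to 40000. Keeping a list of pronunciations per entry, rather than a single string, is what allows manual correction to add or remove variants for homographs.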