Quantitative evaluation of relevant prosodic factors for text-to-speech synthesis in Spanish

A quantitative comparison of four different proposals for intonation modeling in Spanish is presented. In the framework of a modeling procedure previously introduced by the authors, the stress group is taken as the basic building block, and a statistical model is inferred from a corpus for every kind of intonation unit, each parameterized by the four control points of the fitted Bézier function. Applying classical clustering quality assessment metrics to the statistical models predicted under the different proposals yields an objective comparison among them. Based on the results, a set of prosodic factors was selected to characterize the stress group and incorporated into a TTS platform, with a reported increase in perceptual and objective quality.
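
As an illustration of the parameterization mentioned above, the following minimal sketch fits four control points to an F0 contour over one stress group. It rests on assumptions not stated in the abstract: the Bézier function is taken to be a one-dimensional cubic over time normalized to [0, 1], the fit is ordinary least squares, and the function names and F0 values are hypothetical. The authors' actual fitting procedure may differ.

```python
from math import comb


def bernstein_row(t, degree=3):
    """Bernstein basis values B_0..B_degree at normalized time t in [0, 1]."""
    return [comb(degree, k) * t**k * (1 - t) ** (degree - k)
            for k in range(degree + 1)]


def bezier_eval(ctrl, t):
    """Evaluate the Bezier contour given by scalar control points at time t."""
    basis = bernstein_row(t, len(ctrl) - 1)
    return sum(b * p for b, p in zip(basis, ctrl))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_bezier(times, values):
    """Least-squares fit of four control points to (time, F0) samples.

    Assumption: solves the normal equations (A^T A) p = A^T y, where each
    row of A holds the cubic Bernstein basis evaluated at one sample time.
    """
    rows = [bernstein_row(t) for t in times]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(4)]
    return solve_linear(ata, aty)


# Hypothetical F0 samples (Hz) generated from known control points.
true_ctrl = [110.0, 150.0, 140.0, 100.0]
times = [i / 9 for i in range(10)]
f0 = [bezier_eval(true_ctrl, t) for t in times]

ctrl = fit_bezier(times, f0)  # recovers the four control points
```

Under these assumptions, every stress group reduces to a four-dimensional vector, which is the kind of fixed-length representation that clustering quality metrics can be applied to.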