First Steps Towards Hybrid Speech Synthesis in Czech TTS System ARTIC

Hybrid speech synthesis, which combines an HMM-based generator of parameter trajectories with unit selection, has been reported to achieve high output quality, in some cases even outperforming the "classic" unit selection method, at a reasonable increase in hardware requirements, especially when compared to modern DNN-based speech synthesis methods (e.g. WaveNet). The present paper introduces one such hybrid approach, addressing the mismatch between the rather smooth flow of parameters generated by a model and their more variable evolution when obtained from speech. We also describe several modifications of the target cost computation, which steer the selection towards units close to the required parameters; our aim is to gain insight into the mutual interactions within the modified selection process. The overall conclusions are supported by listening tests, which show that the quality of the experimental hybrid synthesis described here is comparable to that of a unit selection method tuned over the years.
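
To illustrate the general idea of a target cost that favours units close to model-generated parameters, the following is a minimal sketch only. It is not the ARTIC implementation: the weighted Euclidean distance, the function names `target_cost` and `select_units`, the parameter layout (F0 and energy) and all data values are assumptions made for illustration, and the concatenation cost that a real unit selection system would also use is left out.

```python
import math


def target_cost(generated, candidate, weights):
    """Weighted Euclidean distance between a model-generated parameter
    vector and the parameters measured on a candidate unit.
    (Assumed cost form; the paper's actual target cost may differ.)"""
    return math.sqrt(
        sum(w * (g - c) ** 2 for w, g, c in zip(weights, generated, candidate))
    )


def select_units(trajectory, candidates_per_position, weights):
    """For each target position, pick the candidate with the lowest
    target cost. Concatenation cost is deliberately ignored here."""
    chosen = []
    for generated, candidates in zip(trajectory, candidates_per_position):
        best = min(
            candidates,
            key=lambda name: target_cost(generated, candidates[name], weights),
        )
        chosen.append(best)
    return chosen


# Hypothetical data: two target positions, parameters = (F0 in Hz, energy).
trajectory = [(120.0, 0.5), (130.0, 0.6)]
candidates_per_position = [
    {"unit_a": (118.0, 0.5), "unit_b": (140.0, 0.4)},
    {"unit_c": (100.0, 0.6), "unit_d": (131.0, 0.7)},
]
weights = (1.0, 1.0)  # hypothetical per-parameter weights

selected = select_units(trajectory, candidates_per_position, weights)
print(selected)
```

Changing the entries of `weights` shifts which parameters dominate the selection, which is one simple way to picture how a modification of the target cost can interact with the rest of the selection process.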