A Methodology for Generated Text Annotation for High Quality Speech Synthesis

2019, 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA)

Natural language generators can produce linguistically enriched text, which may result in significantly improved synthetic speech. However, the same generators also produce pieces of plain text that range from a single word to a full sentence. Additionally, traditional natural language generators have limited domain coverage, which restricts the language analysis available for the generated texts. In those cases, the enriched input that the speech synthesizer requires for high-quality speech synthesis can be provided by analysing the plain text. This work reports on a method for the automatic, domain-dependent annotation of plain text that utilises the linguistic information from rich generated text. The synthetic speech from the resulting prosody models is evaluated by human participants, and the annotation results for plain text are largely on par with those for the rich generated text. This leads to improved perceived naturalness of the synthesized speech.