The RoyalFlush Synthesis System for Blizzard Challenge 2020

2020, Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020

This paper presents the RoyalFlush synthesis system for Blizzard Challenge 2020. The two required voices are built from the released Mandarin and Shanghainese data. The system is based on end-to-end speech synthesis technology and introduces several improvements over our system of last year. First, a Mandarin front-end is employed that transforms input text into a phoneme sequence with prosody labels. Second, a modified Tacotron acoustic model is proposed to improve speech stability. Third, a GMM-based attention mechanism is applied for robust long-form speech synthesis. Finally, a lightweight LPCNet-based neural vocoder is adopted to achieve a good trade-off between effectiveness and efficiency. Among all the participating teams of the Challenge, the identifier for our system is N. Evaluation results demonstrate that our system performs relatively well in intelligibility, but it still needs to be improved in terms of naturalness and similarity.
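The abstract does not say which GMM attention variant is used, so the following is only a minimal sketch of one common formulation, in which the mixture means can only move forward along the input sequence. That monotonic movement is what is generally credited with making this family of attention robust on long utterances. The function name `gmm_attention_step`, the parameter layout, and the use of softplus for the step and width are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); always strictly positive.
    return np.logaddexp(0.0, x)


def gmm_attention_step(raw_params, prev_mu, memory_len):
    """One decoder step of a mixture-of-Gaussians attention (illustrative sketch).

    raw_params: array of shape (K, 3) holding unconstrained values for the
        mixture weight, mean increment and width of each of K components.
        In a real model these would be predicted from the decoder state
        (hypothetical here; the paper's abstract gives no such detail).
    prev_mu: array of shape (K,), component means from the previous step.
    memory_len: number of encoder positions to attend over.
    Returns (alignment of shape (memory_len,), updated means of shape (K,)).
    """
    w_hat, delta_hat, sigma_hat = raw_params[:, 0], raw_params[:, 1], raw_params[:, 2]

    # Softmax over components gives mixture weights that sum to one.
    w = np.exp(w_hat - w_hat.max())
    w = w / w.sum()

    # A positive increment means each mean can only advance (monotonic attention).
    mu = prev_mu + softplus(delta_hat)
    sigma = softplus(sigma_hat) + 1e-5

    # Evaluate each Gaussian at every encoder position and mix them.
    positions = np.arange(memory_len)[None, :]
    z = (positions - mu[:, None]) / sigma[:, None]
    components = np.exp(-0.5 * z ** 2) / (sigma[:, None] * np.sqrt(2.0 * np.pi))
    alignment = (w[:, None] * components).sum(axis=0)
    return alignment, mu
```

In use, the returned alignment would weight the encoder outputs to form a context vector, and the updated means would be fed back in as `prev_mu` at the next decoder step.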
