[TTS]Cli male onnx by lym0302 · Pull Request #2945 · PaddlePaddle/PaddleSpeech

Add male ONNX inference to the CLI. Supported languages are zh (Chinese), en (English), and mix (Chinese-English mixed).
Add the male acoustic model (am) and vocoder (voc) to synthesize_e2e.py, inference.py, ort_predict_e2e.py, and released_model.md.
Add mix ONNX inference to the CLI.

CLI:

male (single spk) use_onnx

paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang zh --input "你好，欢迎使用百度飞桨深度学习框架！" --output male_zh_fs2_pwgan.wav --use_onnx True

paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output male_en_fs2_pwgan.wav --use_onnx True

paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_pwgan.wav --use_onnx True

paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang zh --input "你好，欢迎使用百度飞桨深度学习框架！" --output male_zh_fs2_hifigan.wav --use_onnx True

paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output male_en_fs2_hifigan.wav --use_onnx True

paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_hifigan.wav --use_onnx True
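
The six commands above differ only in vocoder, language, input sentence, and output file name. As a convenience, they could be generated rather than typed out. The following is a minimal sketch, not part of this PR: `build_commands` is a hypothetical helper, and it uses only the flags and values already shown above.

```python
import itertools
import shlex

# Sample sentences copied from the commands above, keyed by the --lang value.
SENTENCES = {
    "zh": "你好，欢迎使用百度飞桨深度学习框架！",
    "en": "Life was like a box of chocolates, you never know what you're gonna get.",
    "mix": "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN.",
}

# Vocoder name -> short tag used in the output file names above.
VOCODERS = {"pwgan_male": "pwgan", "hifigan_male": "hifigan"}


def build_commands():
    """Return one argv list per (vocoder, language) pair (hypothetical helper)."""
    commands = []
    for (voc, tag), (lang, text) in itertools.product(VOCODERS.items(),
                                                      SENTENCES.items()):
        commands.append([
            "paddlespeech", "tts",
            "--am", "fastspeech2_male",
            "--voc", voc,
            "--lang", lang,
            "--input", text,
            "--output", f"male_{lang}_fs2_{tag}.wav",
            "--use_onnx", "True",
        ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands; pass each argv list to subprocess.run to execute.
    for argv in build_commands():
        print(shlex.join(argv))
```

Building argv lists and quoting them with `shlex.join` avoids hand-escaping the apostrophes and commas in the sample sentences.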

mix (multi spks) use_onnx

The `lang` must be `mix`

spk 174 is csmsc, spk 175 is ljspeech

paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --spk_id 174 --input "热烈欢迎您在 Discussions 中提交问题，并在 Issues 中指出发现的 bug。此外，我们非常希望您参与到 Paddle Speech 的开发中！" --output mix_fs2_pwgan_csmsc_spk174.wav --use_onnx True
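
The two constraints stated above (the language must be `mix`, and speaker IDs 174 and 175 map to specific corpora) can be checked before invoking the CLI. This is a sketch under assumptions: `describe_mix_speaker` is a hypothetical helper and not a PaddleSpeech function, and it only knows the two speaker IDs named in this description; other IDs of the multi-speaker model are reported as unknown rather than rejected.

```python
# Speaker IDs named in the description above; the model has further speakers
# that are not listed here.
KNOWN_MIX_SPEAKERS = {174: "csmsc", 175: "ljspeech"}


def describe_mix_speaker(lang, spk_id):
    """Validate --lang for fastspeech2_mix and label the speaker if known.

    Hypothetical helper for illustration only.
    Returns the corpus name for a known speaker ID, or None otherwise.
    """
    if lang != "mix":
        raise ValueError(f"fastspeech2_mix requires lang='mix', got {lang!r}")
    return KNOWN_MIX_SPEAKERS.get(spk_id)


if __name__ == "__main__":
    print(describe_mix_speaker("mix", 174))
    print(describe_mix_speaker("mix", 175))
```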

Python API:

from paddlespeech.cli.tts import TTSExecutor
import time

tts_executor = TTSExecutor()
time_1 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='1.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
time_2 = time.time()
print("time of first time:", time_2-time_1)
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='2.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of second time:", time.time()-time_2)

# needs to download models and warm up for the first time
time of first time:  6.387387037277222
time of second time: 0.5331883430480957
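
Because the first call includes model download and warm-up, it is useful to report it separately from later calls. The sketch below shows one way to do that with a generic timing helper. `time_calls` and `fake_synthesize` are hypothetical names; the stand-in workload only simulates a slow first call, and in practice the callable would wrap the `tts_executor(...)` call shown above.

```python
import statistics
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times.

    Returns (first_call_seconds, median_seconds_of_later_calls); the second
    value is None when repeats is 1.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    later = durations[1:]
    return durations[0], (statistics.median(later) if later else None)


if __name__ == "__main__":
    state = {"warm": False}

    def fake_synthesize():
        # Stand-in for a synthesis call: the first invocation is slower,
        # mimicking one-time model loading.
        if not state["warm"]:
            time.sleep(0.05)
            state["warm"] = True
        time.sleep(0.005)

    first, steady = time_calls(fake_synthesize, repeats=4)
    print("first call:", first)
    print("median of later calls:", steady)
```

`time.perf_counter` is used instead of `time.time` because it is a monotonic clock intended for measuring short intervals.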

use specified model files:

wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_male_zh_onnx_1.4.0.zip

unzip fastspeech2_male_zh_onnx_1.4.0.zip

wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_male_onnx_1.4.0.zip

unzip hifigan_male_onnx_1.4.0.zip

from paddlespeech.cli.tts import TTSExecutor
import time

tts_executor = TTSExecutor()
time_3 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='3.wav',
    am='fastspeech2_male',
    am_ckpt='./fastspeech2_male_zh_onnx_1.4.0/fastspeech2_male-zh.onnx',
    phones_dict='./fastspeech2_male_zh_onnx_1.4.0/phone_id_map.txt',
    voc='hifigan_male',
    voc_ckpt='./hifigan_male_onnx_1.4.0/hifigan_male.onnx',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of third time:", time.time()-time_3)
time_4 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='4.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of fourth time:", time.time()-time_4)

time of third time: 5.92711067199707
time of fourth time: 0.5471084117889404
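
When passing explicit model files, a quick existence check before the call can catch a missed download or unzip step. A minimal sketch, assuming the archives were unpacked into the directories used above; `missing_model_files` is a hypothetical helper, and the file names are the ones that appear in the `am_ckpt`, `phones_dict`, and `voc_ckpt` arguments.

```python
from pathlib import Path


def missing_model_files(am_dir, voc_dir):
    """Return the expected model files that are not present on disk.

    Hypothetical helper; file names are taken from the example call above.
    """
    expected = [
        Path(am_dir) / "fastspeech2_male-zh.onnx",
        Path(am_dir) / "phone_id_map.txt",
        Path(voc_dir) / "hifigan_male.onnx",
    ]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_model_files("./fastspeech2_male_zh_onnx_1.4.0",
                                  "./hifigan_male_onnx_1.4.0")
    if missing:
        print("missing files:", missing)
    else:
        print("all model files found")
```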