[TTS]Cli male onnx by lym0302 · Pull Request #2945 · PaddlePaddle/PaddleSpeech
fix #2940
- Add male ONNX inference to the CLI. Supported languages are zh (Chinese), en (English), and mix (mixed Chinese-English).
- Update the male-related acoustic model (am) and vocoder (voc) entries in synthesize_e2e.py, inference.py, ort_predict_e2e.py, and released_model.md.
- Add mix ONNX inference to the CLI.
CLI:
male (single spk) use_onnx
paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang zh --input "你好,欢迎使用百度飞桨深度学习框架!" --output male_zh_fs2_pwgan.wav --use_onnx True
paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output male_en_fs2_pwgan.wav --use_onnx True
paddlespeech tts --am fastspeech2_male --voc pwgan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_pwgan.wav --use_onnx True
paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang zh --input "你好,欢迎使用百度飞桨深度学习框架!" --output male_zh_fs2_hifigan.wav --use_onnx True
paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output male_en_fs2_hifigan.wav --use_onnx True
paddlespeech tts --am fastspeech2_male --voc hifigan_male --lang mix --input "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN." --output male_mix_fs2_hifigan.wav --use_onnx True
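The six commands above cover every pairing of the two male vocoders with the three languages. As a rough sketch, the same matrix can be generated in Python. The helper name `build_male_onnx_commands` and the `SAMPLE_TEXT` mapping are illustrative and not part of PaddleSpeech; only flags that already appear in the commands above are used. The sketch builds argument lists and does not run them.

```python
import itertools
import shlex

# Sample inputs copied from the commands in this PR description.
SAMPLE_TEXT = {
    "zh": "你好,欢迎使用百度飞桨深度学习框架!",
    "en": "Life was like a box of chocolates, you never know what you're gonna get.",
    "mix": "我们的声学模型使用了 Fast Speech Two, 声码器使用了 Parallel Wave GAN and Hifi GAN.",
}


def build_male_onnx_commands():
    """Return one argument list per (vocoder, language) pair (hypothetical helper)."""
    commands = []
    for voc, lang in itertools.product(("pwgan", "hifigan"), ("zh", "en", "mix")):
        commands.append([
            "paddlespeech", "tts",
            "--am", "fastspeech2_male",
            "--voc", f"{voc}_male",
            "--lang", lang,
            "--input", SAMPLE_TEXT[lang],
            "--output", f"male_{lang}_fs2_{voc}.wav",
            "--use_onnx", "True",
        ])
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands for inspection; nothing is executed here.
    for cmd in build_male_onnx_commands():
        print(shlex.join(cmd))
```

To actually run a command, each list could be passed to `subprocess.run(cmd, check=True)`, assuming `paddlespeech` is installed and on the PATH.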
mix (multi spks) use_onnx
--lang must be mix.
Speaker 174 is csmsc; speaker 175 is ljspeech.
paddlespeech tts --am fastspeech2_mix --voc hifigan_csmsc --lang mix --spk_id 174 --input "热烈欢迎您在 Discussions 中提交问题,并在 Issues 中指出发现的 bug。此外,我们非常希望您参与到 Paddle Speech 的开发中!" --output mix_fs2_pwgan_csmsc_spk174.wav --use_onnx True
Python API:
from paddlespeech.cli.tts import TTSExecutor
import time

tts_executor = TTSExecutor()

time_1 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='1.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
time_2 = time.time()
print("time of first time:", time_2-time_1)

wav_file = tts_executor(
    text='对数据集进行预处理',
    output='2.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of second time:", time.time()-time_2)
# needs to download models and warm up for the first time
time of first time: 6.387387037277222
time of second time: 0.5331883430480957
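The gap between the two timings is attributed above to model download and warm-up on the first call. A small helper can make this kind of cold/warm comparison repeatable. This is only a sketch: `time_calls` is a hypothetical name, and because it accepts any callable, a stand-in function is used below instead of `TTSExecutor`.

```python
import time


def time_calls(fn, repeats=2, **kwargs):
    """Call fn(**kwargs) `repeats` times; return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(**kwargs)
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    # Stand-in for a synthesis call, so the sketch runs without PaddleSpeech.
    def fake_synthesize(text):
        time.sleep(0.01)

    first, second = time_calls(fake_synthesize, text="hello")
    print("time of first time:", first)
    print("time of second time:", second)
```

With the executor from the example above, the same keyword arguments (`text`, `output`, `am`, `voc`, `lang`, `use_onnx`, `cpu_threads`) could be passed through as `time_calls(tts_executor, ...)`; note that every call would then write to the same output file.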
use specified model files:
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_male_zh_onnx_1.4.0.zip
unzip fastspeech2_male_zh_onnx_1.4.0.zip
wget https://paddlespeech.bj.bcebos.com/Parakeet/released_models/hifigan/hifigan_male_onnx_1.4.0.zip
unzip hifigan_male_onnx_1.4.0.zip
from paddlespeech.cli.tts import TTSExecutor
import time

tts_executor = TTSExecutor()

time_3 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='3.wav',
    am='fastspeech2_male',
    am_ckpt='./fastspeech2_male_zh_onnx_1.4.0/fastspeech2_male-zh.onnx',
    phones_dict='./fastspeech2_male_zh_onnx_1.4.0/phone_id_map.txt',
    voc='hifigan_male',
    voc_ckpt='./hifigan_male_onnx_1.4.0/hifigan_male.onnx',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of third time:", time.time()-time_3)

time_4 = time.time()
wav_file = tts_executor(
    text='对数据集进行预处理',
    output='4.wav',
    am='fastspeech2_male',
    voc='hifigan_male',
    lang='zh',
    use_onnx=True,
    cpu_threads=2)
print("time of fourth time:", time.time()-time_4)
time of third time: 5.92711067199707
time of fourth time: 0.5471084117889404
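The call that passes `am_ckpt`, `phones_dict`, and `voc_ckpt` depends on both archives having been unzipped into the working directory. A minimal sketch of a pre-flight check follows. `missing_files` is a hypothetical helper, not a PaddleSpeech function, and the paths are the ones passed in the example above.

```python
from pathlib import Path

# Paths taken from the am_ckpt, phones_dict and voc_ckpt arguments above.
MODEL_FILES = [
    "./fastspeech2_male_zh_onnx_1.4.0/fastspeech2_male-zh.onnx",
    "./fastspeech2_male_zh_onnx_1.4.0/phone_id_map.txt",
    "./hifigan_male_onnx_1.4.0/hifigan_male.onnx",
]


def missing_files(paths, base_dir="."):
    """Return the entries of `paths` that are not regular files under base_dir."""
    base = Path(base_dir)
    return [p for p in paths if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_files(MODEL_FILES)
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running this check before constructing the executor call gives a clear message when an archive was not downloaded or unzipped.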