[CLI]add onnxruntime infer for cli by yt605155624 · Pull Request #2222 · PaddlePaddle/PaddleSpeech

Use use_onnx to control whether inference runs through onnxruntime. Inference uses the CPU by default, because setup.py installs the CPU version of onnxruntime (the GPU version cannot be installed on Mac). cpu_threads defaults to 2.

CLI:

    paddlespeech tts --input "你好,欢迎使用百度飞桨深度学习框架!" --output default.wav --use_onnx True
    paddlespeech tts --am speedyspeech_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output ss.wav --use_onnx True
    paddlespeech tts --voc mb_melgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output mb.wav --use_onnx True
    paddlespeech tts --voc pwgan_csmsc --input "你好,欢迎使用百度飞桨深度学习框架!" --output pwgan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_aishell3 --voc pwgan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_pwgan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_aishell3 --voc hifigan_aishell3 --input "你好,欢迎使用百度飞桨深度学习框架!" --spk_id 0 --output aishell3_fs2_hifigan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_ljspeech --voc pwgan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_pwgan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_ljspeech --voc hifigan_ljspeech --lang en --input "Life was like a box of chocolates, you never know what you're gonna get." --output lj_fs2_hifigan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_vctk --voc pwgan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_pwgan.wav --use_onnx True
    paddlespeech tts --am fastspeech2_vctk --voc hifigan_vctk --input "Life was like a box of chocolates, you never know what you're gonna get." --lang en --spk_id 0 --output vctk_fs2_hifigan.wav --use_onnx True
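The commands above differ only in a handful of flags, so a test script could generate them from a table instead of repeating them by hand. The following is a minimal sketch, not part of this PR: `build_tts_command` and `CASES` are hypothetical names, and only flags that appear in the commands above are used.

```python
import shlex

ZH_TEXT = "你好,欢迎使用百度飞桨深度学习框架!"
EN_TEXT = "Life was like a box of chocolates, you never know what you're gonna get."


def build_tts_command(text, output, am=None, voc=None, lang=None,
                      spk_id=None, use_onnx=True):
    """Return the argument list for one `paddlespeech tts` invocation.

    Hypothetical helper: it only assembles flags shown in the PR description.
    """
    args = ["paddlespeech", "tts"]
    if am is not None:
        args += ["--am", am]
    if voc is not None:
        args += ["--voc", voc]
    if lang is not None:
        args += ["--lang", lang]
    args += ["--input", text]
    if spk_id is not None:
        args += ["--spk_id", str(spk_id)]
    args += ["--output", output]
    if use_onnx:
        args += ["--use_onnx", "True"]
    return args


# A few of the model combinations listed above (hypothetical table).
CASES = [
    dict(text=ZH_TEXT, output="default.wav"),
    dict(text=ZH_TEXT, output="ss.wav", am="speedyspeech_csmsc"),
    dict(text=EN_TEXT, output="lj_fs2_hifigan.wav",
         am="fastspeech2_ljspeech", voc="hifigan_ljspeech", lang="en"),
    dict(text=EN_TEXT, output="vctk_fs2_pwgan.wav",
         am="fastspeech2_vctk", voc="pwgan_vctk", lang="en", spk_id=0),
]

commands = [build_tts_command(**case) for case in CASES]
for cmd in commands:
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` on a machine where paddlespeech is installed.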

Python API:

    from paddlespeech.cli.tts import TTSExecutor
    import time

    tts_executor = TTSExecutor()
    time_1 = time.time()
    wav_file = tts_executor(
        text='对数据集进行预处理',
        output='1.wav',
        am='fastspeech2_csmsc',
        voc='hifigan_csmsc',
        lang='zh',
        use_onnx=True,
        cpu_threads=2)
    time_2 = time.time()
    print("time of first time:", time_2-time_1)
    wav_file = tts_executor(
        text='对数据集进行预处理',
        output='2.wav',
        am='fastspeech2_csmsc',
        voc='hifigan_csmsc',
        lang='zh',
        use_onnx=True,
        cpu_threads=2)
    print("time of second time:", time.time()-time_2)

time of first time: 14.543321371078491 (needs to download models for the first time)
time of second time: 0.5376265048980713
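The timings above show a slow first call followed by a much faster second one. A small helper keeps that kind of measurement tidy. This is a sketch under stated assumptions: `timed` and `FakeExecutor` are hypothetical, and `FakeExecutor` only imitates the slow-then-fast pattern with sleeps so the example runs without paddlespeech installed.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


class FakeExecutor:
    """Hypothetical stand-in for TTSExecutor: slow first call, fast afterwards."""

    def __init__(self):
        self._loaded = False

    def __call__(self, text, output):
        if not self._loaded:
            time.sleep(0.2)    # simulates one-time model download and loading
            self._loaded = True
        time.sleep(0.01)       # simulates inference
        return output


executor = FakeExecutor()
_, first = timed(executor, text="hello", output="1.wav")
result, second = timed(executor, text="hello", output="2.wav")
print(f"first call: {first:.3f}s, second call: {second:.3f}s")
```

`time.perf_counter()` is used instead of `time.time()` because it is monotonic and intended for measuring short intervals.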

use specified model files:

    from paddlespeech.cli.tts import TTSExecutor
    import time

    tts_executor = TTSExecutor()
    time_3 = time.time()
    # Third call: pass local ONNX model files explicitly.
    wav_file = tts_executor(
        text='对数据集进行预处理',
        output='3.wav',
        am='fastspeech2_csmsc',
        am_ckpt='./fastspeech2_csmsc_onnx_0.2.0/fastspeech2_csmsc.onnx',
        phones_dict='./fastspeech2_csmsc_onnx_0.2.0/phone_id_map.txt',
        voc='hifigan_csmsc',
        voc_ckpt='./hifigan_csmsc_onnx_0.2.0/hifigan_csmsc.onnx',
        lang='zh',
        use_onnx=True,
        cpu_threads=2)
    print("time of third time:", time.time()-time_3)
    time_4 = time.time()
    # Fourth call: same models, without the explicit paths.
    wav_file = tts_executor(
        text='对数据集进行预处理',
        output='4.wav',
        am='fastspeech2_csmsc',
        voc='hifigan_csmsc',
        lang='zh',
        use_onnx=True,
        cpu_threads=2)
    print("time of fourth time:", time.time()-time_4)

time of third time: 8.955731391906738
time of fourth time: 0.565178394317627

use specified model files for ljspeech:

NOTE: You must set fs to 22050 for ljspeech when using specified model files for the first time, because the default value of fs in the CLI is 24000 but ljspeech's fs is 22050.

    from paddlespeech.cli.tts import TTSExecutor
    import time

    tts_executor = TTSExecutor()
    time_3 = time.time()
    # Third call: local ONNX model files, with fs set to ljspeech's 22050.
    wav_file = tts_executor(
        text="Life was like a box of chocolates, you never know what you're gonna get.",
        output='lj_test1.wav',
        am='fastspeech2_ljspeech',
        am_ckpt='./fastspeech2_ljspeech_onnx_1.1.0/fastspeech2_ljspeech.onnx',
        phones_dict='./fastspeech2_ljspeech_onnx_1.1.0/phone_id_map.txt',
        voc='hifigan_ljspeech',
        voc_ckpt='./hifigan_ljspeech_onnx_1.1.0/hifigan_ljspeech.onnx',
        lang='en',
        use_onnx=True,
        cpu_threads=2,
        fs=22050)
    print("time of third time:", time.time()-time_3)
    time_4 = time.time()
    # Fourth call: same models, without the explicit paths or fs.
    wav_file = tts_executor(
        text="Life was like a box of chocolates, you never know what you're gonna get.",
        output='lj_test2.wav',
        am='fastspeech2_ljspeech',
        voc='hifigan_ljspeech',
        lang='en',
        use_onnx=True,
        cpu_threads=2)
    print("time of fourth time:", time.time()-time_4)

time of third time: 3.591158390045166
time of fourth time: 1.7778213024139404
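To illustrate why the fs note for ljspeech matters, the sketch below writes the same number of samples with two different header rates and compares the durations a reader would report. It assumes that fs ends up as the sampling rate recorded in the output WAV file, which is an assumption about the CLI rather than something stated in this PR. `wav_duration` is a hypothetical helper that uses only the standard library.

```python
import io
import wave


def wav_duration(num_samples, header_fs):
    """Write 16-bit mono silence with the given header rate; return its duration in seconds."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)            # 2 bytes per sample (16-bit PCM)
        w.setframerate(header_fs)
        w.writeframes(b"\x00\x00" * num_samples)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        return r.getnframes() / r.getframerate()


# Two seconds of audio generated by a model trained at 22050 Hz.
num_samples = 2 * 22050

correct = wav_duration(num_samples, 22050)   # header matches the model
wrong = wav_duration(num_samples, 24000)     # header uses the CLI default

print(f"fs=22050: {correct:.4f}s, fs=24000: {wrong:.4f}s")
print(f"playback speed-up with the wrong header: {24000 / 22050:.3f}x")
```

With the mismatched header the same samples play back about 8.8% faster, which also raises the pitch.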