Proposal to integrate into 🤗 Hub by patrickvonplaten · Pull Request #555 · TensorSpeech/TensorFlowTTS
Hi TensorSpeech team! I hereby propose an integration with the HuggingFace model hub 🤗
This integration would allow you to freely download/upload models from/to the Hugging Face Hub: https://huggingface.co/.
Your users could then download model weights, etc. directly within Python without having to manually download weights.
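Under the hood this builds on the `huggingface_hub` client library. As a minimal sketch of what the download/upload round-trip looks like (the repo id and filename below are placeholders, not the final API):

```python
from huggingface_hub import HfApi, hf_hub_download

# Download a single weight file from a Hub repo; the file is cached locally,
# so repeated calls don't re-download. Repo id and filename are placeholders.
local_path = hf_hub_download(
    repo_id="tensorspeech/fastspeech2_tts",
    filename="model.h5",
)

# Upload works the same way in reverse (requires an auth token / being logged in).
HfApi().upload_file(
    path_or_fileobj=local_path,
    path_in_repo="model.h5",
    repo_id="tensorspeech/fastspeech2_tts",
)
```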
Taking your fastspeech_2_inference.ipynb example, the following diff shows how the code would change to download weights directly from the model hub:
```diff
 import tensorflow as tf

-from tensorflow_tts.inference import AutoConfig
 from tensorflow_tts.inference import TFAutoModel
 from tensorflow_tts.inference import AutoProcessor

 processor = AutoProcessor.from_pretrained(
-    pretrained_path="../tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
+    pretrained_path="tensorspeech/fastspeech2_tts"
 )

 input_text = "i love you so much."
 input_ids = processor.text_to_sequence(input_text)

-config = AutoConfig.from_pretrained("../examples/fastspeech2/conf/fastspeech2.v1.yaml")
 fastspeech2 = TFAutoModel.from_pretrained(
-    config=config,
-    pretrained_path="../examples/fastspeech2/checkpoints/model-150000.h5",
+    pretrained_path="tensorspeech/fastspeech2_tts",
     is_build=True,
     name="fastspeech2",
 )

 mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
     input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
     speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
     speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
     f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
     energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
 )
```
As an example, I uploaded a FastSpeech 2 model to the HF Hub here: https://huggingface.co/patrickvonplaten/tf_tts_fast_speech_2.
If you'd like to add this feature to your library, we would obviously change the organization name from `patrickvonplaten` to `tensorspeech`.
You can try it out by running the following code:
```python
import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# Download the processor (text-to-ID mapper) directly from the Hub
processor = AutoProcessor.from_pretrained(pretrained_path="patrickvonplaten/tf_tts_fast_speech_2")

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

# Download the model config and weights directly from the Hub
fastspeech2 = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_fast_speech_2",
    is_build=True,
    name="fastspeech2",
)

# Generate mel spectrograms from the input token IDs
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)
```
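Note that `fastspeech2.inference` returns mel spectrograms rather than audio, so to actually listen to the result you'd still run a vocoder over `mel_after`. A hedged sketch — the vocoder repo id below is hypothetical, and any compatible MelGAN-style checkpoint on the Hub would do:

```python
import soundfile as sf

# Hypothetical MB-MelGAN checkpoint on the Hub (placeholder repo id)
mb_melgan = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_mb_melgan",
    is_build=True,
    name="mb_melgan",
)

# Decode the mel spectrogram to a waveform and write it to disk
audio = mb_melgan.inference(mel_after)[0, :, 0]
sf.write("speech.wav", audio.numpy(), 22050)  # LJSpeech models use 22.05 kHz
```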
Besides freely storing your model weights, we also provide git version control and download statistics for your models :-) We can also provide you with a hosted inference API where users could try out your models directly on the website.
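For context, the hosted inference API is just an HTTP endpoint per model; a minimal sketch of how a user might call it (the exact request/response format for TTS models is an assumption here):

```python
import requests

# Hub repo backing the hosted endpoint; the token is a placeholder
API_URL = "https://api-inference.huggingface.co/models/patrickvonplaten/tf_tts_fast_speech_2"
headers = {"Authorization": "Bearer <hf-api-token>"}

# Assumption: TTS endpoints accept raw text and return audio bytes
response = requests.post(API_URL, headers=headers, json={"inputs": "i love you so much."})
with open("speech.wav", "wb") as f:
    f.write(response.content)
```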
We've already integrated with a couple of other libraries - you can check them out here:
- Timm library: https://huggingface.co/julien-c/timm-dpn92
- Speechbrain: https://huggingface.co/speechbrain/asr-crdnn-commonvoice-fr
- Espnet: https://huggingface.co/julien-c/kan-bayashi-jsut_tts_train_tacotron2_ja
Sorry for the missing tests in the PR - I just made the minimal changes needed to show you how the integration with the HF Hub could look :-) I'd also be more than happy to add you guys to a Slack channel where we could discuss further.
Cheers,
Patrick & Hugging Face team
Also cc @julien-c