Proposal to integrate into 🤗 Hub by patrickvonplaten · Pull Request #555 · TensorSpeech/TensorFlowTTS
Hi TensorSpeech team! I hereby propose an integration with the HuggingFace model hub 🤗
This integration would allow you to freely download/upload models from/to the Hugging Face Hub: https://huggingface.co/.
Your users could then download model weights, etc. directly within Python without having to manually download weights.
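Under the hood this builds on the `huggingface_hub` client library. As a minimal sketch of what the download/upload round-trip looks like (the repo id and filename below are placeholders, not the final API):

```python
from huggingface_hub import HfApi, hf_hub_download

# Download a single weight file from a Hub repo; the file is cached locally,
# so repeated calls don't re-download. Repo id and filename are placeholders.
local_path = hf_hub_download(
    repo_id="tensorspeech/fastspeech2_tts",
    filename="model.h5",
)

# Upload works the same way in reverse (requires an auth token / being logged in).
HfApi().upload_file(
    path_or_fileobj=local_path,
    path_in_repo="model.h5",
    repo_id="tensorspeech/fastspeech2_tts",
)
```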
Taking your fastspeech_2_inference.ipynb example, the following diff shows how the code would change to download weights directly from the model hub:
```diff
 import tensorflow as tf

-from tensorflow_tts.inference import AutoConfig
 from tensorflow_tts.inference import TFAutoModel
 from tensorflow_tts.inference import AutoProcessor

 processor = AutoProcessor.from_pretrained(
-    pretrained_path="../tensorflow_tts/processor/pretrained/ljspeech_mapper.json"
+    pretrained_path="tensorspeech/fastspeech2_tts"
 )

 input_text = "i love you so much."
 input_ids = processor.text_to_sequence(input_text)

-config = AutoConfig.from_pretrained("../examples/fastspeech2/conf/fastspeech2.v1.yaml")
 fastspeech2 = TFAutoModel.from_pretrained(
-    config=config,
-    pretrained_path="../examples/fastspeech2/checkpoints/model-150000.h5",
+    pretrained_path="tensorspeech/fastspeech2_tts",
     is_build=True,
     name="fastspeech2",
 )

 mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
     input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
     speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
     speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
     f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
     energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
 )
```
As an example, I uploaded a FastSpeech 2 model to the HF Hub here: https://huggingface.co/patrickvonplaten/tf_tts_fast_speech_2.
If you'd like to add this feature to your library, we would obviously change the organization name from `patrickvonplaten` to `tensorspeech`.
You can try it out by running the following code:
```python
import tensorflow as tf

from tensorflow_tts.inference import TFAutoModel
from tensorflow_tts.inference import AutoProcessor

# Download the processor (text-to-ID mapper) directly from the Hub
processor = AutoProcessor.from_pretrained(pretrained_path="patrickvonplaten/tf_tts_fast_speech_2")

input_text = "i love you so much."
input_ids = processor.text_to_sequence(input_text)

# Download the model config and weights directly from the Hub
fastspeech2 = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_fast_speech_2",
    is_build=True,
    name="fastspeech2",
)

# Generate mel spectrograms from the input token IDs
mel_before, mel_after, duration_outputs, _, _ = fastspeech2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
    speed_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    f0_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
    energy_ratios=tf.convert_to_tensor([1.0], dtype=tf.float32),
)
```
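Note that `fastspeech2.inference` returns mel spectrograms rather than audio, so to actually listen to the result you'd still run a vocoder over `mel_after`. A hedged sketch — the vocoder repo id below is hypothetical, and any compatible MelGAN-style checkpoint on the Hub would do:

```python
import soundfile as sf

# Hypothetical MB-MelGAN checkpoint on the Hub (placeholder repo id)
mb_melgan = TFAutoModel.from_pretrained(
    pretrained_path="patrickvonplaten/tf_tts_mb_melgan",
    is_build=True,
    name="mb_melgan",
)

# Decode the mel spectrogram to a waveform and write it to disk
audio = mb_melgan.inference(mel_after)[0, :, 0]
sf.write("speech.wav", audio.numpy(), 22050)  # LJSpeech models use 22.05 kHz
```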
Besides freely storing your model weights, we also provide git version control and download statistics for your models :-) We can also provide you with a hosted inference API where users could try out your models directly on the website.
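For context, the hosted inference API is just an HTTP endpoint per model; a minimal sketch of how a user might call it (the exact request/response format for TTS models is an assumption here):

```python
import requests

# Hub repo backing the hosted endpoint; the token is a placeholder
API_URL = "https://api-inference.huggingface.co/models/patrickvonplaten/tf_tts_fast_speech_2"
headers = {"Authorization": "Bearer <hf-api-token>"}

# Assumption: TTS endpoints accept raw text and return audio bytes
response = requests.post(API_URL, headers=headers, json={"inputs": "i love you so much."})
with open("speech.wav", "wb") as f:
    f.write(response.content)
```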
We've already integrated with a couple of other libraries - you can check them out here:
- Timm library: https://huggingface.co/julien-c/timm-dpn92
- Speechbrain: https://huggingface.co/speechbrain/asr-crdnn-commonvoice-fr
- Espnet: https://huggingface.co/julien-c/kan-bayashi-jsut_tts_train_tacotron2_ja
Sorry for the missing tests in the PR - I just made the minimal changes needed to show you how the integration with the HF Hub could look :-) I'd also be more than happy to add you guys to a Slack channel where we could discuss further.
Cheers,
Patrick & Hugging Face team
Also cc @julien-c