ComfyUI-TangoFlux
ComfyUI Custom Nodes for "TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching". These nodes, adapted from the official implementation, generate high-quality 44.1kHz audio up to 30 seconds long from just a text prompt.
Installation
- Navigate to your ComfyUI's custom_nodes directory:
- Clone this repository:
git clone https://github.com/LucipherDev/ComfyUI-TangoFlux
- Install requirements (a consolidated sketch of the whole sequence follows below):
cd ComfyUI-TangoFlux
python install.py
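Putting the steps together, a typical manual install might look like the sketch below; the ComfyUI path is illustrative, so adjust it to wherever your installation lives.

```bash
# go to the ComfyUI custom nodes folder (example path)
cd /path/to/ComfyUI/custom_nodes

# clone this repository
git clone https://github.com/LucipherDev/ComfyUI-TangoFlux

# install the requirements (the script also downloads the models, see Usage below)
cd ComfyUI-TangoFlux
python install.py
```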
Or Install via ComfyUI Manager
Check out some demos from the official demo page
Example Workflow
Usage
Models can be downloaded using the install.py script.
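If you installed the node another way (for example via ComfyUI Manager), the same script can be run afterwards to fetch the models; the path below is only an example.

```bash
cd /path/to/ComfyUI/custom_nodes/ComfyUI-TangoFlux
python install.py  # installs requirements and downloads the models
```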
Manual Download:
- Download TangoFlux from here into models/tangoflux
- Download text encoders from here into models/text_encoders/google-flan-t5-large
(Include every file from the downloads. Do not rename anything.)
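As a rough sketch of where everything should end up (assuming the standard ComfyUI models directory; the individual files come from the respective downloads and are not listed here):

```
ComfyUI/
└── models/
    ├── tangoflux/
    │   └── (every file from the TangoFlux download, unrenamed)
    └── text_encoders/
        └── google-flan-t5-large/
            └── (every file from the text encoder download, unrenamed)
```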
The nodes can be found in the "TangoFlux" category as TangoFluxLoader, TangoFluxSampler, and TangoFluxVAEDecodeAndPlay.
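A minimal graph, assuming the loader → sampler → decoder chain that the node names suggest (the exact socket and parameter names may differ in your build):

```
TangoFluxLoader ──▶ TangoFluxSampler (text prompt, duration up to 30 s) ──▶ TangoFluxVAEDecodeAndPlay ──▶ audio
```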
If you are on low VRAM, try enabling offload_model_to_cpu in TangoFluxSampler.
The audio output of TangoFluxVAEDecodeAndPlay can be used as the audio input for the ComfyUI-VideoHelperSuite VideoCombine node. (This will not sync the audio to the video.)
TeaCache can speed up TangoFlux 2x without much audio quality degradation, in a training-free manner.
📈 Inference Latency Comparisons on a Single A800
| TangoFlux | TeaCache (0.25) | TeaCache (0.4) |
| --- | --- | --- |
| ~4.08 s | ~2.42 s | ~1.95 s |
Citation
@misc{hung2024tangofluxsuperfastfaithful,
  title={TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization},
  author={Chia-Yu Hung and Navonil Majumder and Zhifeng Kong and Ambuj Mehrish and Rafael Valle and Bryan Catanzaro and Soujanya Poria},
  year={2024},
  eprint={2412.21037},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2412.21037},
}
@article{liu2024timestep,
title={Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model},
author={Liu, Feng and Zhang, Shiwei and Wang, Xiaofeng and Wei, Yujie and Qiu, Haonan and Zhao, Yuzhong and Zhang, Yingya and Ye, Qixiang and Wan, Fang},
journal={arXiv preprint arXiv:2411.19108},
year={2024}
}