
finetrainers 🧪

Finetrainers is a work-in-progress library to support (accessible) training of diffusion models and various commonly used training algorithms.

[Video: CogVideoX-LoRA.mp4 — CogVideoX LoRA training as the first iteration of this project]
[Video: pika_effects.mp4 — Replication of PikaEffects]

Quickstart

Clone the repository and install the requirements:

```shell
pip install -r requirements.txt
pip install git+https://github.com/huggingface/diffusers
```

The requirements pin `diffusers>=0.32.1`, but it is always recommended to use the main branch of Diffusers for the latest features and bugfixes. Note that the main branch of finetrainers is also the development branch; for stable behavior, use a release tag.

Check out the latest stable release tag:

```shell
git fetch --all --tags
git checkout tags/v0.2.0
```

Follow the instructions mentioned in the README for the latest stable release.

Using the main branch

To get started quickly with example training scripts on the main development branch, refer to the following:

The following small datasets and HF organizations are good starting points for quickly testing training:

Please check out docs/models and examples/training to learn more about the supported models and to find reproducible training launch scripts. For a full list of training arguments, refer to docs/args.

Important

It is recommended to use PyTorch 2.5.1 or above for training. Older versions are untested and can lead to completely black videos, OOM errors, or other issues. For fully reproducible training, use the same environment as described in environment.md.
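As a convenience, the version requirement above can be checked programmatically before launching a long training run. The helper below is an illustrative sketch (it is not part of finetrainers): it parses `torch.__version__`-style strings and compares them against the recommended 2.5.1 minimum.

```python
def parse_version(v: str) -> tuple:
    # Drop local build suffixes such as "+cu121" and keep the numeric parts.
    return tuple(int(part) for part in v.split("+")[0].split(".")[:3])

def meets_minimum(installed: str, minimum: str = "2.5.1") -> bool:
    # True if the installed version is at least the recommended minimum.
    return parse_version(installed) >= parse_version(minimum)

# Typical usage: meets_minimum(torch.__version__) after `import torch`.
```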

Features

News

Support Matrix

The following trainers are currently supported:

Note

The following numbers were obtained from the release branch. The main branch is unstable at the moment and may use higher memory.

| Model Name | Tasks | Min. LoRA VRAM* | Min. Full Finetuning VRAM^ |
|:---|:---|:---|:---|
| LTX-Video | Text-to-Video | 5 GB | 21 GB |
| HunyuanVideo | Text-to-Video | 32 GB | OOM |
| CogVideoX-5b | Text-to-Video | 18 GB | 53 GB |
| Wan | Text-to-Video | TODO | TODO |
| CogView4 | Text-to-Image | TODO | TODO |
| Flux | Text-to-Image | TODO | TODO |

\*Measured for training only (no validation) at resolution 49x512x768, LoRA rank 128, with pre-computation, FP8 weights, and gradient checkpointing. Pre-computation of conditions and latents may require more memory (typically under 16 GB).
^Measured for training only (no validation) at resolution 49x512x768, with pre-computation, BF16 weights, and gradient checkpointing.

If you would like to use a custom dataset, refer to the dataset preparation guide here.

Check out some amazing projects citing finetrainers:

Check out the following UIs built for finetrainers:

Acknowledgements