GitHub - siliconflow/onediff: OneDiff: An out-of-the-box acceleration library for diffusion models. (original) (raw)

onediff is an out-of-the-box acceleration library for diffusion models, it provides:

Out-of-the-box acceleration for popular UIs/libs(such as HF diffusers and ComfyUI)
PyTorch code compilation tools and strong optimized GPU Kernels for diffusion models

News

[2024/07/23] 🚀 Up to 1.7x Speedup for Kolors: Kolors Acceleration Report
[2024/06/18] 🚀 Acceleration for DiT models: SD3 Acceleration Report, PixArt Acceleration Report, and Latte Acceleration Report
[2024/04/13] 🚀 OneDiff 1.0 is released (Acceleration of SD & SVD with one line of code)
[2024/01/12] 🚀 Accelerating Stable Video Diffusion 3x faster with OneDiff DeepCache + Int8
[2023/12/19] 🚀 Accelerating SDXL 3x faster with DeepCache and OneDiff

Hiring

We're hiring! If you are interested in working on onediff at SiliconFlow, we have roles open for Interns and Engineers in Beijing (near Tsinghua University).

If you have contributed significantly to open-source software and are interested in remote work, you can contact us at talent@siliconflow.cn with onediff in the email title.

Documentation
More about onediff

Documentation

onediff is the abbreviation of "one line of code to accelerate diffusion models".

Use with HF diffusers and ComfyUI

Performance comparison

SDXL E2E time

Model stabilityai/stable-diffusion-xl-base-1.0;
Image size 1024*1024, batch size 1, steps 30;
NVIDIA A100 80G SXM4;

SVD E2E time

Model stabilityai/stable-video-diffusion-img2vid-xt;
Image size 576*1024, batch size 1, steps 25, decoder chunk size 5;
NVIDIA A100 80G SXM4;

Note that we haven't got a way to run SVD with TensorRT on Feb 29 2024.

Quality Evaluation

We also maintain a repository for benchmarking the quality of generation after acceleration: odeval

Community and Support

Installation

0. OS and GPU Compatibility

Linux
- If you want to use onediff on Windows, please use it under WSL.
- The guide to install onediff in WSL2.
NVIDIA GPUs
- Compatibility with Nvidia GPUs.

1. Install torch and diffusers

Note: You can choose the latest versions you want for diffusers or transformers.

python3 -m pip install "torch" "transformers==4.27.1" "diffusers[torch]==0.19.3"

2. Install a compiler backend

When considering the choice between OneFlow and Nexfort, either one is optional, and only one is needed.

For DiT structural models or H100 devices, it is recommended to use Nexfort.
For all other cases, it is recommended to use OneFlow. Note that optimizations within OneFlow will gradually transition to Nexfort in the future.

Nexfort

Install Nexfort is Optional. The detailed introduction of Nexfort is here.

python3 -m pip install -U torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 torchao==0.1 python3 -m pip install -U nexfort

OneFlow

Install OneFlow is Optional.

NOTE: We have updated OneFlow frequently for onediff, so please install OneFlow by the links below.

CUDA 11.8
For NA/EU users
python3 -m pip install -U --pre oneflow -f https://github.com/siliconflow/oneflow_releases/releases/expanded_assets/community_cu118
For CN users
python3 -m pip install -U --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu118 Click to get OneFlow packages for other CUDA versions.
CUDA 12.1
For NA/EU users
python3 -m pip install -U --pre oneflow -f https://github.com/siliconflow/oneflow_releases/releases/expanded_assets/community_cu122
For CN users
python3 -m pip install -U --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu122
CUDA 12.2
For NA/EU users
python3 -m pip install -U --pre oneflow -f https://github.com/siliconflow/oneflow_releases/releases/expanded_assets/community_cu122
For CN users
python3 -m pip install -U --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu122

3. Install onediff

From PyPI

python3 -m pip install --pre onediff

From source

git clone https://github.com/siliconflow/onediff.git

cd onediff && python3 -m pip install -e .

Or install for development:

# install for dev
cd onediff && python3 -m pip install -e '.[dev]'

# code formatting and linting
pip3 install pre-commit
pre-commit install
pre-commit run --all-files

NOTE: If you intend to utilize plugins for ComfyUI/StableDiffusion-WebUI, we highly recommend installing OneDiff from the source rather than PyPI. This is necessary as you'll need to manually copy (or create a soft link) for the relevant code into the extension folder of these UIs/Libs.

More about onediff

Architecture

Features

Functionality	Details
Compiling Time	About 1 minute (SDXL)
Deployment Methods	Plug and Play
Dynamic Image Size Support	Support with no overhead
Model Support	SD1.5~2.1, SDXL, SDXL Turbo, etc.
Algorithm Support	SD standard workflow, LoRA, ControlNet, SVD, InstantID, SDXL Lightning, etc.
SD Framework Support	ComfyUI, Diffusers, SD-webui
Save & Load Accelerated Models	Yes
Time of LoRA Switching	Hundreds of milliseconds
LoRA Occupancy	Tens of MB to hundreds of MB.
Device Support	NVIDIA GPU 3090 RTX/4090 RTX/A100/A800/A10 etc. (Compatibility with Ascend in progress)

Acceleration for State-of-the-art models

onediff supports the acceleration for SOTA models.

stable: release for public usage, and has long-term support;
beta: release for professional usage, and has long-term support;
alpha: early release for expert usage, and should be careful to use;

AIGC Type	Models	HF diffusers	ComfyUI	SD web UI
Community	Enterprise	Community	Enterprise	Community	Enterprise
Image	SD 1.5	stable	stable	stable	stable	stable	stable
SD 2.1	stable	stable	stable	stable	stable	stable
SDXL	stable	stable	stable	stable	stable	stable
LoRA	stable	stable	stable
ControlNet	stable	stable
SDXL Turbo	stable	stable
LCM	stable	stable
SDXL DeepCache	alpha	alpha	alpha	alpha
InstantID	beta	beta
Video	SVD(stable Video Diffusion)	stable	stable	stable	stable
SVD DeepCache	alpha	alpha	alpha	alpha

Acceleration for production environment

PyTorch Module compilation

compilation with oneflow_compile

Avoid compilation time for new input shape

Support Multi-resolution input

Avoid compilation time for online serving

Compile and save the compiled result offline, then load it online for serving

Save and Load the compiled graph
Compile at one device(such as device 0), then use the compiled result to other device(such as device 1~7). Change device of the compiled graph to do multi-process serving

Distributed Run

If you want to do distributed inference, you can use onediff's compiler to do single-device acceleration in a distributed inference engine such as xDiT

OneDiff Enterprise Solution

If you need Enterprise-level Support for your system or business, you can email us at contact@siliconflow.com, or contact us through the website: https://siliconflow.cn/pricing

Onediff Enterprise Solution
More extreme compiler optimization for diffusion process	Usually another 20%~30% or more performance gain
End-to-end workflow speedup solutions	Sometimes 200%~300% performance gain
End-to-end workflow deployment solutions	Workflow to online model API
Technical support for deployment	High priority support

Citation

@misc{2022onediff, author={OneDiff Contributors}, title = {OneDiff: An out-of-the-box acceleration library for diffusion models}, year = {2022}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/siliconflow/onediff}} }