GitHub - thu-ml/RIFLEx at multi-gpu (original) (raw)

Installation

The envrionment is the same with HunyuanVideo.

1. Create conda environment

conda create -n HunyuanVideo python==3.10.9

2. Activate the environment

conda activate HunyuanVideo

3. Install PyTorch and other dependencies using conda

For CUDA 11.8

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia

For CUDA 12.4

conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

4. Install pip dependencies

python -m pip install -r requirements.txt

5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)

python -m pip install ninja python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3

6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)

python -m pip install xfuser==0.4.0

In case of running into `AssertionError: Ulysses Attention and Ring Attention requires xfuser package`, you may update xfusers to 0.4.1 (Click to expand)

pip install xfuser==0.4.1

In case of running into float point exception(core dump) on the specific GPU type, you may try the following solutions (Click to expand)

Option 1: Making sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply using our CUDA 12 docker image).

pip install nvidia-cublas-cu12==12.4.5.8 export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/

Option 2: Forcing to explictly use the CUDA 11.8 compiled version of Pytorch and all the other packages

pip uninstall -r requirements.txt # uninstall all packages pip uninstall -y xfuser pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118 pip install -r requirements.txt pip install ninja pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3 pip install xfuser==0.4.0

Download Models

cd HunyuanVideo python -m pip install "huggingface_hub[cli]" huggingface-cli download tencent/HunyuanVideo --local-dir ./ckpts

Download our fine-tuned model with RIFLEx

python download.py

Download text-encoders

cd ckpts huggingface-cli download xtuner/llava-llama-3-8b-v1_1-transformers --local-dir ./llava-llama-3-8b-v1_1-transformers python ../hyvideo/utils/preprocess_text_encoder_tokenizer_utils.py --input_dir llava-llama-3-8b-v1_1-transformers --output_dir text_encoder huggingface-cli download openai/clip-vit-large-patch14 --local-dir ./text_encoder_2

Multi GPU Inference

For training-free 2× temporal extrapolation in HunyuanVideo:

torchrun --nproc_per_node=6 sample_video.py
--model-base HunyuanVideo/ckpts
--dit-weight HunyuanVideo/ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt
--video-size 544 960
--infer-steps 50
--flow-reverse
--seed 42
--ulysses-degree 6
--ring-degree 1
--save-path output
--prompt "An animated porcupine with a mix of brown and white fur and prominent quills is seen in a cozy, warmly lit interior setting, interacting with a green gift box with a yellow ribbon. The room is filled with wooden furniture and colorful wall decorations, suggesting a cheerful and domestic atmosphere. The porcupine's large eyes and expressive face convey a sense of lightheartedness and curiosity. The camera maintains a low angle, close to the ground, providing an intimate view of the character's actions without any movement, focusing on the playful and curious mood of the scene. The visual style is characteristic of contemporary 3D animation, with vibrant colors and smooth textures that create a polished and engaging look. The scene transitions to an outdoor environment, showcasing a sunny, verdant landscape with rocks, trees, and grass, indicating a natural, possibly forest-like setting. The presence of a small character in the final frame suggests the continuation of a narrative or the introduction of new characters."
--k 4
--N_k 50
--video-length 261

For finetuned 2× temporal extrapolation in HunyuanVideo:

torchrun --nproc_per_node=6 sample_video.py
--model-base HunyuanVideo/ckpts
--dit-weight HunyuanVideo/ckpts/diffusion_pytorch_model.safetensors
--video-size 544 960
--infer-steps 50
--flow-reverse
--seed 42
--ulysses-degree 6
--ring-degree 1
--save-path output
--prompt "3D animation of a small, round, fluffy creature with big, expressive eyes explores a vibrant, enchanted forest. The creature, a whimsical blend of a rabbit and a squirrel, has soft blue fur and a bushy, striped tail. It hops along a sparkling stream, its eyes wide with wonder. The forest is alive with magical elements: flowers that glow and change colors, trees with leaves in shades of purple and silver, and small floating lights that resemble fireflies. The creature stops to interact playfully with a group of tiny, fairy-like beings dancing around a mushroom ring. The creature looks up in awe at a large, glowing tree that seems to be the heart of the forest."
--k 4
--N_k 66
--video-length 261
--finetune

The prompts list in the project page are provided in assets/prompt_free.txt and assets/prompt_finetune.txt.

Note that in the default parallel setting for HunyuanVideo, the following number of GPUs are supported:

Supported Parallel Configurations (Click to expand)

--ulysses-degree x --ring-degree --nproc_per_node
6x1,3x2,2x3,1x6 6
4x1,2x2,1x4 4
3x1,1x3 3
1x2,2x1 2