[refactor] CogVideoX followups + tiled decoding support by a-r-r-o-w · Pull Request #9150 · huggingface/diffusers
What does this PR do?
- CogVideoX followups from #9082 (Add CogVideoX text-to-video generation model)
- Support for tiled decoding

Code:
```python
import gc

import torch
from diffusers import CogVideoXPipeline, CogVideoXDDIMScheduler
from diffusers.utils import export_to_video


def reset_memory():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_accumulated_memory_stats()
    torch.cuda.reset_peak_memory_stats()


def print_memory():
    memory = round(torch.cuda.memory_allocated() / 1024**3, 2)
    max_memory = round(torch.cuda.max_memory_allocated() / 1024**3, 2)
    max_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 2)
    print(f"{memory=} GB")
    print(f"{max_memory=} GB")
    print(f"{max_reserved=} GB")


prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)

pipe = CogVideoXPipeline.from_pretrained("/raid/aryan/CogVideoX-trial", torch_dtype=torch.float16)
pipe.scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.enable_model_cpu_offload()

# Run 1: normal VAE decoding
reset_memory()
video = pipe(
    prompt=prompt,
    num_frames=48,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(42),
).frames[0]
print_memory()
export_to_video(video, "output.mp4", fps=8)

# Run 2: tiled VAE decoding
pipe.vae.enable_tiling()
reset_memory()
video = pipe(
    prompt=prompt,
    num_frames=48,
    guidance_scale=6,
    num_inference_steps=50,
    generator=torch.Generator().manual_seed(42),
).frames[0]
print_memory()
export_to_video(video, "output_tiling.mp4", fps=8)
```
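For context on what `enable_tiling` does: instead of decoding the whole latent video at once, the VAE decodes overlapping spatial tiles one at a time and blends the overlapping edges so no seams appear in the output frames. Below is a minimal sketch of the horizontal blend step, with illustrative names (`blend_h`, `left`, `right` are hypothetical here; the actual implementation lives in `AutoencoderKLCogVideoX`):

```python
import torch

# Illustrative sketch: linearly blend the overlapping edge between two
# decoded tiles so the seam is invisible. Not the exact diffusers internals.
def blend_h(left: torch.Tensor, right: torch.Tensor, blend_extent: int) -> torch.Tensor:
    # Interpolate the last `blend_extent` columns of the left tile into the
    # first `blend_extent` columns of the right tile.
    blend_extent = min(left.shape[-1], right.shape[-1], blend_extent)
    for x in range(blend_extent):
        weight = x / blend_extent
        right[..., x] = left[..., -blend_extent + x] * (1 - weight) + right[..., x] * weight
    return right
```

Because only one tile is resident on the GPU at a time, peak decoding memory scales with the tile size rather than the full frame size, which is where the savings in the numbers below come from.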
Memory usage:
```
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.51it/s]
Loading pipeline components...:  40%|████      | 2/5 [00:00<00:00, 3.29it/s]
The config attributes {'mid_block_add_attention': True, 'sample_size': 256} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|██████████| 5/5 [00:01<00:00, 4.41it/s]
100%|██████████| 50/50 [02:44<00:00, 3.28s/it]
```
CPU offloading, normal VAE decoding:

```
memory=0.01 GB
max_memory=12.39 GB
max_reserved=20.39 GB
```

CPU offloading, tiled VAE decoding:

```
100%|██████████| 50/50 [02:35<00:00, 3.11s/it]
memory=0.01 GB
max_memory=10.81 GB
max_reserved=10.83 GB
```
Results:
| Normal | Tiled |
| --- | --- |
| output.webm | output_tiling.webm |
Note that you will need to install accelerate:main from source for this to work and to reproduce the numbers above. If you're using the stable version of accelerate, you might see an additional 5-7 GB of memory usage.
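Installing accelerate from source is typically done with:

```
pip install git+https://github.com/huggingface/accelerate
```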
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.