cpu_offload vRAM memory consumption larger than 4GB · Issue #1934 · huggingface/diffusers
Describe the bug
I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings to test `cpu_offload`, but the vRAM consumption is larger than 4GB:
| GPU | cpu_offload enabled | vRAM cost |
|---|---|---|
| 1080 | Yes | 4539 MB |
| 1080 | No | 5101 MB |
| TITAN RTX | Yes | 5134 MB |
| TITAN RTX | No | 5668 MB |
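For anyone reproducing the table, here is a minimal sketch of sampling the same device-wide number that `nvidia-smi` shows, via `pynvml` (an extra dependency, `pip install pynvml`; note this figure includes the CUDA context itself, which accounts for several hundred MB on top of PyTorch's own allocations):

```python
# Sketch: read the device-wide memory usage that nvidia-smi reports.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
# info.used is process-agnostic: it counts the CUDA context and every
# process on the device, not just PyTorch tensor allocations.
print(f"used: {info.used / 1024**2:.0f} MB")
pynvml.nvmlShutdown()
```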
Reproduction
I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings
```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()
image = pipe(prompt).images[0]
```
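For completeness, the same run can be instrumented to print PyTorch's own peak allocation (a minimal sketch; `torch.cuda.max_memory_allocated()` excludes the CUDA context and the caching allocator's unused reserve, so it reads lower than `nvidia-smi`):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")
pipe.enable_sequential_cpu_offload()

torch.cuda.reset_peak_memory_stats()
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]

# Peak memory held by PyTorch tensors only; nvidia-smi shows more because
# it also counts the CUDA context and the allocator's reserved cache.
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MB")
```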
Logs
No response
System Info
Tested on a 1080 and a TITAN RTX.
- `diffusers` version: 0.11.1
- `accelerate` version: 0.15.0
- Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 1.10.1+cu111 (True)
- Huggingface_hub version: 0.11.1
- Transformers version: 4.25.1
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No