make enable_sequential_cpu_offload more generic for third-party devices by ji-huazhong · Pull Request #4191 · huggingface/diffusers

What does this PR do?

This PR makes `enable_sequential_cpu_offload` more generic for third-party devices.


I noticed that in #4114, `enable_sequential_cpu_offload` was refactored to be more generic for other devices.
But inside `enable_sequential_cpu_offload` we call `torch.cuda.empty_cache` to release all unoccupied cached memory, which has no effect on other devices (such as xpu):

```python
if self.device.type != "cpu":
    self.to("cpu", silence_dtype_warnings=True)
    torch.cuda.empty_cache()  # otherwise we don't see the memory savings (but they probably exist)
```

We could swap in another backend's `empty_cache` for `torch.cuda.empty_cache()` from outside the library, for example:
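A hypothetical sketch of that workaround, assuming an XPU-enabled PyTorch build that provides `torch.xpu.empty_cache`:

```python
import torch

# Hypothetical workaround (not the code in this PR): redirect the
# hardcoded CUDA call to the XPU backend before enabling offload.
torch.cuda.empty_cache = torch.xpu.empty_cache
```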

But it looks a little weird.

I think a better way is to resolve the cache-clearing call from the device type inside the pipeline itself.
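A minimal sketch of that idea (the helper name `empty_device_cache` is illustrative, not the exact code in this PR):

```python
import torch

def empty_device_cache(device_type: str) -> None:
    # Resolve the backend module (torch.cuda, torch.xpu, ...) from the
    # device type; fall back to a no-op for backends that expose no
    # empty_cache (e.g. "cpu").
    device_mod = getattr(torch, device_type, None)
    if device_mod is not None and hasattr(device_mod, "empty_cache"):
        device_mod.empty_cache()
```

The offload path can then call `empty_device_cache(self.device.type)` instead of hardcoding `torch.cuda.empty_cache()`.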

Now we can use `enable_sequential_cpu_offload` more conveniently with xpu, like:

```python
device = torch.device("xpu")
pipeline.enable_sequential_cpu_offload(device=device)
```
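
For context, an end-to-end sketch (the model id and prompt are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Keep submodules on CPU and move them to xpu one at a time during inference.
pipeline.enable_sequential_cpu_offload(device=torch.device("xpu"))

image = pipeline("An astronaut riding a horse on Mars").images[0]
```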

Before submitting

Who can review?

@patrickvonplaten and @sayakpaul