make enable_sequential_cpu_offload more generic for third-party devices by ji-huazhong · Pull Request #4191 · huggingface/diffusers

What does this PR do?

This PR makes `enable_sequential_cpu_offload` more generic for third-party devices.


I noticed that in #4114, `enable_sequential_cpu_offload` was refactored to be more generic for other devices.
But inside `enable_sequential_cpu_offload` we call `torch.cuda.empty_cache` to release all unoccupied cached memory, which has no effect on other devices (such as xpu):

```python
if self.device.type != "cpu":
    self.to("cpu", silence_dtype_warnings=True)
    torch.cuda.empty_cache()  # otherwise we don't see the memory savings (but they probably exist)
```

We could swap in another backend's `empty_cache` for `torch.cuda.empty_cache()` from outside the library, for example:
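A hypothetical sketch of that workaround, assuming an XPU-enabled PyTorch build that provides `torch.xpu.empty_cache`:

```python
import torch

# Hypothetical workaround (not the code in this PR): redirect the
# hardcoded CUDA call to the XPU backend before enabling offload.
torch.cuda.empty_cache = torch.xpu.empty_cache
```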

But it looks a little weird.

I think a better way is to resolve the cache-clearing call from the device type inside the pipeline itself.
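A minimal sketch of that idea (the helper name `empty_device_cache` is illustrative, not the exact code in this PR):

```python
import torch

def empty_device_cache(device_type: str) -> None:
    # Resolve the backend module (torch.cuda, torch.xpu, ...) from the
    # device type; fall back to a no-op for backends that expose no
    # empty_cache (e.g. "cpu").
    device_mod = getattr(torch, device_type, None)
    if device_mod is not None and hasattr(device_mod, "empty_cache"):
        device_mod.empty_cache()
```

The offload path can then call `empty_device_cache(self.device.type)` instead of hardcoding `torch.cuda.empty_cache()`.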

Now we can use `enable_sequential_cpu_offload` more conveniently with xpu, like:

```python
device = torch.device("xpu")
pipeline.enable_sequential_cpu_offload(device=device)
```
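
For context, an end-to-end sketch (the model id and prompt are illustrative):

```python
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Keep submodules on CPU and move them to xpu one at a time during inference.
pipeline.enable_sequential_cpu_offload(device=torch.device("xpu"))

image = pipeline("An astronaut riding a horse on Mars").images[0]
```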

Before submitting

Who can review?

@patrickvonplaten and @sayakpaul