make enable_sequential_cpu_offload more generic for third-party devices by ji-huazhong · Pull Request #4191 · huggingface/diffusers
What does this PR do?
This PR makes `enable_sequential_cpu_offload` more generic for third-party devices.

I noticed that in #4114, `enable_sequential_cpu_offload` was refactored to be more generic for other devices. But inside the function, we still call `torch.cuda.empty_cache` to release all unoccupied cached memory, which has no effect on other devices (such as `xpu`):
```python
if self.device.type != "cpu":
    self.to("cpu", silence_dtype_warnings=True)
    torch.cuda.empty_cache()  # otherwise we don't see the memory savings (but they probably exist)
```
We could swap out `torch.cuda.empty_cache()` from outside, like

```python
torch.cuda.empty_cache = torch.xpu.empty_cache
device = torch.device("xpu")
pipeline.enable_sequential_cpu_offload(device=device)
```

but that looks a little weird.
I think a better way is to:
- get the device module according to the device type first
- then call that module's `empty_cache` method
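The two steps above can be sketched as a small helper. This is an illustrative sketch, not the exact diffusers code: the framework module (normally `torch`) is passed in explicitly, and the helper name `empty_device_cache` is hypothetical. It relies on the convention that PyTorch exposes per-backend modules under the device type name (`torch.cuda`, `torch.xpu`, ...), each with its own `empty_cache`.

```python
def empty_device_cache(framework, device_type: str) -> bool:
    """Look up the backend module named after the device type (e.g.
    framework.cuda, framework.xpu) and call its empty_cache() if present.
    Returns True when a cache-clearing call was actually made."""
    if device_type == "cpu":
        return False  # plain CPU tensors have no backend cache to release
    # Step 1: get the device module according to the device type
    backend = getattr(framework, device_type, None)
    if backend is None or not hasattr(backend, "empty_cache"):
        return False  # unknown backend, or one without a cache API
    # Step 2: call the corresponding empty_cache method
    backend.empty_cache()
    return True
```

With this lookup, the offload path no longer hardcodes `torch.cuda`, so any backend that follows the `torch.<device_type>.empty_cache` convention works unchanged.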
Now we can use `enable_sequential_cpu_offload` more conveniently with `xpu`, like

```python
device = torch.device("xpu")
pipeline.enable_sequential_cpu_offload(device=device)
```
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline?
- Did you read our philosophy doc (important for complex PRs)?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?