Cast precision for custom diffusion attention processor. · Issue #4139 · huggingface/diffusers

Describe the bug

I see that in LoRA, the dtype is explicitly upcast:
https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py#L529

However, this is not done in custom diffusion. The custom diffusion weights need to stay in FP32 for mixed-precision training, so during validation the custom diffusion processor throws a precision-mismatch error. Wrapping inference in autocast is not recommended AFAIK.
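
For reference, a minimal sketch of the kind of cast I mean, mirroring what the LoRA processor does (the function name here is illustrative, not the actual diffusers code; the to_k_custom_diffusion / to_v_custom_diffusion attributes are the ones from the error below):

```python
# Sketch only: cast activations to the dtype of the FP32 custom diffusion
# projection weights, then cast the result back, the way the LoRA processor
# handles the dtype mismatch.
def project_key_value(attn, hidden_states, encoder_hidden_states):
    weight_dtype = attn.to_k_custom_diffusion.weight.dtype  # FP32 during mixed-precision training
    key = attn.to_k_custom_diffusion(encoder_hidden_states.to(weight_dtype))
    value = attn.to_v_custom_diffusion(encoder_hidden_states.to(weight_dtype))
    # Return in the activation dtype (BF16/FP16) so the rest of attention is unchanged.
    return key.to(hidden_states.dtype), value.to(hidden_states.dtype)
```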

Reproduction

Run the custom diffusion example with BF16 mixed precision.
Make sure you train for enough steps (or lower the validation epochs) so that the validation during training actually runs. The final inference has a unet.to(fp32) inside, so it runs without any problems.
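
The same mismatch can also be surfaced outside the training script with an inference sketch like the one below (model id, output path, weight file name, and the <new1> token are assumptions based on the custom diffusion example, not taken from this run):

```python
import torch
from diffusers import DiffusionPipeline

# Sketch: BF16 pipeline + FP32 custom diffusion layers -> dtype mismatch.
pipe = DiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.bfloat16
).to("cuda")

# The attention-processor weights saved by the training example are FP32;
# adjust the path/file name to your own output directory.
pipe.unet.load_attn_procs(
    "path/to/custom-diffusion-output",
    weight_name="pytorch_custom_diffusion_weights.bin",
)

# Fails with "expected BFloat16 but found Float" inside the custom diffusion
# attention processor unless the unet is cast back to FP32 first.
image = pipe("a photo of a <new1> cat", num_inference_steps=25).images[0]
```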

Logs

Something like "expected BFloat16 but found Float", raised at key = self.to_k_custom_diffusion(encoder_hidden_states)

System Info

Who can help?

No response