Improve pos embed for Flux.1 inference on Ascend NPU by gameofdimension · Pull Request #12534 · huggingface/diffusers
What does this PR do?
Computing `pos_embed` on the CPU instead of the NPU speeds up Flux.1 end-to-end inference by roughly 1.4x to 1.5x for the two pipelines measured below.
| model | pos_embed device | e2e latency |
|---|---|---|
| FLUX.1-dev | npu | 31s |
| FLUX.1-dev | cpu | 21s |
| FLUX.1-Fill-dev | npu | 64s |
| FLUX.1-Fill-dev | cpu | 46s |
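
The speedup can be checked directly from the latencies in the table. The snippet below is only that arithmetic; the dictionary layout and variable names are illustrative, and the numbers are copied from the table as reported.

```python
# End-to-end latencies in seconds, copied from the table above.
# Keys "npu" and "cpu" name the device used for the pos_embed computation.
latency_s = {
    "FLUX.1-dev": {"npu": 31, "cpu": 21},
    "FLUX.1-Fill-dev": {"npu": 64, "cpu": 46},
}

# Speedup = latency with pos_embed on NPU / latency with pos_embed on CPU.
speedup = {model: t["npu"] / t["cpu"] for model, t in latency_s.items()}

for model, ratio in speedup.items():
    print(f"{model}: {ratio:.2f}x")
# prints:
# FLUX.1-dev: 1.48x
# FLUX.1-Fill-dev: 1.39x
```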
Tested hardware:
Ascend 910B2C
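
As a rough illustration of the idea (not this PR's actual diff), the sketch below builds rotary cos/sin tables on the CPU and copies only the finished tables to the accelerator. The helper names `rope_cos_sin` and `rope_on_cpu`, the `theta` default, and the table layout are assumptions made for illustration; the `pos_embed` module in diffusers may organize this differently. `target_device` defaults to `"cpu"` so the sketch runs without an NPU.

```python
import torch


def rope_cos_sin(pos: torch.Tensor, dim: int, theta: float = 10000.0):
    """Return (cos, sin) rotary tables of shape [len(pos), dim // 2].

    Hypothetical helper: runs on whatever device `pos` lives on.
    """
    # Inverse frequencies 1 / theta^(2i / dim) for i in [0, dim / 2).
    exponents = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
    inv_freq = 1.0 / (theta ** exponents)
    # Angle for every (position, frequency) pair.
    angles = torch.outer(pos.to(torch.float64), inv_freq)
    return angles.cos().float(), angles.sin().float()


def rope_on_cpu(ids: torch.Tensor, dim: int, target_device: str = "cpu"):
    """Compute the tables on the CPU, then copy them to `target_device`."""
    cos, sin = rope_cos_sin(ids.cpu(), dim)
    return cos.to(target_device), sin.to(target_device)


ids = torch.arange(4)  # toy position ids
cos, sin = rope_on_cpu(ids, dim=8)  # pass target_device="npu" on Ascend hardware
```

Only the small position-id tensor crosses to the host and only the two finished tables cross back, so the transfer cost stays small compared with the denoising loop.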
Repro Code
FLUX.1-dev
```python
import torch
from diffusers import FluxPipeline

# Load the pipeline in bfloat16 and move it to the Ascend NPU.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("npu")

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducible output
).images[0]
image.save("flux-dev.png")
```
FLUX.1-Fill-dev
```python
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

# Input image and the mask marking the region to fill.
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

# Load the pipeline in bfloat16 and move it to the Ascend NPU.
pipe = FluxFillPipeline.from_pretrained("black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16).to("npu")
image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducible output
).images[0]
image.save("flux-fill-dev.png")
```
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline?
- Did you read our philosophy doc (important for complex PRs)?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.