Add Support for Z-Image Series by JerryWu-code · Pull Request #12703 · huggingface/diffusers
@yiyixuxu
By the way, while testing the `_flash_3` and `_flash_varlen_3` backends, we noticed that the current implementation in `attention_dispatch.py` is incompatible with the latest Flash Attention 3 API.
The recent FA3 commit (Dao-AILab/flash-attention@203b9b3) introduced a `return_attn_probs` argument and changed the default return behavior: the functions now return a single output tensor by default (instead of a tuple), which breaks the current tuple-unpacking logic in diffusers:
```python
out, lse, *_ = flash_attn_3_varlen_func(
    q=query_packed,
    k=key_packed,
    v=value_packed,
    cu_seqlens_q=cu_seqlens_q,
    cu_seqlens_k=cu_seqlens_k,
    max_seqlen_q=max_seqlen_q,
    max_seqlen_k=max_seqlen_k,
    softmax_scale=scale,
    causal=is_causal,
)

out, lse, *_ = flash_attn_3_func(
    q=q,
    k=k,
    v=v,
    softmax_scale=softmax_scale,
    causal=causal,
    qv=qv,
    q_descale=q_descale,
    k_descale=k_descale,
    v_descale=v_descale,
    window_size=window_size,
    attention_chunk=attention_chunk,
    softcap=softcap,
    num_splits=num_splits,
    pack_gqa=pack_gqa,
    deterministic=deterministic,
    sm_margin=sm_margin,
)
```
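For context, here is a minimal illustration (not diffusers code, and the tensor shape is just a stand-in) of what that unpacking does once the call returns a single tensor under the new default:

```python
import torch

# Stand-in for the new default return value: a single output tensor
# rather than a tuple of (out, lse, ...). The shape here is arbitrary.
result = torch.randn(2, 8, 16, 64)

# The existing unpacking pattern then iterates the tensor along dim 0,
# so `out` and `lse` become slices of the attention output (or the
# statement raises if dim 0 has fewer than two entries).
out, lse, *_ = result
print(out.shape, lse.shape)  # torch.Size([8, 16, 64]) torch.Size([8, 16, 64])
```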
We have implemented a fix that handles the new single-tensor return while remaining backward compatible with older FA3 builds:
JerryWu-code@de4c6f1#diff-b027e126a86a26981384b125714e0f3bd9923eaa8322f1ae5f6b53fe3e3481c2
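For reference, a minimal sketch of the general shape such a compatibility shim can take (the helper name `_unpack_fa3_output` is illustrative only; the actual change is in the commit linked above):

```python
import torch


def _unpack_fa3_output(result):
    """Normalize Flash Attention 3 return values across API versions.

    Older FA3 builds return a tuple of (out, lse, ...); newer builds return
    only the output tensor by default. This (illustrative) helper lets the
    tuple-unpacking call sites tolerate both.
    """
    if torch.is_tensor(result):
        # New default: a single output tensor, no log-sum-exp available.
        return result, None
    # Old behavior: unpack the output and log-sum-exp, drop anything else.
    out, lse, *_ = result
    return out, lse


# Call sites would then read, e.g.:
#   out, lse = _unpack_fa3_output(flash_attn_3_func(q=q, k=k, v=v, ...))
```

Returning `None` for `lse` in the single-tensor case is only one option; code paths that actually need the log-sum-exp would presumably have to request it explicitly via the new `return_attn_probs` argument instead.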
Should we include this fix in the current PR, or would you prefer us to open a separate PR for it?