Fix small inconsistency in output dimension of "_get_t5_prompt_embeds" function in sd3 pipeline by alirezafarashah · Pull Request #12531 · huggingface/diffusers

What does this PR do?

This PR fixes a small inconsistency in the output dimension of the _get_t5_prompt_embeds function in the Stable Diffusion 3 pipeline.

Previously, when self.text_encoder_3 was None, the function returned a tensor (torch.zeros) with a sequence length of self.tokenizer_max_length (77), which corresponds to the CLIP encoder. However, the T5 text encoder used in SD3 has a different maximum sequence length (256).

As a result, when text_encoder_3 was available, the combined prompt embeddings had a sequence length of 333 (256 from T5 + 77 from CLIP), but when it was not, the concatenated result had a sequence length of only 154 (77 from CLIP + 77 from the zeros placeholder), leading to inconsistent output dimensions in encode_prompt.
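
For context, a minimal sketch of the shape mismatch described above (the tensor names and the 4096 feature dimension are illustrative placeholders, not the pipeline's actual code):

```python
import torch

# Illustrative shapes only; 4096 stands in for SD3's joint attention dimension.
clip_prompt_embeds = torch.zeros(1, 77, 4096)   # CLIP branch: 77 tokens

# With text_encoder_3 available: 77 + 256 = 333 tokens.
t5_prompt_embeds = torch.zeros(1, 256, 4096)
print(torch.cat([clip_prompt_embeds, t5_prompt_embeds], dim=-2).shape)  # (1, 333, 4096)

# Before this fix, the text_encoder_3-is-None fallback returned a 77-token
# zeros tensor, giving 77 + 77 = 154 tokens instead.
t5_fallback_old = torch.zeros(1, 77, 4096)
print(torch.cat([clip_prompt_embeds, t5_fallback_old], dim=-2).shape)  # (1, 154, 4096)
```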

Motivation and Context

This change ensures consistent tensor shapes across different encoder availability conditions in the SD3 pipeline.
It prevents dimension mismatches and potential runtime errors when text_encoder_3 is None.

Previously, the zeros tensor was sized with self.tokenizer_max_length, which corresponds to CLIP's maximum sequence length (77), rather than T5's longer maximum sequence length (256).
This mismatch led to inconsistent embedding dimensions when the CLIP outputs were concatenated with the T5 placeholder in encode_prompt.

Changes Made
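
The zeros fallback in _get_t5_prompt_embeds is now sized with the T5 maximum sequence length (256) instead of self.tokenizer_max_length (77), so both code paths produce embeddings of the same shape. A hypothetical, self-contained sketch of the corrected fallback shape (the function name and the 4096 joint attention dimension are illustrative, not the pipeline's code):

```python
import torch

def t5_fallback_embeds(
    batch_size: int,
    num_images_per_prompt: int = 1,
    max_sequence_length: int = 256,   # T5's maximum sequence length, no longer CLIP's 77
    joint_attention_dim: int = 4096,  # assumed feature dimension for SD3
    device=None,
    dtype=None,
):
    """Hypothetical standalone version of the zeros placeholder returned when
    text_encoder_3 is None inside _get_t5_prompt_embeds."""
    return torch.zeros(
        (batch_size * num_images_per_prompt, max_sequence_length, joint_attention_dim),
        device=device,
        dtype=dtype,
    )

# 77 (CLIP) + 256 (placeholder) = 333, matching the path where text_encoder_3 is loaded.
print(t5_fallback_embeds(batch_size=1).shape)  # torch.Size([1, 256, 4096])
```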

Before submitting

Who can review?