Fix small inconsistency in output dimension of `_get_t5_prompt_embeds` function in sd3 pipeline by alirezafarashah · Pull Request #12531 · huggingface/diffusers
What does this PR do?
This PR fixes a small inconsistency in the output dimension of the `_get_t5_prompt_embeds` function in the Stable Diffusion 3 pipeline.
Previously, when `self.text_encoder_3` was `None`, the function returned a zeros tensor (`torch.zeros`) with a sequence length of `self.tokenizer_max_length` (77), which corresponds to the CLIP encoder. However, the T5 text encoder used in SD3 has a different maximum sequence length (256).
As a result, when `text_encoder_3` was available, the prompt embeddings had a sequence length of 333 (256 from T5 + 77 from CLIP), but when it was not available, the returned tensor had a sequence length of only 154 (77 + 77), leading to an inconsistency in output dimensions in `encode_prompt`.
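To make the shape arithmetic concrete, here is a minimal sketch of the mismatch; the embedding dimension (4096) and variable names are illustrative, not taken from the pipeline code:

```python
import torch

batch, embed_dim = 1, 4096  # embed_dim is illustrative
clip_seq_len = 77           # self.tokenizer_max_length (CLIP)
t5_seq_len = 256            # max_sequence_length (T5)

clip_embeds = torch.zeros(batch, clip_seq_len, embed_dim)

# Before the fix: the zeros fallback reused CLIP's length (77).
t5_fallback_old = torch.zeros(batch, clip_seq_len, embed_dim)
# After the fix: the fallback uses T5's max_sequence_length (256).
t5_fallback_new = torch.zeros(batch, t5_seq_len, embed_dim)

old = torch.cat([clip_embeds, t5_fallback_old], dim=1)  # 77 + 77  = 154
new = torch.cat([clip_embeds, t5_fallback_new], dim=1)  # 77 + 256 = 333
```

With the fix, the concatenated sequence length is 333 regardless of whether `text_encoder_3` is loaded.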
Motivation and Context
This change ensures consistent tensor shapes across different encoder availability conditions in the SD3 pipeline.
It prevents dimension mismatches and potential runtime errors when text_encoder_3 is None.
Previously, the zeros tensor used `self.tokenizer_max_length`, which corresponds to CLIP, instead of T5's longer maximum sequence length.
This mismatch led to inconsistent embedding dimensions when combining the CLIP and T5 outputs in `encode_prompt`.
Changes Made
- Replaced `self.tokenizer_max_length` with `max_sequence_length` when returning the zero tensor in `_get_t5_prompt_embeds`, ensuring consistent output dimensions whether `text_encoder_3` is `None` or available. The same `max_sequence_length` parameter is already used in the tokenization step of the same function:

```python
text_inputs = self.tokenizer_3(
    prompt,
    padding="max_length",
    max_length=max_sequence_length,
    truncation=True,
    add_special_tokens=True,
    return_tensors="pt",
)
```

- No changes to functionality, inputs, or outputs beyond dimension consistency.
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline?
- Did you read our philosophy doc (important for complex PRs)?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?
Who can review?
- @asomoza (pipelines and callbacks)
- @yiyixuxu (pipelines and callbacks)
- @sayakpaul (general functionalities)