Add support for lumina2 by zhuole1025 · Pull Request #10642 · huggingface/diffusers

so using a list, and switching the order:

long prompt first:

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask1_prompt1 |
| output_mask0_prompt0 | output_mask0_prompt1 |

short prompt first:

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask1_prompt1 |
| output_mask0_prompt0 | output_mask0_prompt1 |

After some more tests, neither the order nor the longer prompt matters. The only output that changes a lot is the one for the shorter prompt, so I'll post only those:

shorter prompt without text: "Photo of a capybara wearing sunglasses and a jacket"

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask0_prompt0 |

even shorter prompt without text: "a capybara"

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask0_prompt0 |

medium prompt with text: "photo of a cat wearing stylish black sunglasses. The cat has a light brown and white fur pattern with distinct stripes. Behind the cat, there is a wooden sign with the text: Beware of the cat!!! written in a playful, handwritten style."

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask0_prompt0 |

medium prompt without text: "photo of a cat wearing stylish black sunglasses and a leather jacket. The cat has a light brown and white fur pattern with distinct stripes. The overall scene has a humorous and whimsical tone, combining the cat's cool demeanor with human clothes and eyewear."

| mask | no mask |
| --- | --- |
| output_mask1_prompt0 | output_mask0_prompt0 |

I'm done testing. This is a nice find @yiyixuxu: we can infer that without a mask, when the input is a list of prompts and one of them is shorter, the text rendered in the image takes a quality hit. That matters for web services or APIs that run this model on lists of prompts.
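To make the masking point concrete, here is a toy sketch in plain Python. It is not the pipeline code from this PR, and the explanation it illustrates is my assumption: in a list of prompts the shorter one gets padded to the longest length, and without a mask the padding positions can receive attention. The token ids, the `PAD` value, and the helper names are all made up for illustration.

```python
import math

PAD = 0  # hypothetical padding token id, for illustration only


def pad_batch(token_ids, pad_id=PAD):
    """Right-pad every sequence to the longest one; return (padded, mask)."""
    max_len = max(len(seq) for seq in token_ids)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in token_ids]
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in token_ids]
    return padded, mask


def attention_weights(scores, mask=None):
    """Softmax over one row of scores; masked-out positions get weight 0."""
    if mask is not None:
        scores = [s if m else float("-inf") for s, m in zip(scores, mask)]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def padding_share(weights, mask):
    """Fraction of the attention that lands on padding positions."""
    return sum(w for w, m in zip(weights, mask) if not m)


# A long and a short prompt (made-up token ids), batched as a list.
batch = [[11, 12, 13, 14, 15, 16], [21, 22]]
padded, mask = pad_batch(batch)

# Uniform scores for the short prompt: every position looks equally relevant.
scores = [0.0] * len(padded[1])
no_mask = attention_weights(scores)
with_mask = attention_weights(scores, mask[1])

print("padding share without mask:", padding_share(no_mask, mask[1]))
print("padding share with mask:   ", padding_share(with_mask, mask[1]))
```

In this toy setup the short prompt has four padding positions out of six, so without a mask most of the attention goes to padding, while with the mask none does. The longer prompt has no padding, which would be consistent with it looking the same in both columns.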

Overall, I can't see any other loss in quality, with one exception: a shorter prompt without details looks worse without a mask, because the model does only the bare minimum with the prompt. For example, I noticed that without a mask, if you ask for a subject only, the model will most of the time generate the subject without a background. IMO that is really nice, but for typical users or use cases it won't look good because of the lack of detail and lighting.