[LoRA] support Kohya Flux LoRAs that have text encoders as well by sayakpaul · Pull Request #9542 · huggingface/diffusers

What does this PR do?

https://huggingface.co/cocktailpeanut has a bunch of very nice Flux LoRAs that were trained using Kohya but have text encoder components too. This PR adds support for fully loading those LoRAs.

Test code (has a slow test in this PR too):

from diffusers import FluxPipeline
import torch

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

pipeline.load_lora_weights("cocktailpeanut/optimus", weight_name="optimus.safetensors")

prompts = [
    "optimus is cleaning the house with broomstick",
    "optimus is a DJ performing at a hip nightclub",
    "optimus is competing in a bboy break dancing competition",
    "optimus is playing tennis in a tennis court",
]
images = pipeline(
    prompts,
    num_inference_steps=50,
    guidance_scale=3.5,
    max_sequence_length=512,
    generator=torch.manual_seed(0),
).images
for i, image in enumerate(images):
    image.save(f"{i}.png")

Favorite sample:

optimus is a DJ performing at a hip nightclub

Sorry for the review request messup.

@asomoza can you give this a review too?

Thanks, LGTM with respect to the changes.

Since we now have text encoders training with a transformer model, I did some tests with the blockwise loras that I often use with SDXL:

(Image comparison, three columns: no LoRA | just transformer | with TEs)

This is not related to this PR, but just so we know:

If I do this:

scales = {"text_encoder": 0.0, "text_encoder_2": 0.0, "transformer": 0.0}
pipe.set_adapters("optimus", adapter_weights=scales)

It works as intended. But when I copy-pasted the same code I use for SDXL:

scales = {"text_encoder": 0.0, "text_encoder_2": 0.0, "unet": 0.0}
pipe.set_adapters("optimus", adapter_weights=scales)

This doesn't work: the transformer keeps the LoRA scale at 1.0, and there is no error or warning telling me that I'm setting "unet" instead of "transformer".

@asomoza

scales = {"text_encoder": 0.0, "text_encoder_2": 0.0, "transformer": 0.0}

I think it shouldn't apply to this LoRA because it doesn't have the text_encoder_2 component in the first place. I can look into catching this and erroring/warning as needed. Will look into the "unet" thingy as well. Thanks much for flagging!
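A rough sketch of the kind of check discussed here, written in plain Python rather than against the diffusers internals. The helper name `validate_adapter_scales` and the list of components the adapter targets are assumptions made for illustration; this is not the library's actual API or the fix that was eventually merged.

```python
import warnings


def validate_adapter_scales(scales, available_components):
    """Split a scales dict into known entries and unknown component names.

    Hypothetical helper: warns about keys that do not match any component
    the adapter actually has layers on, instead of silently ignoring them.
    """
    known = {name: w for name, w in scales.items() if name in available_components}
    unknown = sorted(set(scales) - set(available_components))
    if unknown:
        warnings.warn(
            f"Ignoring scales for unknown components {unknown}; "
            f"expected a subset of {sorted(available_components)}.",
            stacklevel=2,
        )
    return known, unknown


# Assumption based on the discussion: this LoRA only touches the transformer
# and the first text encoder, so "text_encoder_2" has nothing to scale.
adapter_components = ["transformer", "text_encoder"]

# SDXL-style dict pasted unchanged: "unet" does not exist on a Flux pipeline.
sdxl_style = {"text_encoder": 0.0, "text_encoder_2": 0.0, "unet": 0.0}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    known, unknown = validate_adapter_scales(sdxl_style, adapter_components)

print(known)    # entries that can be applied
print(unknown)  # entries that would otherwise be silently dropped
```

Whether unknown keys should warn or raise is a design choice; a warning keeps existing scripts running, while an error makes the copy-paste mistake impossible to miss.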

Looking great! Thanks for adding it

This was referenced

Oct 2, 2024

leisuzz pushed a commit to leisuzz/diffusers that referenced this pull request

Oct 11, 2024

sayakpaul added a commit that referenced this pull request

Dec 23, 2024

support kohya flux loras that have tes.

Hello!
It seems that this check:
if not all(k.startswith("lora_te1") for k in remaining_keys):
    raise ValueError(f"Incompatible keys detected: \n\n {', '.join(remaining_keys)}")
is unnecessary, or its logic should be changed.
With this check in place, many Kohya models without text encoders crash on load.
For example, this one crashes:
https://civitai.com/models/200251?modelVersionId=1081295
Many others crash as well. It is a serious bug, please fix it.
Thank you!
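To make the behavior of the quoted check concrete, here is a small standalone sketch. The wrapper function `check_remaining_keys` and all key names are hypothetical and chosen only for illustration; the real conversion code in diffusers does more than this, and the sketch does not claim to identify the actual cause of the reported crash.

```python
def check_remaining_keys(remaining_keys):
    """Standalone copy of the quoted check: every leftover key must start with "lora_te1"."""
    if not all(k.startswith("lora_te1") for k in remaining_keys):
        raise ValueError(f"Incompatible keys detected: \n\n {', '.join(remaining_keys)}")


# Hypothetical key names, for illustration only.
te_only = ["lora_te1_text_model_encoder_layers_0_mlp_fc1.lora_down.weight"]
nothing_left = []
other_prefix = ["lora_unet_some_unconverted_layer.lora_down.weight"]

# Passes: the only leftover keys belong to the first text encoder.
check_remaining_keys(te_only)

# Passes: all() over an empty iterable is True, so a LoRA with no text encoder
# does not trip the check as long as every other key was already consumed.
check_remaining_keys(nothing_left)

# Raises: a leftover key with any other prefix is reported as incompatible.
try:
    check_remaining_keys(other_prefix)
    raised = False
except ValueError as err:
    raised = True
    print(err)
```

Under these assumptions, a failure would point to leftover keys that are not text encoder keys, rather than to the absence of a text encoder as such. Printing the leftover keys is therefore a useful thing to include in a bug report.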

Can you open a new issue thread with a fully reproducible code snippet?

Hello, reproducing it is very simple: you can try to load any Kohya model without a text encoder.

Not sure about it really. I just ran pytest tests/lora/ -k "test_flux_kohya" and everything passed.

@RIOFornium #10532.

Also, I would suggest avoiding anecdotal claims here.

Hello, reproducing it is very simple: you can try to load any Kohya model without a text encoder.
Hello, but in real life it is not working :-)
Tests can also have bugs...

I ran two tests where one LoRA has a text encoder and another one doesn't. Both of them passed. So, your first statement is utterly incorrect.
What is real life? We cannot predict every possibility in the community checkpoints beforehand, so we fix them on a case-by-case basis. Hence the testing setup. You cited one LoRA that had the bug and that may be useful for your application. That may not be the case for others, and for those, our test cases might very well be sufficiently representative. So, there's no clear definition of "real-world" here, IMO.

I kept asking for a simple reproducer and you just referred me to the CivitAI model link. Please try to be mindful of the maintainer's time. To get it to work, I had to:

1. Download the CivitAI model.
2. Upload it to the Hub. I need this because I don't have a local GPU and have to rely on remote ones.
3. Figure out a reproducible code snippet.

Expecting a simple reproducer isn't too much of an ask and I would suggest respecting that going forward. So far I have engaged in good faith. If you keep providing incomplete information, I will just stop engaging.

This applies to you and everyone else with a similar attitude.

Hello!
Sorry, I did not want to hurt you.
It is my first collaboration here.
Next time I will be more constructive.
Thank you very much for the fix and your beneficial work!