[SD 3.5 Dreambooth LoRA] support configurable training block & layers by linoytsaban · Pull Request #9762 · huggingface/diffusers


Conversation (20) · Commits (25) · Checks (8) · Files changed

Conversation


linoytsaban

This PR adds support for configurable training blocks & layers to the SD 3.5 DreamBooth LoRA training script.


@linoytsaban

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@linoytsaban

cc @bghira - maybe you explored and have some insights/thoughts to share too 🤗

sayakpaul


Thank you! Left a couple of comments. LMK if they make sense or are unclear.

> **Photorealism**
> In preliminary testing, we observed that freezing the last few layers of the architecture significantly improved model training when using a photorealistic dataset, preventing the detail degradation that small datasets can introduce.
> **Anatomy preservation**
> To mitigate any possible degradation of anatomy, training only the attention layers and **not** the adaptive linear layers could help. For reference, below is one of the transformer blocks.


Makes total sense to me!
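A minimal sketch of what the anatomy-preservation heuristic above could look like with peft: only the attention projections are listed in `target_modules`, so the adaptive-norm linears stay frozen. The rank values are placeholders and the `norm1.linear` module name mentioned in the comment is an assumption, not part of this PR.

```python
from peft import LoraConfig

# Attention projections only; adaptive-norm linears (e.g. a "norm1.linear"-style
# module, name assumed here) are simply not listed, so they stay frozen.
attention_only_layers = [
    "attn.to_k", "attn.to_q", "attn.to_v", "attn.to_out.0",
    "attn.add_k_proj", "attn.add_q_proj", "attn.add_v_proj", "attn.to_add_out",
]

transformer_lora_config = LoraConfig(
    r=16,                       # example rank, not a recommendation
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=attention_only_layers,
)
```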

- with `--lora_layers` you can specify the types of layers you wish to train.
By default, the trained layers are
`attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,attn.to_k,attn.to_out.0,attn.to_q,attn.to_v`
If you wish to have a leaner LoRA / train more blocks over layers you could pass
`--lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0`


Leaner LoRA targeting what aspect? From what I understand, this heuristic is for targeting a specific quality, right?


Indeed, the feature was added to allow experimentation with which layers produce the best quality. But since the default (once the PR is merged) will be
`attn.add_k_proj attn.add_q_proj attn.add_v_proj attn.to_add_out attn.to_k attn.to_out.0 attn.to_q attn.to_v`,
which makes every trained block chunkier than the previous default, I wanted to also give as an example the previous setting we had,
`--lora_layers attn.to_k attn.to_q attn.to_v attn.to_out.0`,
which results in a smaller LoRA.
But if it's confusing/unclear, I can remove that.
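A rough sketch of how comma-separated `--lora_layers` and `--lora_blocks` values could be expanded into peft `target_modules` patterns, assuming plain string flags; the helper name `build_target_modules` and the parsing details are illustrative, not the PR's actual code.

```python
from typing import List, Optional


def build_target_modules(lora_layers: str, lora_blocks: Optional[str] = None) -> List[str]:
    """Turn comma-separated flag values into module-name patterns for peft."""
    layers = [layer.strip() for layer in lora_layers.split(",")]
    if lora_blocks is None:
        # "attn.to_k" etc. match that projection in every transformer block.
        return layers
    blocks = [int(block.strip()) for block in lora_blocks.split(",")]
    # Restrict each layer to the requested blocks, e.g. "transformer_blocks.0.attn.to_k".
    return [f"transformer_blocks.{block}.{layer}" for block in blocks for layer in layers]


# Previous default: the smaller set discussed above.
smaller = build_target_modules("attn.to_k,attn.to_q,attn.to_v,attn.to_out.0")
# New default: also includes the add_* (context) projections, so each block is chunkier.
chunkier = build_target_modules(
    "attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,"
    "attn.to_k,attn.to_out.0,attn.to_q,attn.to_v"
)
```

With `--lora_blocks 0`, every resulting pattern is scoped to transformer block 0, which is what the state-dict check quoted below relies on.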

Comment on lines +167 to +172

# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict
starts_with_transformer = all(
key.startswith("transformer.transformer_blocks.0") for key in lora_state_dict.keys()
)


Super!

@linoytsaban

…-explorations

Conflicts:

examples/dreambooth/train_dreambooth_lora_sd3.py

@linoytsaban

sayakpaul

`attn.add_k_proj,attn.add_q_proj,attn.add_v_proj,attn.to_add_out,attn.to_k,attn.to_out.0,attn.to_q,attn.to_v`
If you wish to have a leaner LoRA / train more blocks over layers you could pass -
```diff
--lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0
```


Are we deleting this block, or are we suggesting to add it? In case of addition, it should be `+ --lora_layers attn.to_k,attn.to_q,attn.to_v,attn.to_out.0` inside the diff block.

sayakpaul

# when not training the text encoder, all the parameters in the state dict should start
# with `"transformer"` in their names.
# In this test, only params of transformer block 0 should be in the state dict


Revisit the comment?

sayakpaul

--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_blocks 0


I would make the value we're passing here a constant so that it's clear from the test what we're varying. Does that make sense?
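For example, something along these lines (hypothetical test snippet, not the actual diff; `lora_blocks_to_train` is an invented name):

```python
# Name the varied value so the test states what it is exercising.
lora_blocks_to_train = 0

test_args = f"""
    --train_batch_size 1
    --gradient_accumulation_steps 1
    --max_train_steps 2
    --lora_blocks {lora_blocks_to_train}
""".split()

# The state-dict assertion can then reuse the same constant.
expected_prefix = f"transformer.transformer_blocks.{lora_blocks_to_train}"
```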

sayakpaul

--train_batch_size 1
--gradient_accumulation_steps 1
--max_train_steps 2
--lora_layers attn.to_k


sayakpaul


Excellent work here. I love it, especially the tests! I left only minor comments, LMK if they make sense.

sayakpaul pushed a commit that referenced this pull request on Dec 23, 2024

@linoytsaban @sayakpaul

…#9762)