[Feat] add: utility for unloading lora. by sayakpaul · Pull Request #4034 · huggingface/diffusers
Conversation (33) · Commits (9) · Checks (0) · Files changed
Conversation
What does this PR do?
This PR adds a simple test to check if unloading a LoRA attention processor was successful.
Before submitting
- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline?
- Did you read our philosophy doc (important for complex PRs)?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?
Who can review?
sayakpaul marked this pull request as ready for review
The documentation is not available anymore as the PR was closed or merged.
@sayakpaul do you think a `pipe.unload()` LoRA function could make sense here, given the importance of LoRA?

> @sayakpaul do you think a `pipe.unload()` LoRA function could make sense here, given the importance of LoRA?
I don't, since the existing functionality already does the trick for us.
Update: In retrospect, I now think it makes a lot of sense to add an unloading utility. Let me work on that in this PR.
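Conceptually, "unloading" here amounts to putting the default attention processors back in place of the LoRA ones. A rough sketch of the idea (not the exact implementation added in this PR) looks like this:

```python
from diffusers import DiffusionPipeline


def unload_lora_sketch(pipe: DiffusionPipeline) -> None:
    """Rough conceptual sketch, not the exact implementation added in this PR."""
    # Swap the LoRA attention processors on the UNet back for the default ones.
    pipe.unet.set_default_attn_processor()
    # A full implementation would also undo any LoRA monkey-patching on the
    # text encoder, which this sketch leaves out.
```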
```python
@@ -537,3 +536,34 @@ def test_vanilla_funetuning(self):
        expected = np.array([0.7406, 0.699, 0.5963, 0.7493, 0.7045, 0.6096, 0.6886, 0.6388, 0.583])
        self.assertTrue(np.allclose(images, expected, atol=1e-4))

    def test_unload_lora(self):
```
Nice! Could we maybe also add a fast test, as it'll be important to make sure this method stays correct?
@@ -280,6 +280,10 @@ Note that the use of [`~diffusers.loaders.LoraLoaderMixin.load_lora_weights`] is

**Note** that it is possible to provide a local directory path to [`~diffusers.loaders.LoraLoaderMixin.load_lora_weights`] as well as [`~diffusers.loaders.UNet2DConditionLoadersMixin.load_attn_procs`]. To learn about the supported inputs, refer to the respective docstrings.

## Unloading LoRA parameters

You can call [`~diffusers.loaders.LoraLoaderMixin.unload_lora_weights`] on a pipeline to unload the LoRA parameters.
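For context, a minimal usage sketch of the load/unload round trip (the checkpoint names below are only examples; substitute your own base model and LoRA):

```python
import torch
from diffusers import StableDiffusionPipeline

# Example checkpoints, not prescribed by this PR; substitute your own.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the LoRA parameters ...
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")
image_with_lora = pipe("A pokemon with blue eyes.", num_inference_steps=25).images[0]

# ... and unload them again, restoring the original attention processors.
pipe.unload_lora_weights()
image_without_lora = pipe("A pokemon with blue eyes.", num_inference_steps=25).images[0]
```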
❤️
Thanks for adding the utility method! Can we also add some fast tests here? I think this would help a lot to avoid accidentally breaking unloading/loading LoRA.
sayakpaul changed the title from "[Tests] add: test for testing unloading lora." to "[Feat] add: utility for unloading lora."
), "LoRA parameters should lead to a different image slice." |
---|
assert np.allclose( |
orig_image_slice, orig_image_slice_two, atol=1e-3 |
), "Unloading LoRA parameters should lead to results similar to what was obtained with the pipeline without any LoRA parameters." |
great asserts!
Very nice!
orpatashnik pushed a commit to orpatashnik/diffusers that referenced this pull request
- add: test for testing unloading lora.
- add :reason to skipif.
- initial implementation of lora unload().
- apply styling.
- add: doc.
- change checkpoints.
- reinit generator
- finalize slow test.
- add fast test for unloading lora.
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request
Hey guys, could this be done for a specific LoRA weight? That is, when I don't want to unload all my LoRAs, but only a specific one, using its name? @sayakpaul
@sayakpaul I went through fuse_lora(), but again, I want to be able to dynamically load and unload multiple LoRAs on demand to fit my GPU memory. The LoRAs I load can change vastly, so I don't want to fuse them into the pipeline and save them.
So, is it possible to just unload a specific LoRA weight?
@sayakpaul set_lora_device() makes sense to use; I will try this out. Thank you.
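For reference, a minimal sketch of the set_lora_device() flow (assuming a diffusers version with the PEFT backend; the repo ids and adapter names below are placeholders):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Placeholder repo ids and adapter names; substitute your own LoRAs.
pipe.load_lora_weights("your-org/style-lora", adapter_name="style")
pipe.load_lora_weights("your-org/subject-lora", adapter_name="subject")

# Park an adapter you are not currently using on the CPU to free GPU memory ...
pipe.set_lora_device(adapter_names=["style"], device="cpu")
# ... and move it back to the GPU when you need it again.
pipe.set_lora_device(adapter_names=["style"], device="cuda")
```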
@sayakpaul Would there be a limit as to how many LoRAs I can offload onto the CPU? I would preferably want to offload them entirely if they are just going to take up CPU resources.
> Would there be a limit as to how many LoRAs I can offload onto the CPU?

That should be constrained by the amount of CPU memory you have.
I'd like to unload it entirely and not be constrained by the CPU memory. Is there a method for that, or is it possible to implement one? @sayakpaul
You can then use unload_lora_weights().
As I mentioned previously, I want to unload a specific LoRA, maybe by specifying the adapter name it was loaded with. I don't want to unload everything from the pipe and reload; that would be too time-consuming.
I don't understand your use case then. You mentioned you wanted to offload completely. Where did you want to offload to, then?
I am loading multiple LoRA weights, and if I want to unload one specific LoRA weight, I want to do so based on its adapter name, and it should only unload that one LoRA weight. I don't want to call unload_lora_weights(), unload everything, and load back only what I want. Does that make sense?
Hmm, we could probably support unload_lora_weights() with specific adapter name(s)? Sounds like that would cover it for you?
Yes please, that would be perfect. Will deleting the adapter offload it from the GPU, or is it similar to set_adapters()?
It would free the memory. Possibly a manual torch.cuda.empty_cache() and gc.collect() call would be needed.
@BenjaminBossan The delete_adapter() function is only available on PeftModel, but I'm trying to use it on SDXL diffusion pipelines. Is there an equivalent solution for this within StableDiffusionXLPipeline?
@fvviz Could you try pipe.delete_adapters(<adapter-name>)?
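Putting the last few suggestions together, a hedged sketch of deleting one adapter by name and releasing the memory it held (assumes a diffusers version with the PEFT backend where pipelines expose delete_adapters(); the adapter name is a placeholder):

```python
import gc

import torch

# `pipe` is a pipeline onto which LoRAs were loaded with explicit adapter
# names (see the earlier set_lora_device() sketch).

# Remove only the named adapter; any other loaded adapters stay intact.
pipe.delete_adapters("style")

# As noted above, a manual cleanup pass may be needed to actually release the memory.
gc.collect()
torch.cuda.empty_cache()
```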