[Feat] Support SDXL Kohya-style LoRA by sayakpaul · Pull Request #4287 · huggingface/diffusers

Hello there! I am a little confused after reading the documentation. Sorry in advance if this is covered somewhere; I just couldn't find it.

The documentation on using a LoRA trained with Kohya for SDXL provides this example:

import torch
from diffusers import DiffusionPipeline

base_model_id = "stabilityai/stable-diffusion-xl-base-0.9"
pipeline = DiffusionPipeline.from_pretrained(base_model_id, torch_dtype=torch.float16).to("cuda")
pipeline.load_lora_weights(".", weight_name="Kamepan.safetensors")

prompt = "anime screencap, glint, drawing, best quality, light smile, shy, a full body of a girl wearing wedding dress in the middle of the forest beneath the trees, fireflies, big eyes, 2d, cute, anime girl, waifu, cel shading, magical girl, vivid colors, (outline:1.1), manga anime artstyle, masterpiece, offical wallpaper, glint <lora:kame_sdxl_v2:1>"
negative_prompt = "(deformed, bad quality, sketch, depth of field, blurry:1.1), grainy, bad anatomy, bad perspective, old, ugly, realistic, cartoon, disney, bad propotions"
generator = torch.manual_seed(2947883060)
num_inference_steps = 30
guidance_scale = 7

image = pipeline(
    prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=num_inference_steps,
    generator=generator, guidance_scale=guidance_scale
).images[0]
image.save("Kamepan.png")

As you can see, the prompt uses the weighting syntax of the Automatic1111 (A1111) environment, for example "(outline:1.1)". However, as I understand it, the official Diffusers documentation on prompt weighting says this must be done with the Compel library. The same applies to the LoRA tag at the end of the prompt, "<lora:kame_sdxl_v2:1>": as I understand it, this is not how a LoRA is applied in Diffusers.

Could you please confirm whether this prompt is correct for Diffusers?

It would be great to have a section of the documentation that explains the differences between the A1111 and Diffusers syntaxes. I see some confusion around this topic, and such a section would help people avoid performing operations such as prompt weighting in a way that works in A1111 but has no real effect in Diffusers.
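To make the kind of difference I mean concrete, here is a rough sketch in plain Python. It pulls A1111-style `<lora:name:weight>` tags out of a prompt and rewrites `(text:1.1)` spans into the `(text)1.1` form that, as far as I understand, Compel expects. The helper names `split_a1111_prompt` and `to_compel_weights` are hypothetical, not part of Diffusers or Compel, and the sketch only handles simple, non-nested cases.

```python
import re

# A1111-style LoRA tag, e.g. <lora:kame_sdxl_v2:1>
LORA_TAG = re.compile(r"<lora:([^:>]+):([0-9]*\.?[0-9]+)>")
# A1111-style weighted span without nesting, e.g. (outline:1.1)
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def split_a1111_prompt(prompt):
    """Hypothetical helper: return (prompt_without_lora_tags, [(name, weight), ...])."""
    loras = [(name, float(weight)) for name, weight in LORA_TAG.findall(prompt)]
    clean = LORA_TAG.sub("", prompt)
    # Collapse leftover whitespace and drop a dangling trailing comma.
    clean = re.sub(r"\s+", " ", clean).strip().rstrip(",").strip()
    return clean, loras


def to_compel_weights(prompt):
    """Hypothetical helper: rewrite (text:1.1) as (text)1.1.

    Assumption: (text)1.1 is the Compel weighting form; nested
    parentheses and A1111's bare (text) emphasis are not handled.
    """
    return WEIGHTED.sub(r"(\1)\2", prompt)


if __name__ == "__main__":
    text, loras = split_a1111_prompt("(outline:1.1), glint <lora:kame_sdxl_v2:1>")
    print(to_compel_weights(text))
    print(loras)
```

The extracted LoRA names and weights would then have to be applied outside the prompt, for example through `load_lora_weights` as in the snippet above, rather than being left in the prompt text.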