[Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by patrickvonplaten · Pull Request #4207 · huggingface/diffusers
What does this PR do?
🚨🚨🚨 1. Breaking change - fixes mask input 🚨🚨🚨
NOW: mask_image repaints white pixels and preserves black pixels.
Kandinsky was using an incorrect mask format. Instead of using white pixels as the mask (like SD & IF do), Kandinsky models were using black pixels. This needs to be corrected so that the diffusers API is aligned - we cannot have different mask formats for different pipelines.
Important => This means that everyone who already uses Kandinsky Inpaint in production / in a pipeline now needs to change the mask as follows:
For PIL input:

```python
import PIL.ImageOps

mask = PIL.ImageOps.invert(mask)
```
For PyTorch and NumPy input:

```python
mask = 1 - mask
```
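If you are unsure which format an existing mask is in, here is a minimal, self-contained sketch of the conversion (the file name is hypothetical); it turns an old-format mask, where black marked the area to repaint, into the new white-repaints convention:

```python
import numpy as np
import PIL.Image
import PIL.ImageOps

# Old Kandinsky convention: black pixels = area to repaint (hypothetical file name)
old_mask = PIL.Image.open("old_kandinsky_mask.png").convert("L")

# New diffusers convention (aligned with SD & IF): white pixels = area to repaint
new_mask = PIL.ImageOps.invert(old_mask)

# Equivalent conversion for an array-based mask with values in [0, 1]
old_mask_np = np.array(old_mask, dtype=np.float32) / 255.0
new_mask_np = 1.0 - old_mask_np
```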
Once this PR is merged we also need to correct all the model cards (cc @yiyixuxu)
2. Adds combined pipelines
As noticed in #4161 by @vladmandic, diffusers currently has an inconsistent design between Kandinsky pipelines and other pipelines. The reason for this is that all Kandinsky pipelines (txt2img, img2img & inpaint) are based on DALL-E 2's UnCLIP design, meaning they have to run two diffusion pipelines:
- a prior, which diffuses text embeddings to image embeddings (the same for txt2img, img2img & inpaint)
- a decoder, which diffuses image embeddings to images (txt2img, img2img & inpaint each have their own decoder pipeline)
Running just the prior or just the decoder on its own often makes no sense, so we should give the user an easier UX here, while making sure we still keep the pipelines separated so that they can be run independently (e.g. on different nodes).
This PR introduces a mechanism that loads the required prior pipeline directly when loading a decoder pipeline and puts all components into a single "combined" pipeline. The required pipelines are defined in the decoder's model card, e.g. here: https://huggingface.co/kandinsky-community/kandinsky-2-1/blob/73bf6fba5b4410c671f7c73279ab39932b3ad021/README.md?code=true#L4
Each decoder pipeline (txt2img, img2img & inpaint) is therefore now accompanied by a "Combined" pipeline that is automatically used by AutoPipelineForText2Image, AutoPipelineForImage2Image and AutoPipelineForInpainting.
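For context, here is a minimal sketch of what such a combined pipeline wraps, using the two Kandinsky 2.1 pipelines that already exist separately (the prompt is just an example):

```python
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
import torch

# Prior: text embeddings -> image embeddings (shared by txt2img, img2img & inpaint)
pipe_prior = KandinskyPriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16
)
pipe_prior.enable_model_cpu_offload()

# Decoder: image embeddings -> images
pipe = KandinskyPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars"
image_embeds, negative_image_embeds = pipe_prior(prompt).to_tuple()

image = pipe(
    prompt=prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    num_inference_steps=25,
).images[0]
```

The combined pipelines do exactly this chaining internally, so users who need to split the two stages (e.g. across nodes) can keep using the separate pipelines.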
The following use cases are now supported, making sure Kandinsky models can be used with the same API as other models:
Text 2 Image
```python
#!/usr/bin/env python3
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
# or
# pipe = AutoPipelineForText2Image.from_pretrained(
#     "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
# )
pipe.enable_model_cpu_offload()  # offload model components to CPU to reduce VRAM usage

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"

image = pipe(prompt=prompt, num_inference_steps=25).images[0]
```
Img2Img
```python
from diffusers import AutoPipelineForImage2Image
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
# or
# pipe = AutoPipelineForImage2Image.from_pretrained(
#     "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
# )
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

# Any RGB input image works here; this URL is just an example
url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"
response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image.thumbnail((768, 768))

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=original_image,
    num_inference_steps=25,
).images[0]
```
Inpaint
```python
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch
import numpy as np

pipe = AutoPipelineForInpainting.from_pretrained(
    "kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16
)
# or
# pipe = AutoPipelineForInpainting.from_pretrained(
#     "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16
# )
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

original_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/kandinsky/cat.png"
)

mask = np.zeros((768, 768), dtype=np.float32)
# Let's mask out an area above the cat's head
mask[:250, 250:-250] = 1

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    image=original_image,
    mask_image=mask,
    num_inference_steps=25,
).images[0]
```
To achieve this, the following pipelines have been added:
- KandinskyCombinedPipeline
- KandinskyImg2ImgCombinedPipeline
- KandinskyInpaintCombinedPipeline
- KandinskyV22CombinedPipeline
- KandinskyV22Img2ImgCombinedPipeline
- KandinskyV22InpaintCombinedPipeline
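The combined classes can also be instantiated directly instead of going through the Auto pipelines; a minimal sketch, assuming the classes are exported from the top-level diffusers namespace:

```python
from diffusers import KandinskyCombinedPipeline
import torch

# Loads the prior and decoder components together in one call
pipe = KandinskyCombinedPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

image = pipe(prompt="A fantasy landscape, Cinematic lighting", num_inference_steps=25).images[0]
```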
Edit: updated mask in the inpaint example.