[Kandinsky] Add combined pipelines / Fix cpu model offload / Fix inpainting by patrickvonplaten · Pull Request #4207 · huggingface/diffusers

What does this PR do?

🚨🚨🚨 1. Breaking change - fixes mask input 🚨🚨🚨

NOW: mask_image repaints white pixels and preserves black pixels

Kandinsky was using an incorrect mask format. Instead of using white pixels as the mask (as SD & IF do), Kandinsky models were using black pixels. This needs to be corrected so that the diffusers API is aligned; we cannot have different mask formats for different pipelines.

Important => This means that everyone who already uses Kandinsky Inpaint in production / in a pipeline now needs to invert the mask:

For PIL input:

```python
import PIL.ImageOps

mask = PIL.ImageOps.invert(mask)
```

For PyTorch and NumPy input:

```python
mask = 1 - mask
```
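For reference, a minimal before/after sketch of the convention change for a tensor mask (shape and masked region are illustrative):

```python
import torch

# Old Kandinsky convention (pre-PR): black (0) pixels marked the region to repaint.
old_mask = torch.ones(1, 1, 768, 768)
old_mask[:, :, :250, 250:-250] = 0  # region to repaint

# New convention, aligned with SD & IF: white (1) pixels are repainted.
new_mask = 1 - old_mask
```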

Once this PR is merged, we also need to correct all the model cards (cc @yiyixuxu)

2. Adds combined pipelines

As noticed in #4161 by @vladmandic, diffusers currently has an inconsistent design between Kandinsky pipelines and other pipelines. The reason for this is that all Kandinsky pipelines (txt2img, img2img & inpaint) are based on DALL·E 2's unCLIP design, meaning they have to run two diffusion pipelines: a prior pipeline that maps the text prompt to image embeddings, and a decoder pipeline that turns those embeddings into the final image.

Running just the prior or the decoder on its own often makes no sense, so we should give the user an easier UX here, while making sure the pipelines stay separate so that they can still be run independently (e.g. on different nodes), as sketched below.
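For context, this is roughly what users had to do by hand so far, and what the combined pipelines now automate (a minimal sketch using the existing `KandinskyPriorPipeline` / `KandinskyPipeline` classes; prompt and settings are illustrative):

```python
from diffusers import KandinskyPriorPipeline, KandinskyPipeline
import torch

prompt = "A lion in galaxies, spirals, nebulae, stars"

# Stage 1: the prior maps the text prompt to CLIP image embeddings.
prior = KandinskyPriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
prior.enable_model_cpu_offload()
image_embeds, negative_image_embeds = prior(prompt).to_tuple()

# Stage 2: the decoder turns the embeddings into the final image.
decoder = KandinskyPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
decoder.enable_model_cpu_offload()
image = decoder(
    prompt=prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    num_inference_steps=25,
).images[0]
```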

This PR introduces a mechanism that loads the required prior pipeline directly when loading a decoder pipeline and puts all components into a single "combined" pipeline. The required pipelines are defined in the decoder's model card, e.g. here: https://huggingface.co/kandinsky-community/kandinsky-2-1/blob/73bf6fba5b4410c671f7c73279ab39932b3ad021/README.md?code=true#L4

Each decoder pipeline (txt2img, img2img & inpaint) is therefore now accompanied by a "Combined" pipeline that is automatically instantiated by AutoPipelineFor{Text2Img,Img2Img,Inpaint}.
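In practice this means switching between model families no longer requires pipeline-specific code (a minimal sketch; the Stable Diffusion checkpoint is just an illustrative counterpart):

```python
from diffusers import AutoPipelineForText2Image
import torch

# Same entry point, different model families:
sd_pipe = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
kandinsky_pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
```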

The following use cases are now supported, which ensures Kandinsky models can be used with the same API as other models:

Text 2 Image

```python
#!/usr/bin/env python3
from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
# or
# pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"

image = pipe(prompt=prompt, num_inference_steps=25).images[0]
```

Img2Img

```python
from diffusers import AutoPipelineForImage2Image
import torch
import requests
from io import BytesIO
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
# or
# pipe = AutoPipelineForImage2Image.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url)
original_image = Image.open(BytesIO(response.content)).convert("RGB")
original_image.thumbnail((768, 768))

image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=original_image, num_inference_steps=25).images[0]
```

Inpaint

```python
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
import torch
import numpy as np

pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-1-inpaint", torch_dtype=torch.float16)
# or
# pipe = AutoPipelineForInpainting.from_pretrained("kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting"
negative_prompt = "low quality, bad quality"

original_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main"
    "/kandinsky/cat.png"
)

mask = np.zeros((768, 768), dtype=np.float32)
# Let's mask out an area above the cat's head (white/1 pixels are repainted)
mask[:250, 250:-250] = 1

image = pipe(prompt=prompt, negative_prompt=negative_prompt, image=original_image, mask_image=mask, num_inference_steps=25).images[0]
```

To achieve this, the following pipelines have been added:

- `KandinskyCombinedPipeline`
- `KandinskyImg2ImgCombinedPipeline`
- `KandinskyInpaintCombinedPipeline`
- `KandinskyV22CombinedPipeline`
- `KandinskyV22Img2ImgCombinedPipeline`
- `KandinskyV22InpaintCombinedPipeline`
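As a quick sanity check (a minimal sketch, assuming the auto classes dispatch to the combined pipelines as described above):

```python
from diffusers import AutoPipelineForText2Image, KandinskyCombinedPipeline
import torch

pipe = AutoPipelineForText2Image.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
assert isinstance(pipe, KandinskyCombinedPipeline)
```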

Edit: updated mask in inpaint example.