Kandinsky 2.2 (original) (raw)

Kandinsky 2.2 is created by Arseniy Shakhmatov, Anton Razzhigaev, Aleksandr Nikolich, Vladimir Arkhipkin, Igor Pavlov, Andrey Kuznetsov, and Denis Dimitrov.

The description from it’s GitHub page is:

Kandinsky 2.2 brings substantial improvements upon its predecessor, Kandinsky 2.1, by introducing a new, more powerful image encoder - CLIP-ViT-G and the ControlNet support. The switch to CLIP-ViT-G as the image encoder significantly increases the model’s capability to generate more aesthetic pictures and better understand text, thus enhancing the model’s overall performance. The addition of the ControlNet mechanism allows the model to effectively control the process of generating images. This leads to more accurate and visually appealing outputs and opens new possibilities for text-guided image manipulation.

The original codebase can be found at ai-forever/Kandinsky-2.

Check out the Kandinsky Community organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.

Make sure to check out the schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.

KandinskyV22PriorPipeline

class diffusers.KandinskyV22PriorPipeline

< source >

( prior: PriorTransformer image_encoder: CLIPVisionModelWithProjection text_encoder: CLIPTextModelWithProjection tokenizer: CLIPTokenizer scheduler: UnCLIPScheduler image_processor: CLIPImageProcessor )

Parameters

Pipeline for generating image prior for Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None guidance_scale: float = 4.0 output_type: typing.Optional[str] = 'pt' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] ) → KandinskyPriorPipelineOutput or tuple

Parameters

Returns

KandinskyPriorPipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline import torch

pipe_prior = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior") pipe_prior.to("cuda") prompt = "red cat, 4k photo" image_emb, negative_image_emb = pipe_prior(prompt).to_tuple()

pipe = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder") pipe.to("cuda") image = pipe( ... image_embeds=image_emb, ... negative_image_embeds=negative_image_emb, ... height=768, ... width=768, ... num_inference_steps=50, ... ).images image[0].save("cat.png")

interpolate

< source >

( images_and_prompts: typing.List[typing.Union[str, PIL.Image.Image, torch.Tensor]] weights: typing.List[float] num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None negative_prior_prompt: typing.Optional[str] = None negative_prompt: str = '' guidance_scale: float = 4.0 device = None ) → KandinskyPriorPipelineOutput or tuple

Parameters

Returns

KandinskyPriorPipelineOutput or tuple

Function invoked when using the prior pipeline for interpolation.

Examples:

from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline from diffusers.utils import load_image import PIL import torch from torchvision import transforms

pipe_prior = KandinskyV22PriorPipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16 ... ) pipe_prior.to("cuda") img1 = load_image( ... "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" ... "/kandinsky/cat.png" ... ) img2 = load_image( ... "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" ... "/kandinsky/starry_night.jpeg" ... ) images_texts = ["a cat", img1, img2] weights = [0.3, 0.3, 0.4] out = pipe_prior.interpolate(images_texts, weights) pipe = KandinskyV22Pipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16 ... ) pipe.to("cuda") image = pipe( ... image_embeds=out.image_embeds, ... negative_image_embeds=out.negative_image_embeds, ... height=768, ... width=768, ... num_inference_steps=50, ... ).images[0] image.save("starry_cat.png")

KandinskyV22Pipeline

class diffusers.KandinskyV22Pipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel )

Parameters

Pipeline for text-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import KandinskyV22Pipeline, KandinskyV22PriorPipeline import torch

pipe_prior = KandinskyV22PriorPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior") pipe_prior.to("cuda") prompt = "red cat, 4k photo" out = pipe_prior(prompt) image_emb = out.image_embeds zero_image_emb = out.negative_image_embeds pipe = KandinskyV22Pipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder") pipe.to("cuda") image = pipe( ... image_embeds=image_emb, ... negative_image_embeds=zero_image_emb, ... height=768, ... width=768, ... num_inference_steps=50, ... ).images image[0].save("cat.png")

KandinskyV22CombinedPipeline

class diffusers.KandinskyV22CombinedPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel prior_prior: PriorTransformer prior_image_encoder: CLIPVisionModelWithProjection prior_text_encoder: CLIPTextModelWithProjection prior_tokenizer: CLIPTokenizer prior_scheduler: UnCLIPScheduler prior_image_processor: CLIPImageProcessor )

Parameters

Combined Pipeline for text-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( prompt: typing.Union[str, typing.List[str]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 height: int = 512 width: int = 512 prior_guidance_scale: float = 4.0 prior_num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True prior_callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None prior_callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import AutoPipelineForText2Image import torch

pipe = AutoPipelineForText2Image.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16 ) pipe.enable_model_cpu_offload()

prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k"

image = pipe(prompt=prompt, num_inference_steps=25).images[0]

enable_sequential_cpu_offload

< source >

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, unet, text_encoder, vae and safety checker have their state dicts saved to CPU and then are moved to atorch.device('meta') and loaded to GPU only when their specific submodule has its forwardmethod called. Note that offloading happens on a submodule basis. Memory savings are higher than withenable_model_cpu_offload`, but performance is lower.

KandinskyV22ControlnetPipeline

class diffusers.KandinskyV22ControlnetPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel )

Parameters

Pipeline for text-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] hint: Tensor height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

KandinskyV22PriorEmb2EmbPipeline

class diffusers.KandinskyV22PriorEmb2EmbPipeline

< source >

( prior: PriorTransformer image_encoder: CLIPVisionModelWithProjection text_encoder: CLIPTextModelWithProjection tokenizer: CLIPTokenizer scheduler: UnCLIPScheduler image_processor: CLIPImageProcessor )

Parameters

Pipeline for generating image prior for Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, typing.List[torch.Tensor], PIL.Image.Image, typing.List[PIL.Image.Image]] strength: float = 0.3 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None guidance_scale: float = 4.0 output_type: typing.Optional[str] = 'pt' return_dict: bool = True ) → KandinskyPriorPipelineOutput or tuple

Parameters

Returns

KandinskyPriorPipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import KandinskyV22Pipeline, KandinskyV22PriorEmb2EmbPipeline import torch

pipe_prior = KandinskyPriorPipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16 ... ) pipe_prior.to("cuda")

prompt = "red cat, 4k photo" img = load_image( ... "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" ... "/kandinsky/cat.png" ... ) image_emb, nagative_image_emb = pipe_prior(prompt, image=img, strength=0.2).to_tuple()

pipe = KandinskyPipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-decoder, torch_dtype=torch.float16" ... ) pipe.to("cuda")

image = pipe( ... image_embeds=image_emb, ... negative_image_embeds=negative_image_emb, ... height=768, ... width=768, ... num_inference_steps=100, ... ).images

image[0].save("cat.png")

interpolate

< source >

( images_and_prompts: typing.List[typing.Union[str, PIL.Image.Image, torch.Tensor]] weights: typing.List[float] num_images_per_prompt: int = 1 num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None negative_prior_prompt: typing.Optional[str] = None negative_prompt: str = '' guidance_scale: float = 4.0 device = None ) → KandinskyPriorPipelineOutput or tuple

Parameters

Returns

KandinskyPriorPipelineOutput or tuple

Function invoked when using the prior pipeline for interpolation.

Examples:

from diffusers import KandinskyV22PriorEmb2EmbPipeline, KandinskyV22Pipeline from diffusers.utils import load_image import PIL

import torch from torchvision import transforms

pipe_prior = KandinskyV22PriorPipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16 ... ) pipe_prior.to("cuda")

img1 = load_image( ... "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" ... "/kandinsky/cat.png" ... )

img2 = load_image( ... "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" ... "/kandinsky/starry_night.jpeg" ... )

images_texts = ["a cat", img1, img2] weights = [0.3, 0.3, 0.4] image_emb, zero_image_emb = pipe_prior.interpolate(images_texts, weights)

pipe = KandinskyV22Pipeline.from_pretrained( ... "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16 ... ) pipe.to("cuda")

image = pipe( ... image_embeds=image_emb, ... negative_image_embeds=zero_image_emb, ... height=768, ... width=768, ... num_inference_steps=150, ... ).images[0]

image.save("starry_cat.png")

KandinskyV22Img2ImgPipeline

class diffusers.KandinskyV22Img2ImgPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel )

Parameters

Pipeline for image-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 strength: float = 0.3 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

KandinskyV22Img2ImgCombinedPipeline

class diffusers.KandinskyV22Img2ImgCombinedPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel prior_prior: PriorTransformer prior_image_encoder: CLIPVisionModelWithProjection prior_text_encoder: CLIPTextModelWithProjection prior_tokenizer: CLIPTokenizer prior_scheduler: UnCLIPScheduler prior_image_processor: CLIPImageProcessor )

Parameters

Combined Pipeline for image-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_inference_steps: int = 100 guidance_scale: float = 4.0 strength: float = 0.3 num_images_per_prompt: int = 1 height: int = 512 width: int = 512 prior_guidance_scale: float = 4.0 prior_num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True prior_callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None prior_callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import AutoPipelineForImage2Image import torch import requests from io import BytesIO from PIL import Image import os

pipe = AutoPipelineForImage2Image.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16 ) pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting" negative_prompt = "low quality, bad quality"

url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg"

response = requests.get(url) image = Image.open(BytesIO(response.content)).convert("RGB") image.thumbnail((768, 768))

image = pipe(prompt=prompt, image=original_image, num_inference_steps=25).images[0]

enable_model_cpu_offload

< source >

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forwardmethod is called, and the model remains in GPU until the next model runs. Memory savings are lower than withenable_sequential_cpu_offload, but performance is much better due to the iterative execution of the unet.

enable_sequential_cpu_offload

< source >

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, unet, text_encoder, vae and safety checker have their state dicts saved to CPU and then are moved to atorch.device('meta') and loaded to GPU only when their specific submodule has its forwardmethod called. Note that offloading happens on a submodule basis. Memory savings are higher than withenable_model_cpu_offload`, but performance is lower.

KandinskyV22ControlnetImg2ImgPipeline

class diffusers.KandinskyV22ControlnetImg2ImgPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel )

Parameters

Pipeline for image-to-image generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] hint: Tensor height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 strength: float = 0.3 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None output_type: typing.Optional[str] = 'pil' callback: typing.Optional[typing.Callable[[int, int, torch.Tensor], NoneType]] = None callback_steps: int = 1 return_dict: bool = True ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

KandinskyV22InpaintPipeline

class diffusers.KandinskyV22InpaintPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel )

Parameters

Pipeline for text-guided image inpainting using Kandinsky2.1

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] image: typing.Union[torch.Tensor, PIL.Image.Image] mask_image: typing.Union[torch.Tensor, PIL.Image.Image, numpy.ndarray] negative_image_embeds: typing.Union[torch.Tensor, typing.List[torch.Tensor]] height: int = 512 width: int = 512 num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

KandinskyV22InpaintCombinedPipeline

class diffusers.KandinskyV22InpaintCombinedPipeline

< source >

( unet: UNet2DConditionModel scheduler: DDPMScheduler movq: VQModel prior_prior: PriorTransformer prior_image_encoder: CLIPVisionModelWithProjection prior_text_encoder: CLIPTextModelWithProjection prior_tokenizer: CLIPTokenizer prior_scheduler: UnCLIPScheduler prior_image_processor: CLIPImageProcessor )

Parameters

Combined Pipeline for inpainting generation using Kandinsky

This model inherits from DiffusionPipeline. Check the superclass documentation for the generic methods the library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)

__call__

< source >

( prompt: typing.Union[str, typing.List[str]] image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] mask_image: typing.Union[torch.Tensor, PIL.Image.Image, typing.List[torch.Tensor], typing.List[PIL.Image.Image]] negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_inference_steps: int = 100 guidance_scale: float = 4.0 num_images_per_prompt: int = 1 height: int = 512 width: int = 512 prior_guidance_scale: float = 4.0 prior_num_inference_steps: int = 25 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True prior_callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None prior_callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] callback_on_step_end: typing.Optional[typing.Callable[[int, int, typing.Dict], NoneType]] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Parameters

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import AutoPipelineForInpainting from diffusers.utils import load_image import torch import numpy as np

pipe = AutoPipelineForInpainting.from_pretrained( "kandinsky-community/kandinsky-2-2-decoder-inpaint", torch_dtype=torch.float16 ) pipe.enable_model_cpu_offload()

prompt = "A fantasy landscape, Cinematic lighting" negative_prompt = "low quality, bad quality"

original_image = load_image( "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" "/kandinsky/cat.png" )

mask = np.zeros((768, 768), dtype=np.float32)

mask[:250, 250:-250] = 1

image = pipe(prompt=prompt, image=original_image, mask_image=mask, num_inference_steps=25).images[0]

enable_sequential_cpu_offload

< source >

( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Offloads all models to CPU using accelerate, significantly reducing memory usage. When called, unet, text_encoder, vae and safety checker have their state dicts saved to CPU and then are moved to atorch.device('meta') and loaded to GPU only when their specific submodule has its forwardmethod called. Note that offloading happens on a submodule basis. Memory savings are higher than withenable_model_cpu_offload`, but performance is lower.

< > Update on GitHub