Kandinsky 3

Kandinsky 3 was created by Vladimir Arkhipkin, Anastasia Maltseva, Igor Pavlov, Andrei Filatov, Arseniy Shakhmatov, Andrey Kuznetsov, Denis Dimitrov, and Zein Shaheen.

The description from its GitHub page:

Kandinsky 3.0 is an open-source text-to-image diffusion model built upon the Kandinsky2-x model family. In comparison to its predecessors, enhancements have been made to the text understanding and visual quality of the model, achieved by increasing the size of the text encoder and Diffusion U-Net models, respectively.

Its architecture includes 3 main components:

  1. FLAN-UL2, an encoder-decoder model based on the T5 architecture.
  2. A new U-Net architecture featuring BigGAN-deep blocks, which doubles the depth while maintaining the same number of parameters.
  3. Sber-MoVQGAN, a decoder proven to have superior results in image restoration.

The original codebase can be found at ai-forever/Kandinsky-3.

Check out the Kandinsky Community organization on the Hub for the official model checkpoints for tasks like text-to-image, image-to-image, and inpainting.

Make sure to check out the schedulers guide to learn how to explore the tradeoff between scheduler speed and quality, and see the reuse components across pipelines section to learn how to efficiently load the same components into multiple pipelines.
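As a rough illustration of reusing components, the sketch below loads the text-to-image pipeline once and derives the image-to-image pipeline from it with `from_pipe`, so the weights are not loaded a second time. The helper name `load_kandinsky3_pipelines` is hypothetical. The imports sit inside the function so that defining it does not download anything.

```python
def load_kandinsky3_pipelines(model_id="kandinsky-community/kandinsky-3"):
    """Load text-to-image once, then derive image-to-image from the same components.

    Hypothetical helper for illustration; calling it downloads the checkpoint.
    """
    # Imported lazily so that defining this function has no side effects.
    import torch
    from diffusers import AutoPipelineForImage2Image, AutoPipelineForText2Image

    text2img = AutoPipelineForText2Image.from_pretrained(
        model_id, variant="fp16", torch_dtype=torch.float16
    )
    # from_pipe reuses the already-loaded components instead of reloading them.
    img2img = AutoPipelineForImage2Image.from_pipe(text2img)
    return text2img, img2img
```

Calling `text2img, img2img = load_kandinsky3_pipelines()` should give two pipelines that share the same text encoder, U-Net, and MoVQ decoder, assuming the checkpoint provides an `fp16` variant as in the examples on this page.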

Kandinsky3Pipeline

class diffusers.Kandinsky3Pipeline

( tokenizer: T5Tokenizer text_encoder: T5EncoderModel unet: Kandinsky3UNet scheduler: DDPMScheduler movq: VQModel )

__call__

( prompt: Union = None num_inference_steps: int = 25 guidance_scale: float = 3.0 negative_prompt: Union = None num_images_per_prompt: Optional = 1 height: Optional = 1024 width: Optional = 1024 generator: Union = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None attention_mask: Optional = None negative_attention_mask: Optional = None output_type: Optional = 'pil' return_dict: bool = True latents = None callback_on_step_end: Optional = None callback_on_step_end_tensor_inputs: List = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import AutoPipelineForText2Image
import torch

pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A photograph of the inside of a subway train. There are raccoons sitting on the seats. One of them is reading a newspaper. The window shows the city in the background."

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, num_inference_steps=25, generator=generator).images[0]
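The `callback_on_step_end` and `callback_on_step_end_tensor_inputs` arguments in the signature can be used to observe each denoising step. The sketch below assumes the usual diffusers convention: the callback receives `(pipe, step, timestep, callback_kwargs)` and returns the `callback_kwargs` dictionary. The names `make_step_logger` and `on_step_end` are hypothetical.

```python
def make_step_logger(log):
    """Return a callback that records each step index and the tensor names it received."""

    def on_step_end(pipe, step, timestep, callback_kwargs):
        # callback_kwargs is assumed to hold the tensors named in
        # callback_on_step_end_tensor_inputs (by default just "latents").
        log.append((step, sorted(callback_kwargs)))
        # Return the dict so the pipeline can read back any modified tensors.
        return callback_kwargs

    return on_step_end


steps = []
callback = make_step_logger(steps)

# Hypothetical usage with a loaded pipeline `pipe` and a `prompt` string:
# image = pipe(
#     prompt,
#     callback_on_step_end=callback,
#     callback_on_step_end_tensor_inputs=["latents"],
# ).images[0]
```

After a run, `steps` would contain one entry per denoising step, which can help confirm how many steps were actually executed.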

encode_prompt

( prompt do_classifier_free_guidance = True num_images_per_prompt = 1 device = None negative_prompt = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None _cut_context = False attention_mask: Optional = None negative_attention_mask: Optional = None )

Encodes the prompt into text encoder hidden states.
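When `do_classifier_free_guidance` is enabled, the prompt and the negative prompt are both encoded so the model can produce a conditional and an unconditional prediction. The sketch below shows how the two are commonly combined in classifier-free guidance. It is the textbook formulation on plain Python lists, not code taken from the Kandinsky 3 pipeline, whose exact expression may differ. The function name `apply_guidance` is hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Combine unconditional and text-conditioned predictions element-wise.

    Assumed formulation: uncond + scale * (text - uncond).
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


uncond = [0.0, 1.0]
text = [1.0, 3.0]

# Under this formulation, a scale of 1 reproduces the text-conditioned prediction.
print(apply_guidance(uncond, text, 1.0))
# A larger scale (3.0 is the default guidance_scale in the signatures on this page)
# pushes the result further away from the unconditional prediction.
print(apply_guidance(uncond, text, 3.0))
```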

Kandinsky3Img2ImgPipeline

class diffusers.Kandinsky3Img2ImgPipeline

( tokenizer: T5Tokenizer text_encoder: T5EncoderModel unet: Kandinsky3UNet scheduler: DDPMScheduler movq: VQModel )

__call__

( prompt: Union = None image: Union = None strength: float = 0.3 num_inference_steps: int = 25 guidance_scale: float = 3.0 negative_prompt: Union = None num_images_per_prompt: Optional = 1 generator: Union = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None attention_mask: Optional = None negative_attention_mask: Optional = None output_type: Optional = 'pil' return_dict: bool = True callback_on_step_end: Optional = None callback_on_step_end_tensor_inputs: List = ['latents'] **kwargs ) → ImagePipelineOutput or tuple

Function invoked when calling the pipeline for generation.

Examples:

from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image
import torch

pipe = AutoPipelineForImage2Image.from_pretrained(
    "kandinsky-community/kandinsky-3", variant="fp16", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

prompt = "A painting of the inside of a subway train with tiny raccoons."
image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinsky3/t2i.png"
)

generator = torch.Generator(device="cpu").manual_seed(0)
image = pipe(prompt, image=image, strength=0.75, num_inference_steps=25, generator=generator).images[0]
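The `strength` argument controls how much of the denoising schedule is applied to the input image. The sketch below assumes the convention used by diffusers image-to-image pipelines in general: the first `num_inference_steps * (1 - strength)` steps are skipped. The helper `denoising_steps` is hypothetical and only reproduces that arithmetic.

```python
def denoising_steps(num_inference_steps, strength):
    """Return (first_step_index, steps_actually_run) for an image-to-image call.

    Assumes the common diffusers convention for mapping strength to steps.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError(f"strength must be in [0, 1], got {strength}")
    # Number of steps that will actually be run, capped at the full schedule.
    init_steps = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first step to run; earlier steps are skipped.
    t_start = max(num_inference_steps - init_steps, 0)
    return t_start, num_inference_steps - t_start


print(denoising_steps(25, 0.75))  # settings from the example above
print(denoising_steps(25, 0.3))   # default strength from the signature
```

Under this assumption, a higher `strength` runs more steps and moves the result further from the input image, while a low `strength` keeps the output close to it.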

encode_prompt

( prompt do_classifier_free_guidance = True num_images_per_prompt = 1 device = None negative_prompt = None prompt_embeds: Optional = None negative_prompt_embeds: Optional = None _cut_context = False attention_mask: Optional = None negative_attention_mask: Optional = None )

Encodes the prompt into text encoder hidden states.

Parameters

device (torch.device, optional): torch device to place the resulting embeddings on.

num_images_per_prompt (int, optional, defaults to 1): Number of images that should be generated per prompt.

do_classifier_free_guidance (bool, optional, defaults to True): Whether to use classifier-free guidance or not.

negative_prompt (str or List[str], optional): The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than 1).

prompt_embeds (torch.Tensor, optional): Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from the prompt input argument.

negative_prompt_embeds (torch.Tensor, optional): Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from the negative_prompt input argument.

attention_mask (torch.Tensor, optional): Pre-generated attention mask. Must be provided if passing prompt_embeds directly.

negative_attention_mask (torch.Tensor, optional): Pre-generated negative attention mask. Must be provided if passing negative_prompt_embeds directly.
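The pairing rule in the parameter descriptions (embeddings passed directly must come with their attention mask) can be checked before calling the pipeline. The helper below, `check_embedding_inputs`, is a hypothetical pre-flight check written for illustration. It is not part of diffusers and does not reproduce the library's own validation.

```python
def check_embedding_inputs(
    prompt_embeds=None,
    attention_mask=None,
    negative_prompt_embeds=None,
    negative_attention_mask=None,
):
    """Raise ValueError if embeddings are supplied without their attention mask."""
    if prompt_embeds is not None and attention_mask is None:
        raise ValueError("attention_mask is required when prompt_embeds is passed")
    if negative_prompt_embeds is not None and negative_attention_mask is None:
        raise ValueError(
            "negative_attention_mask is required when negative_prompt_embeds is passed"
        )


# Passing nothing, or embeddings together with their mask, is accepted.
check_embedding_inputs()
check_embedding_inputs(prompt_embeds="embeds", attention_mask="mask")
```

Any object can stand in for the tensors here because the check only looks at whether each argument is `None`.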
