LEDITS++

LEDITS++ was proposed in LEDITS++: Limitless Image Editing using Text-to-Image Models by Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, Apolinário Passos.

The abstract from the paper is:

Text-to-image diffusion models have recently received increasing interest for their astonishing ability to produce high-fidelity images from solely text inputs. Subsequent research efforts aim to exploit and apply their capabilities to real image editing. However, existing image-to-image methods are often inefficient, imprecise, and of limited versatility. They either require time-consuming fine-tuning, deviate unnecessarily strongly from the input image, and/or lack support for multiple, simultaneous edits. To address these issues, we introduce LEDITS++, an efficient yet versatile and precise textual image manipulation technique. LEDITS++'s novel inversion approach requires no tuning nor optimization and produces high-fidelity results with a few diffusion steps. Second, our methodology supports multiple simultaneous edits and is architecture-agnostic. Third, we use a novel implicit masking technique that limits changes to relevant image regions. We propose the novel TEdBench++ benchmark as part of our exhaustive evaluation. Our results demonstrate the capabilities of LEDITS++ and its improvements over previous methods. The project page is available at https://leditsplusplus-project.static.hf.space .

You can find additional information about LEDITS++ on the project page and try it out in a demo.

Due to some backward compatibility issues with the current diffusers implementation of [DPMSolverMultistepScheduler](/docs/diffusers/v0.31.0/en/api/schedulers/multistep_dpm_solver#diffusers.DPMSolverMultistepScheduler), this implementation of LEDITS++ can no longer guarantee perfect inversion. This issue is unlikely to have any noticeable effect in applied use cases. However, we provide an alternative implementation that guarantees perfect inversion in a dedicated [GitHub repo](https://github.com/ml-research/ledits_pp).

We provide two distinct pipelines based on different pre-trained models.

LEditsPPPipelineStableDiffusion

class diffusers.LEditsPPPipelineStableDiffusion

( vae: AutoencoderKL text_encoder: CLIPTextModel tokenizer: CLIPTokenizer unet: UNet2DConditionModel scheduler: Union safety_checker: StableDiffusionSafetyChecker feature_extractor: CLIPImageProcessor requires_safety_checker: bool = True )

Pipeline for textual image editing using LEDITS++ with Stable Diffusion.

This model inherits from DiffusionPipeline and builds on the StableDiffusionPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

__call__

( negative_prompt: Union = None generator: Union = None output_type: Optional = 'pil' return_dict: bool = True editing_prompt: Union = None editing_prompt_embeds: Optional = None negative_prompt_embeds: Optional = None reverse_editing_direction: Union = False edit_guidance_scale: Union = 5 edit_warmup_steps: Union = 0 edit_cooldown_steps: Union = None edit_threshold: Union = 0.9 user_mask: Optional = None sem_guidance: Optional = None use_cross_attn_mask: bool = False use_intersect_mask: bool = True attn_store_steps: Optional = [] store_averaged_over_steps: bool = True cross_attention_kwargs: Optional = None guidance_rescale: float = 0.0 clip_skip: Optional = None callback_on_step_end: Optional = None callback_on_step_end_tensor_inputs: List = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

Returns

LEditsPPDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images, and the second element is a list of bools denoting whether the corresponding generated image likely represents "not-safe-for-work" (nsfw) content, according to the safety_checker.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed for the last inverted image(s).

Examples:

    import torch

    from diffusers import LEditsPPPipelineStableDiffusion
    from diffusers.utils import load_image

    # Load the Stable Diffusion weights in half precision and move them to the GPU.
    pipe = LEditsPPPipelineStableDiffusion.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")

    img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/cherry_blossom.png"
    image = load_image(img_url).convert("RGB")

    # Invert the input image first; the edit below operates on this inversion.
    _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.1)

    edited_image = pipe(
        editing_prompt=["cherry blossom"], edit_guidance_scale=10.0, edit_threshold=0.75
    ).images[0]
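Because edits are always performed on the last inverted image(s), one invert() call can in principle be followed by several edits of the same image. The following is a minimal sketch of that pattern, not part of the library: `edit_variants` and `_FakePipe` are hypothetical names, the "autumn leaves" prompt is only illustrative, and the stand-in pipeline merely records calls so the sketch runs without model weights. With a real pipeline you would pass `pipe` and a loaded image instead.

```python
from types import SimpleNamespace


def edit_variants(pipe, image, edit_kwargs_list, num_inversion_steps=50, skip=0.1):
    """Invert `image` once, then run one edit per entry in `edit_kwargs_list`."""
    # The inversion is kept by the pipeline, so every call below edits this image.
    pipe.invert(image=image, num_inversion_steps=num_inversion_steps, skip=skip)
    results = []
    for kwargs in edit_kwargs_list:
        results.append(pipe(**kwargs).images[0])
    return results


class _FakePipe:
    """Hypothetical stand-in that records calls instead of running a model."""

    def __init__(self):
        self.calls = []

    def invert(self, **kwargs):
        self.calls.append(("invert", kwargs))

    def __call__(self, **kwargs):
        self.calls.append(("edit", kwargs))
        return SimpleNamespace(images=[f"edited:{kwargs['editing_prompt'][0]}"])


fake = _FakePipe()
variants = edit_variants(
    fake,
    image="input-image-placeholder",
    edit_kwargs_list=[
        {"editing_prompt": ["cherry blossom"], "edit_guidance_scale": 10.0, "edit_threshold": 0.75},
        {"editing_prompt": ["autumn leaves"], "edit_guidance_scale": 8.0, "edit_threshold": 0.8},
    ],
)
print(variants)
```

Keeping the inversion outside the loop avoids repeating the most expensive shared step when only the editing prompts change.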

invert

( image: Union source_prompt: str = '' source_guidance_scale: float = 3.5 num_inversion_steps: int = 30 skip: float = 0.15 generator: Optional = None cross_attention_kwargs: Optional = None clip_skip: Optional = None height: Optional = None width: Optional = None resize_mode: Optional = 'default' crops_coords: Optional = None ) → LEditsPPInversionPipelineOutput

Returns

Output will contain the resized input image(s) and respective VAE reconstruction(s).

Inverts the input image(s) as described in the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.

encode_prompt

( device num_images_per_prompt enable_edit_guidance negative_prompt = None editing_prompt = None negative_prompt_embeds: Optional = None editing_prompt_embeds: Optional = None lora_scale: Optional = None clip_skip: Optional = None )

Encodes the prompt into text encoder hidden states.

LEditsPPPipelineStableDiffusionXL

class diffusers.LEditsPPPipelineStableDiffusionXL

( vae: AutoencoderKL text_encoder: CLIPTextModel text_encoder_2: CLIPTextModelWithProjection tokenizer: CLIPTokenizer tokenizer_2: CLIPTokenizer unet: UNet2DConditionModel scheduler: Union image_encoder: CLIPVisionModelWithProjection = None feature_extractor: CLIPImageProcessor = None force_zeros_for_empty_prompt: bool = True add_watermarker: Optional = None )

Pipeline for textual image editing using LEDITS++ with Stable Diffusion XL.

This model inherits from DiffusionPipeline and builds on the StableDiffusionXLPipeline. Check the superclass documentation for the generic methods implemented for all pipelines (downloading, saving, running on a particular device, etc.).

In addition the pipeline inherits the following loading methods:

as well as the following saving methods:

__call__

( denoising_end: Optional = None negative_prompt: Union = None negative_prompt_2: Union = None negative_prompt_embeds: Optional = None negative_pooled_prompt_embeds: Optional = None ip_adapter_image: Union = None output_type: Optional = 'pil' return_dict: bool = True cross_attention_kwargs: Optional = None guidance_rescale: float = 0.0 crops_coords_top_left: Tuple = (0, 0) target_size: Optional = None editing_prompt: Union = None editing_prompt_embeddings: Optional = None editing_pooled_prompt_embeds: Optional = None reverse_editing_direction: Union = False edit_guidance_scale: Union = 5 edit_warmup_steps: Union = 0 edit_cooldown_steps: Union = None edit_threshold: Union = 0.9 sem_guidance: Optional = None use_cross_attn_mask: bool = False use_intersect_mask: bool = False user_mask: Optional = None attn_store_steps: Optional = [] store_averaged_over_steps: bool = True clip_skip: Optional = None callback_on_step_end: Optional = None callback_on_step_end_tensor_inputs: List = ['latents'] **kwargs ) → LEditsPPDiffusionPipelineOutput or tuple

Returns

LEditsPPDiffusionPipelineOutput if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

The call function to the pipeline for editing. The invert() method has to be called beforehand. Edits will always be performed for the last inverted image(s).

Examples:

    from io import BytesIO

    import PIL.Image
    import requests
    import torch

    from diffusers import LEditsPPPipelineStableDiffusionXL

    pipe = LEditsPPPipelineStableDiffusionXL.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    )
    pipe = pipe.to("cuda")


    def download_image(url):
        # Fetch the image over HTTP and decode it as an RGB PIL image.
        response = requests.get(url)
        return PIL.Image.open(BytesIO(response.content)).convert("RGB")


    img_url = "https://www.aiml.informatik.tu-darmstadt.de/people/mbrack/tennis.jpg"
    image = download_image(img_url)

    _ = pipe.invert(image=image, num_inversion_steps=50, skip=0.2)

    # Each list entry configures the editing prompt at the same position:
    # the first edit is reversed (steers away from "tennis ball"), the second is not.
    edited_image = pipe(
        editing_prompt=["tennis ball", "tomato"],
        reverse_editing_direction=[True, False],
        edit_guidance_scale=[5.0, 10.0],
        edit_threshold=[0.9, 0.85],
    ).images[0]
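The example above supplies one value per editing prompt for reverse_editing_direction, edit_guidance_scale, and edit_threshold. A small pre-flight check can catch mismatched list lengths before an expensive pipeline call. The sketch below is plain Python and not part of diffusers: `per_prompt` and `build_edit_kwargs` are hypothetical helper names, the defaults are copied from the signature above, and it assumes that a scalar value is meant to apply to every editing prompt.

```python
def per_prompt(value, num_prompts, name):
    """Expand a scalar to one entry per editing prompt, or validate a list."""
    if isinstance(value, (list, tuple)):
        if len(value) != num_prompts:
            raise ValueError(
                f"{name} has {len(value)} entries for {num_prompts} editing prompts"
            )
        return list(value)
    # Assumption: a single value is shared by all editing prompts.
    return [value] * num_prompts


def build_edit_kwargs(
    editing_prompt,
    reverse_editing_direction=False,
    edit_guidance_scale=5,
    edit_threshold=0.9,
):
    """Return keyword arguments with explicit per-prompt lists."""
    n = len(editing_prompt)
    return {
        "editing_prompt": list(editing_prompt),
        "reverse_editing_direction": per_prompt(
            reverse_editing_direction, n, "reverse_editing_direction"
        ),
        "edit_guidance_scale": per_prompt(edit_guidance_scale, n, "edit_guidance_scale"),
        "edit_threshold": per_prompt(edit_threshold, n, "edit_threshold"),
    }


edit_kwargs = build_edit_kwargs(
    ["tennis ball", "tomato"],
    reverse_editing_direction=[True, False],
    edit_guidance_scale=[5.0, 10.0],
    edit_threshold=0.9,  # scalar, expanded to both prompts
)
print(edit_kwargs)
```

The resulting dictionary could then be passed as `pipe(**edit_kwargs)` after invert() has been called.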

invert

( image: Union source_prompt: str = '' source_guidance_scale = 3.5 negative_prompt: str = None negative_prompt_2: str = None num_inversion_steps: int = 50 skip: float = 0.15 generator: Optional = None crops_coords_top_left: Tuple = (0, 0) num_zero_noise_steps: int = 3 cross_attention_kwargs: Optional = None ) → LEditsPPInversionPipelineOutput

Returns

Output will contain the resized input image(s) and respective VAE reconstruction(s).

Inverts the input image(s) as described in the LEDITS++ paper. If the scheduler is set to DDIMScheduler, the inversion proposed by edit-friendly DDPM is performed instead.

encode_prompt

( device: Optional = None num_images_per_prompt: int = 1 negative_prompt: Optional = None negative_prompt_2: Optional = None negative_prompt_embeds: Optional = None negative_pooled_prompt_embeds: Optional = None lora_scale: Optional = None clip_skip: Optional = None enable_edit_guidance: bool = True editing_prompt: Optional = None editing_prompt_embeds: Optional = None editing_pooled_prompt_embeds: Optional = None )

Encodes the prompt into text encoder hidden states.

get_guidance_scale_embedding

( w: Tensor embedding_dim: int = 512 dtype: dtype = torch.float32 ) → torch.Tensor

Returns

Embedding vectors with shape (len(w), embedding_dim).

See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
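To make the returned shape concrete, here is a standalone pure-Python sketch of a sinusoidal embedding of guidance scales. It is an illustration rather than the library's code: the function name `guidance_scale_embedding`, the scaling of `w` by 1000, and the 10000 frequency base are assumptions modelled on the usual sinusoidal-embedding recipe, and it returns nested lists instead of a torch.Tensor.

```python
import math


def guidance_scale_embedding(w, embedding_dim=512):
    """Return len(w) rows of `embedding_dim` sinusoidal features."""
    if embedding_dim < 4:
        raise ValueError("embedding_dim must be at least 4 for this sketch")
    half_dim = embedding_dim // 2
    # Frequencies fall geometrically from 1 to 1/10000 (assumed base).
    step = math.log(10000.0) / (half_dim - 1)
    freqs = [math.exp(-step * i) for i in range(half_dim)]

    rows = []
    for scale in w:
        scaled = float(scale) * 1000.0  # assumed scaling of the guidance value
        row = [math.sin(scaled * f) for f in freqs]
        row += [math.cos(scaled * f) for f in freqs]
        if embedding_dim % 2 == 1:
            row.append(0.0)  # zero-pad odd dimensions to the requested width
        rows.append(row)
    return rows


emb = guidance_scale_embedding([0.0, 7.5], embedding_dim=8)
print(len(emb), len(emb[0]))
```

Each input scale produces one row, so the result has the shape (len(w), embedding_dim) described above.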

LEditsPPDiffusionPipelineOutput

class diffusers.pipelines.LEditsPPDiffusionPipelineOutput

( images: Union nsfw_content_detected: Optional )

Output class for LEDITS++ diffusion pipelines.
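Since an edit call returns either this output object or a plain tuple depending on return_dict, downstream code may want to accept both forms. The following sketch shows one way to do that without loading a model. `FakeOutput`, `unpack_edit_result`, and `keep_unflagged` are hypothetical names; `FakeOutput` only mirrors the two fields in the signature above, and treating a missing flag list as "nothing flagged" is an assumption.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class FakeOutput:
    """Hypothetical stand-in with the two fields of the output class."""

    images: List[Any]
    nsfw_content_detected: Optional[List[bool]]


def unpack_edit_result(result):
    """Return (images, flags) for either the object form or the tuple form."""
    if isinstance(result, tuple):
        images = result[0]
        flags = result[1] if len(result) > 1 else None
    else:
        images, flags = result.images, result.nsfw_content_detected
    if flags is None:
        # Assumption: no flag list means no image was flagged.
        flags = [False] * len(images)
    return list(images), list(flags)


def keep_unflagged(result):
    """Drop every image whose flag is True."""
    images, flags = unpack_edit_result(result)
    return [img for img, flagged in zip(images, flags) if not flagged]


print(keep_unflagged(FakeOutput(images=["a", "b"], nsfw_content_detected=[False, True])))
print(keep_unflagged((["a", "b"], None)))
```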

LEditsPPInversionPipelineOutput

class diffusers.pipelines.LEditsPPInversionPipelineOutput

( images: Union vae_reconstruction_images: Union )

Output class for LEDITS++ inversion, holding the resized input image(s) and their VAE reconstruction(s).
