Pipelines

The DiffusionPipeline is the quickest way to load any pretrained diffusion pipeline from the Hub for inference.

You shouldn’t use the DiffusionPipeline class for training or finetuning a diffusion model. Individual components (for example, UNet2DModel and UNet2DConditionModel) of diffusion pipelines are usually trained individually, so we suggest directly working with them instead.

The pipeline type (for example StableDiffusionPipeline) of any diffusion pipeline loaded with from_pretrained() is automatically detected and pipeline components are loaded and passed to the __init__ function of the pipeline.

Base class for all pipelines.

DiffusionPipeline stores all components (models, schedulers, and processors) for diffusion pipelines and provides methods for loading, downloading and saving models. It also includes methods to:

- move all PyTorch modules to the device of your choice
- enable/disable the progress bar for the denoising iteration

Class attributes:

- config_name (str) — The configuration filename that stores the class and module names of all the pipeline components.
- _optional_components (List[str]) — A list of all optional components that don't have to be passed to the pipeline to function (should be overridden by subclasses).

__call__

( *args **kwargs )

Call self as a function.

device


( ) → torch.device

The torch device on which the pipeline is located.
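A quick way to check placement, as a minimal sketch (the checkpoint name is assumed for illustration):

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
print(pipe.device)  # device(type='cpu') right after loading
pipe = pipe.to("cuda")
print(pipe.device)  # device(type='cuda', index=0)
```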

to


( *args **kwargs ) → DiffusionPipeline

Parameters

- dtype (torch.dtype, optional) — Returns a pipeline with the specified dtype.
- device (torch.device or str, optional) — Returns a pipeline with the specified device.
- silence_dtype_warnings (bool, optional, defaults to False) — Whether to omit warnings if the target dtype is not compatible with the target device.

Returns (DiffusionPipeline): The pipeline converted to the specified dtype and/or device.

Performs Pipeline dtype and/or device conversion. A torch.dtype and torch.device are inferred from the arguments of self.to(*args, **kwargs).

If the pipeline already has the correct torch.dtype and torch.device, then it is returned as is. Otherwise, the returned pipeline is a copy of self with the desired torch.dtype and torch.device.

Here are the ways to call to:

- to(dtype, silence_dtype_warnings=False) — returns a pipeline with the specified dtype
- to(device, silence_dtype_warnings=False) — returns a pipeline with the specified device
- to(device=None, dtype=None, silence_dtype_warnings=False) — returns a pipeline with the specified device and dtype
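For example, a minimal sketch of each form (the checkpoint name is assumed for illustration):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

pipe = pipe.to(torch.float16)                        # convert dtype only
pipe = pipe.to("cuda")                               # move to a device only
pipe = pipe.to(device="cuda", dtype=torch.float16)   # both at once
```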

components

The self.components property can be useful to run different pipelines with the same weights and configurations without reallocating additional memory.

Returns (dict): A dictionary containing all the modules needed to initialize the pipeline.

Examples:

```python
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionImg2ImgPipeline,
    StableDiffusionInpaintPipeline,
)

text2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
img2img = StableDiffusionImg2ImgPipeline(**text2img.components)
inpaint = StableDiffusionInpaintPipeline(**text2img.components)
```

disable_attention_slicing

Disable sliced attention computation. If enable_attention_slicing was previously called, attention is computed in one step.

disable_xformers_memory_efficient_attention


( )

Disable memory efficient attention from xFormers.

download


( pretrained_model_name **kwargs ) → os.PathLike

Parameters

Returns (os.PathLike): A path to the downloaded pipeline.

Download and cache a PyTorch diffusion pipeline from pretrained pipeline weights.

To use private or gated models, log in with huggingface-cli login.
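A minimal sketch of the intended usage (the checkpoint name is assumed for illustration):

```python
from diffusers import DiffusionPipeline

# Download and cache the pipeline files without loading the models into memory
local_path = DiffusionPipeline.download("runwayml/stable-diffusion-v1-5")

# Later, instantiate the pipeline from the local cache
pipe = DiffusionPipeline.from_pretrained(local_path)
```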

enable_attention_slicing


( slice_size: typing.Union[str, int, NoneType] = 'auto' )

Parameters

- slice_size (str or int, optional, defaults to "auto") — When "auto", halves the input to the attention heads, so attention is computed in two steps. If "max", maximum memory is saved by running only one slice at a time. If a number is provided, uses as many slices as attention_head_dim // slice_size. In this case, attention_head_dim must be a multiple of slice_size.

Enable sliced attention computation. When this option is enabled, the attention module splits the input tensor in slices to compute attention in several steps. For more than one attention head, the computation is performed sequentially over each head. This is useful to save some memory in exchange for a small speed decrease.

⚠️ Don’t enable attention slicing if you’re already using scaled_dot_product_attention (SDPA) from PyTorch 2.0 or xFormers. These attention computations are already very memory efficient, so you won’t need to enable this function. If you enable attention slicing with SDPA or xFormers, it can lead to serious slowdowns!

Examples:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    use_safetensors=True,
)

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_attention_slicing()
image = pipe(prompt).images[0]
```

enable_model_cpu_offload


( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Parameters

- gpu_id (int, optional) — The ID of the accelerator to use for inference. If not specified, defaults to 0.
- device (torch.device or str, optional, defaults to "cuda") — The PyTorch device type of the accelerator to use for inference.

Offloads all models to CPU using accelerate, reducing memory usage with a low impact on performance. Compared to enable_sequential_cpu_offload, this method moves one whole model at a time to the GPU when its forward method is called, and the model remains on the GPU until the next model runs. Memory savings are lower than with enable_sequential_cpu_offload, but performance is much better due to the iterative execution of the unet.
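A minimal sketch of the intended usage (checkpoint name and prompt assumed for illustration):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# No explicit pipe.to("cuda") is needed; offloading manages device placement.
pipe.enable_model_cpu_offload()
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```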

enable_sequential_cpu_offload


( gpu_id: typing.Optional[int] = None device: typing.Union[torch.device, str] = 'cuda' )

Parameters

- gpu_id (int, optional) — The ID of the accelerator to use for inference. If not specified, defaults to 0.
- device (torch.device or str, optional, defaults to "cuda") — The PyTorch device type of the accelerator to use for inference.

Offloads all models to CPU using 🤗 Accelerate, significantly reducing memory usage. When called, the state dicts of all torch.nn.Module components (except those in self._exclude_from_cpu_offload) are saved to CPU and then moved to torch.device('meta') and loaded to GPU only when their specific submodule has its forward method called. Offloading happens on a submodule basis. Memory savings are higher than with enable_model_cpu_offload, but performance is lower.
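A minimal sketch, mirroring the example above (names assumed for illustration):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe.enable_sequential_cpu_offload()  # lowest memory use, slowest inference
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
```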

enable_xformers_memory_efficient_attention


( attention_op: typing.Optional[typing.Callable] = None )

Parameters

- attention_op (Callable, optional) — Override the default None operator for use as the op argument to the memory_efficient_attention() function of xFormers.

Enable memory efficient attention from xFormers. When this option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed up during training is not guaranteed.

⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.

Examples:

```python
import torch
from diffusers import DiffusionPipeline
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16)
pipe = pipe.to("cuda")
pipe.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)

# Workaround: the VAE's attention shapes aren't accepted by Flash Attention
pipe.vae.enable_xformers_memory_efficient_attention(attention_op=None)
```

from_pretrained


( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] **kwargs )

Parameters

Instantiate a PyTorch diffusion pipeline from pretrained pipeline weights.

The pipeline is set in evaluation mode (model.eval()) by default.

If you get the error message below, you need to finetune the weights for your downstream task:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at runwayml/stable-diffusion-v1-5 and are newly initialized because the shapes did not match:

To use private or gated models, log in with huggingface-cli login.

Examples:

```python
from diffusers import DiffusionPipeline

# Download a pipeline from huggingface.co and cache it
pipeline = DiffusionPipeline.from_pretrained("CompVis/ldm-text2im-large-256")

# Download a pipeline that requires an authorization token
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Use a different scheduler
from diffusers import LMSDiscreteScheduler

scheduler = LMSDiscreteScheduler.from_config(pipeline.scheduler.config)
pipeline.scheduler = scheduler
```

maybe_free_model_hooks

Function that offloads all components, removes all model hooks that were added when using enable_model_cpu_offload, and then applies them again. In case the model has not been offloaded, this function is a no-op. Make sure to add this function to the end of the __call__ function of your pipeline so that it functions correctly when applying enable_model_cpu_offload.

numpy_to_pil

Convert a NumPy image or a batch of images to a PIL image.
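A minimal sketch (checkpoint name and prompt assumed for illustration):

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
np_images = pipe("a photo of a cat", output_type="np").images  # NumPy batch
pil_images = pipe.numpy_to_pil(np_images)  # back to PIL for saving/viewing
pil_images[0].save("cat.png")
```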

save_pretrained


( save_directory: typing.Union[str, os.PathLike] safe_serialization: bool = True variant: typing.Optional[str] = None push_to_hub: bool = False **kwargs )

Parameters

- save_directory (str or os.PathLike) — Directory to save a pipeline to. Will be created if it doesn't exist.
- safe_serialization (bool, optional, defaults to True) — Whether to save the model using safetensors or the traditional PyTorch way with pickle.
- variant (str, optional) — If specified, weights are saved in the format pytorch_model.<variant>.bin.
- push_to_hub (bool, optional, defaults to False) — Whether or not to push your model to the Hugging Face Hub after saving it.
- kwargs (Dict[str, Any], optional) — Additional keyword arguments passed along to the push_to_hub() method.

Save all saveable variables of the pipeline to a directory. A pipeline variable can be saved and loaded if its class implements both a save and loading method. The pipeline is easily reloaded using the from_pretrained() class method.
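A minimal round-trip sketch (the directory name is assumed for illustration):

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.save_pretrained("./my-stable-diffusion", safe_serialization=True)

# Reload from the local directory
pipe = DiffusionPipeline.from_pretrained("./my-stable-diffusion")
```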