Models

🤗 Diffusers provides pretrained models for popular algorithms and modules to create custom diffusion systems. The primary function of models is to denoise an input sample as modeled by the distribution \(p_{\theta}(x_{t-1} \mid x_{t})\).

All models are built from the base ModelMixin class which is a torch.nn.Module providing basic functionality for saving and loading models, locally and from the Hugging Face Hub.

class diffusers.ModelMixin

Base class for all models.

ModelMixin takes care of storing the model configuration and provides methods for loading, downloading and saving models.
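For instance, any ModelMixin subclass can be pulled from the Hub and written back to disk. A minimal sketch, with an illustrative checkpoint and directory name:

from diffusers import AutoencoderKL

# Load one component of a pipeline repository by pointing at its subfolder
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")

# Save the weights and configuration locally for later reuse
vae.save_pretrained("./local-vae")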

dequantize

Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.
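A minimal sketch, assuming bitsandbytes is installed and using an illustrative (gated) checkpoint; the quantized load itself is not part of this method, it only sets up a model that dequantize() can act on:

import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel

# Load the transformer with 8-bit bitsandbytes quantization
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    subfolder="transformer",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# Convert the quantized weights back to full-precision tensors
transformer.dequantize()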

disable_gradient_checkpointing

Deactivates gradient checkpointing for the current model (may be referred to as _activation checkpointing_ or _checkpoint activations_ in other frameworks).

disable_npu_flash_attention

Disable NPU flash attention from torch_npu.

disable_xformers_memory_efficient_attention


( )

Disable memory efficient attention from xFormers.

disable_xla_flash_attention

Disable the flash attention pallas kernel for torch_xla.

enable_gradient_checkpointing

( gradient_checkpointing_func: typing.Optional[typing.Callable] = None )

Activates gradient checkpointing for the current model (may be referred to as _activation checkpointing_ or _checkpoint activations_ in other frameworks).
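A minimal sketch of turning gradient checkpointing on before fine-tuning; the checkpoint is illustrative:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
unet.train()
# Recompute activations during the backward pass to trade compute for memory
unet.enable_gradient_checkpointing()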

enable_group_offload


( onload_device: device offload_device: device = device(type='cpu') offload_type: str = 'block_level' num_blocks_per_group: typing.Optional[int] = None non_blocking: bool = False use_stream: bool = False record_stream: bool = False low_cpu_mem_usage = False )

Activates group offloading for the current model.

See apply_group_offloading() for more information.

Example:

import torch
from diffusers import CogVideoXTransformer3DModel

transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)

transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

enable_layerwise_casting

( storage_dtype: dtype = torch.float8_e4m3fn compute_dtype: typing.Optional[torch.dtype] = None skip_modules_pattern: typing.Optional[typing.Tuple[str, ...]] = None skip_modules_classes: typing.Optional[typing.Tuple[typing.Type[torch.nn.modules.module.Module], ...]] = None non_blocking: bool = False )

Activates layerwise casting for the current model.

Layerwise casting is a technique that casts the model weights to a lower precision dtype for storage but upcasts them on-the-fly to a higher precision dtype for computation. This process can significantly reduce the memory footprint from model weights, but may lead to some quality degradation in the outputs. Most degradations are negligible, mostly stemming from weight casting in normalization and modulation layers.

By default, most models in diffusers set the _skip_layerwise_casting_patterns attribute to ignore patch embedding, positional embedding and normalization layers. This is because these layers are most likely precision-critical for quality. If you wish to change this behavior, you can set the _skip_layerwise_casting_patterns attribute to None, or call apply_layerwise_casting() with custom arguments.

Example:

Using enable_layerwise_casting():

import torch
from diffusers import CogVideoXTransformer3DModel

transformer = CogVideoXTransformer3DModel.from_pretrained(
    "THUDM/CogVideoX-5b", subfolder="transformer", torch_dtype=torch.bfloat16
)
transformer.enable_layerwise_casting(storage_dtype=torch.float8_e4m3fn, compute_dtype=torch.bfloat16)

enable_npu_flash_attention

Enable NPU flash attention from torch_npu.

enable_xformers_memory_efficient_attention

( attention_op: typing.Optional[typing.Callable] = None )

Enable memory efficient attention from xFormers.

When this option is enabled, you should observe lower GPU memory usage and a potential speed up during inference. Speed up during training is not guaranteed.

⚠️ When memory efficient attention and sliced attention are both enabled, memory efficient attention takes precedence.

Examples:

import torch
from diffusers import UNet2DConditionModel
from xformers.ops import MemoryEfficientAttentionFlashAttentionOp

model = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="unet", torch_dtype=torch.float16
)
model = model.to("cuda")
model.enable_xformers_memory_efficient_attention(attention_op=MemoryEfficientAttentionFlashAttentionOp)

enable_xla_flash_attention


( partition_spec: typing.Optional[typing.Callable] = None **kwargs )

Enable the flash attention pallas kernel for torch_xla.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike, NoneType] **kwargs )

Instantiate a pretrained PyTorch model from a pretrained model configuration.

The model is set in evaluation mode - model.eval() - by default, and dropout modules are deactivated. To train the model, set it back in training mode with model.train().

To use private or gated models, log in with huggingface-cli login. You can also activate the special “offline-mode” to use this method in a firewalled environment.

Example:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")

If you get the error message below, you need to finetune the weights for your downstream task:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at runwayml/stable-diffusion-v1-5 and are newly initialized because the shapes did not match:

get_memory_footprint

( return_buffers = True )

Get the memory footprint of a model. This returns the memory footprint of the current model in bytes, which is useful for benchmarking the current model and designing tests. Solution inspired by this PyTorch discussion: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
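A minimal sketch of inspecting a model's footprint; the checkpoint is illustrative:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
print(unet.get_memory_footprint())                      # parameters and buffers, in bytes
print(unet.get_memory_footprint(return_buffers=False))  # parameters only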

num_parameters

( only_trainable: bool = False exclude_embeddings: bool = False ) → int

Returns int: the number of parameters.

Get number of (trainable or non-embedding) parameters in the module.

Example:

from diffusers import UNet2DConditionModel

model_id = "runwayml/stable-diffusion-v1-5"
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
unet.num_parameters(only_trainable=True)  # 859520964

save_pretrained

( save_directory: typing.Union[str, os.PathLike] is_main_process: bool = True save_function: typing.Optional[typing.Callable] = None safe_serialization: bool = True variant: typing.Optional[str] = None max_shard_size: typing.Union[int, str] = '10GB' push_to_hub: bool = False **kwargs )

Save a model and its configuration file to a directory so that it can be reloaded using the from_pretrained() class method.
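A minimal sketch of a save/reload round trip; the local directory name is illustrative:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
# Writes the configuration and safetensors weights to the directory
unet.save_pretrained("./local-unet")
unet = UNet2DConditionModel.from_pretrained("./local-unet")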

set_use_npu_flash_attention


( valid: bool )

Set the switch for NPU flash attention.
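A minimal sketch, assuming torch_npu is installed and an Ascend NPU is available; the checkpoint is illustrative:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
unet.set_use_npu_flash_attention(True)   # same effect as enable_npu_flash_attention()
unet.set_use_npu_flash_attention(False)  # same effect as disable_npu_flash_attention()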

class diffusers.FlaxModelMixin

Base class for all Flax models.

FlaxModelMixin takes care of storing the model configuration and provides methods for loading, downloading and saving models.

from_pretrained

( pretrained_model_name_or_path: typing.Union[str, os.PathLike] dtype: dtype = <class 'jax.numpy.float32'> *model_args **kwargs )

Instantiate a pretrained Flax model from a pretrained model configuration.

Examples:

from diffusers import FlaxUNet2DConditionModel

# Download the model and configuration from the Hugging Face Hub
model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")

# Load from a local directory previously saved with save_pretrained()
model, params = FlaxUNet2DConditionModel.from_pretrained("./test/saved_model/")

If you get the error message below, you need to finetune the weights for your downstream task:

Some weights of UNet2DConditionModel were not initialized from the model checkpoint at runwayml/stable-diffusion-v1-5 and are newly initialized because the shapes did not match:

save_pretrained

( save_directory: typing.Union[str, os.PathLike] params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] is_main_process: bool = True push_to_hub: bool = False **kwargs )

Save a model and its configuration file to a directory so that it can be reloaded using the from_pretrained() class method.
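A minimal sketch of saving Flax weights locally; the directory name is illustrative:

from diffusers import FlaxUNet2DConditionModel

model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
# The params tree is passed explicitly because Flax keeps parameters outside the module
model.save_pretrained("./flax-unet", params=params)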

to_bf16

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

Cast the floating-point params to jax.numpy.bfloat16. This returns a new params tree and does not cast the params in place.

This method can be used on a TPU to explicitly convert the model parameters to bfloat16 precision to do full half-precision training or to save weights in bfloat16 for inference in order to save memory and improve speed.

Examples:

from diffusers import FlaxUNet2DConditionModel

# Load the model and cast all floating-point parameters to bfloat16
model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
params = model.to_bf16(params)

# To keep some parameters (for example, LayerNorm bias and scale) in full precision,
# pass a mask that marks which leaves should be cast
from flax import traverse_util

model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
flat_params = traverse_util.flatten_dict(params)
mask = {
    path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
    for path in flat_params
}
mask = traverse_util.unflatten_dict(mask)
params = model.to_bf16(params, mask)

to_fp16

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

Cast the floating-point params to jax.numpy.float16. This returns a new params tree and does not cast the params in place.

This method can be used on a GPU to explicitly convert the model parameters to float16 precision to do full half-precision training or to save weights in float16 for inference in order to save memory and improve speed.

Examples:

from diffusers import FlaxUNet2DConditionModel

# Load the model and cast all floating-point parameters to float16
model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
params = model.to_fp16(params)

# To keep some parameters (for example, LayerNorm bias and scale) in full precision,
# pass a mask that marks which leaves should be cast
from flax import traverse_util

model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
flat_params = traverse_util.flatten_dict(params)
mask = {
    path: (path[-2:] != ("LayerNorm", "bias") and path[-2:] != ("LayerNorm", "scale"))
    for path in flat_params
}
mask = traverse_util.unflatten_dict(mask)
params = model.to_fp16(params, mask)

to_fp32

( params: typing.Union[typing.Dict, flax.core.frozen_dict.FrozenDict] mask: typing.Any = None )

Cast the floating-point params to jax.numpy.float32. This method can be used to explicitly convert the model parameters to fp32 precision. This returns a new params tree and does not cast the params in place.

Examples:

from diffusers import FlaxUNet2DConditionModel

# By default, the model parameters are already in fp32; to illustrate this method,
# first cast them down to fp16 and then back up to fp32
model, params = FlaxUNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5")
params = model.to_fp16(params)
params = model.to_fp32(params)

class diffusers.utils.PushToHubMixin


( )

A Mixin to push a model, scheduler, or pipeline to the Hugging Face Hub.

push_to_hub

( repo_id: str commit_message: typing.Optional[str] = None private: typing.Optional[bool] = None token: typing.Optional[str] = None create_pr: bool = False safe_serialization: bool = True variant: typing.Optional[str] = None )

Upload model, scheduler, or pipeline files to the 🤗 Hugging Face Hub.

Examples:

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="unet")

# Push the unet to your namespace with the repository name "my-finetuned-unet"
unet.push_to_hub("my-finetuned-unet")

# Push the unet to an organization with the repository name "my-finetuned-unet"
unet.push_to_hub("your-org/my-finetuned-unet")