Models

The base class PreTrainedModel implements the common methods for loading and saving a model, either from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from the Hugging Face Hub).

PreTrainedModel also implements a few methods which are common to all models, to:

  * resize the input embeddings,
  * prune heads in the self-attention heads.

class transformers.PreTrainedModel

< source >

( config: PreTrainedConfig *inputs **kwargs )

Base class for all models.

PreTrainedModel takes care of storing the configuration of the models and handles methods for loading, downloading and saving models, as well as a few methods common to all models to:

  * resize the input embeddings,
  * prune heads in the self-attention heads.

Class attributes (overridden by derived classes) include config_class (the configuration class to associate with this model architecture), base_model_prefix (a string indicating the attribute holding the base model in derived classes) and main_input_name (the name of the principal input to the model, often input_ids).

push_to_hub

< source >

( repo_id: str commit_message: str | None = None commit_description: str | None = None private: bool | None = None token: bool | str | None = None revision: str | None = None create_pr: bool = False max_shard_size: int | str | None = '50GB' tags: list[str] | None = None )

Parameters

Upload the model file to the 🤗 Model Hub.

Examples:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")

# Push the model to your namespace with the name "my-finetuned-bert".
model.push_to_hub("my-finetuned-bert")

# Push the model to an organization with the name "my-finetuned-bert".
model.push_to_hub("huggingface/my-finetuned-bert")
```

add_model_tags

< source >

( tags: list[str] | str )

Parameters

Add custom tags to a model that gets pushed to the Hugging Face Hub. Will not overwrite existing tags on the model.

Examples:

```python
from transformers import AutoModel

model = AutoModel.from_pretrained("google-bert/bert-base-cased")
model.add_model_tags(["custom", "custom-bert"])

# Push the model to your namespace with the name "my-custom-bert".
model.push_to_hub("my-custom-bert")
```

can_generate

< source >

( ) → bool

Whether this model can generate sequences with .generate().

Returns whether this model can generate sequences with .generate() from the GenerationMixin.

Under the hood, on classes where this function returns True, some generation-specific changes are triggered: for instance, the model instance will have a populated generation_config attribute.

dequantize

< source >

( )

Potentially dequantize the model in case it has been quantized by a quantization method that supports dequantization.

disable_input_require_grads

< source >

( )

Removes the _require_grads_hook.

enable_input_require_grads

< source >

( )

Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping the model weights fixed.

from_pretrained

< source >

( pretrained_model_name_or_path: str | os.PathLike | None *model_args config: transformers.configuration_utils.PreTrainedConfig | str | os.PathLike | None = None cache_dir: str | os.PathLike | None = None ignore_mismatched_sizes: bool = False force_download: bool = False local_files_only: bool = False token: str | bool | None = None revision: str = 'main' use_safetensors: bool | None = None weights_only: bool = True **kwargs )

Parameters

Parameters for big model inference

Instantiate a pretrained pytorch model from a pre-trained model configuration.

The model is set in evaluation mode by default using model.eval() (Dropout modules are deactivated). To train the model, you should first set it back in training mode with model.train().

The warning Weights from XXX not initialized from pretrained model means that the weights of XXX do not come pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning task.

The warning Weights from XXX not used in YYY means that the layer XXX is not used by YYY, therefore those weights are discarded.

Activate the special “offline-mode” to use this method in a firewalled environment.

Examples:

```python
from transformers import BertConfig, BertModel

# Download model and configuration from huggingface.co and cache.
model = BertModel.from_pretrained("google-bert/bert-base-uncased")

# Model was saved using save_pretrained('./test/saved_model/') (for example purposes, not runnable).
model = BertModel.from_pretrained("./test/saved_model/")

# Update configuration during loading.
model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
assert model.config.output_attentions == True
```

get_compiled_call

< source >

( compile_config: transformers.generation.configuration_utils.CompileConfig | None )

Return a torch.compile'd version of self.__call__. This is useful to dynamically choose between the non-compiled and compiled forward during inference, especially to switch between prefill (where we don't want to use the compiled version, to avoid recomputing the graph with new shapes) and iterative decoding (where we want the speed-ups of the compiled version with static shapes).

get_decoder

< source >

( )

Best-effort lookup of the decoder module.

Order of attempts (covers ~85% of current usages):

  1. self.decoder/self.language_model/self.text_model
  2. self.base_model (many wrappers store the decoder here)
  3. self.base_model.get_decoder() (nested wrappers)
  4. fallback: raise for the few exotic models that need a bespoke rule
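The lookup order above can be sketched in plain Python. This is a simplified illustration with a hypothetical wrapper class, not the transformers source:

```python
# Hypothetical stand-in for a model wrapper with optional sub-modules.
class Wrapper:
    def __init__(self, decoder=None, base_model=None):
        self.decoder = decoder
        self.base_model = base_model

def get_decoder(model):
    # 1. direct attributes commonly holding the decoder
    for name in ("decoder", "language_model", "text_model"):
        sub = getattr(model, name, None)
        if sub is not None:
            return sub
    # 2./3. many wrappers store the decoder in base_model, possibly nested
    base = getattr(model, "base_model", None)
    if base is not None:
        try:
            return get_decoder(base)
        except AttributeError:
            return base
    # 4. exotic models need a bespoke rule
    raise AttributeError("no decoder found")

inner = Wrapper(decoder="decoder-module")
outer = Wrapper(base_model=inner)
print(get_decoder(outer))  # decoder-module
```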

get_encoder

< source >

( modality: str | None = None )

Best-effort lookup of the encoder module. If a modality argument is provided, it looks for a modality-specific encoder in multimodal models (e.g. “image_encoder”). By default the function returns the model’s text encoder if there is one, and otherwise returns self.

Possible modality values are “image”, “video” and “audio”.

get_expanded_tied_weights_keys

< source >

( all_submodels: bool = False )

Return the expanded tied weight keys (in case they contain modules or regex patterns) for only the current model, or recursively for all submodels if all_submodels=True (i.e. it will re-check the config values for all submodels).

For almost all models, we only require to tie the embeddings, so the model has an internal property `_tied_weights_keys = {"lm_head.weight": "model.embed_tokens.weight"}`. In this case, the mapping is already “expanded”, i.e. it already contains full parameter names, and this function will simply return a copy of the property. For more complex patterns, e.g. for DFineForObjectDetection, we have the following attribute:

```python
_tied_weights_keys = {
    r"bbox_embed.(?![0])\d+": "bbox_embed.0",
    r"class_embed.(?![0])\d+": "class_embed.0",
    "model.decoder.class_embed": "class_embed",
    "model.decoder.bbox_embed": "bbox_embed",
}
```

In this case, the function looks up all the model's parameters and buffers, matches all the params, and returns the following:

```python
{
    'bbox_embed.1.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'bbox_embed.1.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    'bbox_embed.1.layers.1.bias': 'bbox_embed.0.layers.1.bias',
    'bbox_embed.1.layers.1.weight': 'bbox_embed.0.layers.1.weight',
    'bbox_embed.1.layers.2.bias': 'bbox_embed.0.layers.2.bias',
    'bbox_embed.1.layers.2.weight': 'bbox_embed.0.layers.2.weight',
    'bbox_embed.2.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'bbox_embed.2.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    ...
    'class_embed.1.bias': 'class_embed.0.bias',
    'class_embed.1.weight': 'class_embed.0.weight',
    'class_embed.2.bias': 'class_embed.0.bias',
    'class_embed.2.weight': 'class_embed.0.weight',
    ...
    'model.decoder.class_embed.0.bias': 'class_embed.0.bias',
    'model.decoder.class_embed.0.weight': 'class_embed.0.weight',
    'model.decoder.class_embed.1.bias': 'class_embed.0.bias',
    'model.decoder.class_embed.1.weight': 'class_embed.0.weight',
    ...
    'model.decoder.bbox_embed.0.layers.0.bias': 'bbox_embed.0.layers.0.bias',
    'model.decoder.bbox_embed.0.layers.0.weight': 'bbox_embed.0.layers.0.weight',
    'model.decoder.bbox_embed.0.layers.1.bias': 'bbox_embed.0.layers.1.bias',
    'model.decoder.bbox_embed.0.layers.1.weight': 'bbox_embed.0.layers.1.weight',
    ...
}
```

i.e. all the parameters matching the regex and module patterns in `_tied_weights_keys`.
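The expansion itself can be illustrated in a few lines of plain Python. This is a simplified re-implementation over made-up parameter names, not the transformers code:

```python
import re

# Hypothetical parameter names and one regex entry from the mapping above.
params = [
    "bbox_embed.0.layers.0.weight",
    "bbox_embed.1.layers.0.weight",
    "bbox_embed.2.layers.0.weight",
]
tied = {r"bbox_embed.(?![0])\d+": "bbox_embed.0"}

expanded = {}
for pattern, source in tied.items():
    for name in params:
        m = re.match(pattern, name)
        if m:
            # replace the matched module prefix with the source module
            expanded[name] = source + name[m.end():]

print(expanded)
# {'bbox_embed.1.layers.0.weight': 'bbox_embed.0.layers.0.weight',
#  'bbox_embed.2.layers.0.weight': 'bbox_embed.0.layers.0.weight'}
```

The negative lookahead `(?![0])` excludes the source module `bbox_embed.0` itself, so only the copies get mapped back to it.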

get_memory_footprint

< source >

( return_buffers = True )

Parameters

Get the memory footprint of a model. This will return the memory footprint of the current model in bytes, which is useful to benchmark the memory footprint of the current model and design tests. Solution inspired by the PyTorch discussion: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
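For intuition, the footprint is essentially the sum of element count times element size over the model's tensors. A back-of-the-envelope sketch with made-up shapes:

```python
# Hypothetical weight shapes; 4 bytes per element for float32.
param_shapes = [(30522, 768), (768, 768)]
bytes_per_element = 4

def numel(shape):
    n = 1
    for d in shape:
        n *= d
    return n

footprint = sum(numel(s) * bytes_per_element for s in param_shapes)
print(f"{footprint / 1e6:.1f} MB")  # 96.1 MB
```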

get_parameter_or_buffer

< source >

( target: str )

Return the parameter or buffer given by target if it exists, otherwise throw an error. This combines get_parameter() and get_buffer() in a single handy function. If the target is an _extra_state attribute, it will return the extra state provided by the module. Note that it only works if target is a leaf of the model.

gradient_checkpointing_disable

< source >

( )

Deactivates gradient checkpointing for the current model.

init_weights

< source >

( )

Initialize and tie the weights if needed. If using a custom PreTrainedModel, you need to implement any initialization logic in _init_weights.

This is equivalent to calling self.apply(self._initialize_weights), but correctly handles composite models. This function dynamically dispatches the correct init_weights function to the modules as we advance in the module graph along the recursion. It can handle an arbitrary number of sub-models. Without it, every composite model would have to recurse a second time on all sub-models explicitly in the outer-most _init_weights, which is extremely error prone and inefficient.

mark_tied_weights_as_initialized

< source >

( loading_info )

Adds the _is_hf_initialized flag on parameters that will be tied, in order to avoid initializing them later as they will be tied (overwritten) anyway. This is very important as most embeddings are tied, and they are huge params (vocabularies are often 256k), so running inits on them is very costly.

named_non_persistent_buffers

< source >

( recurse: bool = True remove_duplicate: bool = True )

Similar to named_buffers, but only yields non-persistent buffers. This is handy, as it is not perfectly straightforward to know whether a buffer is persistent or not.

post_init

< source >

( )

A method executed at the end of each Transformer model initialization, to execute code that needs the model’s modules properly initialized (such as weight initialization). It is also used to obtain all correct static properties (parallelism plans, tied_weights_keys, _keep_in_fp32_modules, etc.) in the case of composite models (that is, the top-level model should know about those properties from its children).

register_for_auto_class

< source >

( auto_class = 'AutoModel' )

Parameters

Register this class with a given auto class. This should only be used for custom models as the ones in the library are already mapped with an auto class.

resize_token_embeddings

< source >

( new_num_tokens: int | None = None pad_to_multiple_of: int | None = None mean_resizing: bool = True ) → torch.nn.Embedding

Parameters

Returns

torch.nn.Embedding

Pointer to the input tokens Embeddings Module of the model.

Resizes the input token embeddings matrix of the model if new_num_tokens != config.vocab_size.

Takes care of tying weights embeddings afterwards if the model class has a tie_weights() method.
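To illustrate the idea behind mean_resizing (new rows are initialized from statistics of the existing embeddings; the actual implementation samples from a distribution whose mean is the old embeddings' mean), here is a toy sketch:

```python
# Toy embedding matrix: 2 tokens, hidden size 2; grow the vocab to 4 by
# appending rows set to the column-wise mean of the existing rows.
old = [[1.0, 3.0], [3.0, 5.0]]
mean = [sum(col) / len(old) for col in zip(*old)]
new = old + [mean[:] for _ in range(2)]
print(new)  # [[1.0, 3.0], [3.0, 5.0], [2.0, 4.0], [2.0, 4.0]]
```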

save_pretrained

< source >

( save_directory: str | os.PathLike is_main_process: bool = True state_dict: dict | None = None push_to_hub: bool = False max_shard_size: int | str = '50GB' variant: str | None = None token: str | bool | None = None save_peft_format: bool = True save_original_format: bool = True **kwargs )

Parameters

Save a model and its configuration file to a directory, so that it can be re-loaded using the from_pretrained() class method.

set_attn_implementation

< source >

( attn_implementation: str | dict allow_all_kernels: bool = False )

Parameters

Set the requested attn_implementation for this model.

set_decoder

< source >

( decoder )

Symmetric setter. Mirrors the lookup logic used in get_decoder.

set_encoder

< source >

( encoder modality: str | None = None )

Symmetric setter. Mirrors the lookup logic used in get_encoder.

set_experts_implementation

< source >

( experts_implementation: str | dict )

Parameters

Set the requested experts_implementation for this model.

set_use_kernels

< source >

( use_kernels kernel_config: transformers.utils.kernel_config.KernelConfig | None = None )

Parameters

Set whether or not to use the kernels library to kernelize some layers of the model.

tie_weights

< source >

( missing_keys: set[str] | None = None recompute_mapping: bool = True )

Tie the model weights. If recompute_mapping=False (default when called internally), it will rely on the model.all_tied_weights_keys attribute, containing the {target: source} mapping for the tied params. If recompute_mapping=True, it will re-check all internal submodels and their config to determine the params that need to be tied. This is the default when model.tie_weights() is called on its own, outside of __init__ and from_pretrained, in case the config values were changed somewhere.

Note that during from_pretrained, tying is symmetric: if the mapping says “tie target -> source” but source is missing in the checkpoint while target exists, we swap source and target so we can still tie everything to the parameter that actually exists.
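The symmetric swap can be expressed as a small helper. This is an illustrative sketch, not the transformers implementation:

```python
def resolve_tie(target, source, checkpoint_keys):
    # If the source is absent from the checkpoint but the target exists,
    # swap them so the tie points at the parameter that is actually there.
    if source not in checkpoint_keys and target in checkpoint_keys:
        target, source = source, target
    return target, source

# Checkpoint only contains lm_head.weight, so the direction is flipped.
print(resolve_tie("lm_head.weight", "model.embed_tokens.weight",
                  {"lm_head.weight"}))
# ('model.embed_tokens.weight', 'lm_head.weight')
```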

warn_if_padding_and_no_attention_mask

< source >

( input_ids attention_mask )

Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given.
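The heuristic can be sketched like this (the pad_token_id default and the return value are illustrative; the real method emits a one-time logger warning rather than returning a string):

```python
def maybe_warn(input_ids, attention_mask, pad_token_id=0):
    # Warn only when padding appears and no mask was provided.
    if attention_mask is None and pad_token_id in input_ids:
        return "padding detected but no attention_mask was passed"
    return None

print(maybe_warn([101, 2023, 0, 0], None))          # warns
print(maybe_warn([101, 2023, 0, 0], [1, 1, 0, 0]))  # None
```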

Custom models should also include a _supports_assign_param_buffer class attribute, which determines whether superfast init can be applied to the particular model. A sign that your model needs this is test_save_and_load_from_pretrained failing; if so, set this to False.

class transformers.modeling_utils.ModuleUtilsMixin

< source >

( )

A few utilities for torch.nn.Modules, to be used as a mixin.

get_extended_attention_mask

< source >

( attention_mask: Tensor input_shape: tuple dtype: torch.dtype | None = None )

Parameters

Makes broadcastable attention and causal masks so that future and masked tokens are ignored.

invert_attention_mask

< source >

( encoder_attention_mask: Tensor ) → torch.Tensor

Parameters

The inverted attention mask.

Invert an attention mask (e.g., switches 0. and 1.).
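Conceptually, inversion turns attend positions (1) into a zero bias and masked positions (0) into a large negative bias added to the attention scores. A scalar sketch (the real method works on tensors and scales by the dtype's minimum value):

```python
mask = [1, 1, 0]  # 1 = attend, 0 = masked
inverted = [(1 - m) * -1e9 for m in mask]  # large negative bias when masked
print(inverted[2])  # -1000000000.0
```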

num_parameters

< source >

( only_trainable: bool = False exclude_embeddings: bool = False ) → int

Parameters

The number of parameters.

Get number of (optionally, trainable or non-embeddings) parameters in the module.
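Counting amounts to multiplying out each tensor shape and summing, optionally skipping embedding matrices. A plain-Python sketch with made-up shapes:

```python
import math

shapes = {
    "embed_tokens.weight": (1000, 64),  # embedding matrix
    "dense.weight": (64, 64),
    "dense.bias": (64,),
}
total = sum(math.prod(s) for s in shapes.values())
# exclude_embeddings=True would skip the embedding matrix:
without_embeddings = total - math.prod(shapes["embed_tokens.weight"])
print(total, without_embeddings)  # 68160 4160
```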

class transformers.utils.PushToHubMixin

< source >

( )

A Mixin containing the functionality to push a model or tokenizer to the hub.

push_to_hub

< source >

( repo_id: str commit_message: str | None = None commit_description: str | None = None private: bool | None = None token: bool | str | None = None revision: str | None = None create_pr: bool = False max_shard_size: int | str | None = '50GB' tags: list[str] | None = None )

Parameters

Upload the {object_files} to the 🤗 Model Hub.

Examples:

```python
from transformers import {object_class}

{object} = {object_class}.from_pretrained("google-bert/bert-base-cased")

# Push the {object} to your namespace with the name "my-finetuned-bert".
{object}.push_to_hub("my-finetuned-bert")

# Push the {object} to an organization with the name "my-finetuned-bert".
{object}.push_to_hub("huggingface/my-finetuned-bert")
```