SentenceTransformer — Sentence Transformers documentation

SentenceTransformer

class sentence_transformers.sentence_transformer.model.SentenceTransformer(model_name_or_path: str | None = None, *, modules: list[Module] | None = None, device: str | None = None, prompts: dict[str, str] | None = None, default_prompt_name: str | None = None, cache_folder: str | None = None, trust_remote_code: bool = False, revision: str | None = None, local_files_only: bool = False, token: bool | str | None = None, use_auth_token: bool | str | None = None, model_kwargs: dict[str, Any] | None = None, processor_kwargs: dict[str, Any] | None = None, config_kwargs: dict[str, Any] | None = None, model_card_data: SentenceTransformerModelCardData | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch', similarity_fn_name: Literal['cosine', 'dot', 'euclidean', 'manhattan'] | SimilarityFunction | None = None, truncate_dim: int | None = None)[source]

Loads or creates a SentenceTransformer model that can be used to map text and other inputs to dense embeddings.

Parameters:

Example

```python
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

# Encode some texts
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6817, 0.0492],
#         [0.6817, 1.0000, 0.0421],
#         [0.0492, 0.0421, 1.0000]])
```
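For this model, the similarity matrix is plain cosine similarity over the embedding rows. A minimal NumPy sketch of that computation (the helper name is illustrative, not part of the library):

```python
import numpy as np

def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

embeddings = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
similarities = cosine_similarity_matrix(embeddings, embeddings)
# the diagonal is 1.0: every embedding has maximal similarity with itself
```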

Initialize a BaseModel instance.

Parameters:

active_adapters() → list[str][source]

If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft

Gets the current active adapters of the model. In case of multi-adapter inference (combining multiple adapters for inference) returns the list of all active adapters so that users can deal with them accordingly.

For previous PEFT versions (that do not support multi-adapter inference), module.active_adapter will return a single string.

add_adapter(*args, **kwargs) → None[source]

Adds a fresh adapter to the current model for training purposes. If no adapter name is passed, a default name is assigned to the adapter, following the convention of the PEFT library (PEFT uses “default” as the default adapter name).

Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.

Parameters:

bfloat16() → T

Casts all floating point parameters and buffers to bfloat16 datatype.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

compile(*args, **kwargs)

Compile this Module’s forward using torch.compile().

This Module’s __call__ method is compiled and all arguments are passed as-is to torch.compile().

See torch.compile() for details on the arguments for this function.

cpu() → T

Moves all model parameters and buffers to the CPU.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

cuda(device: int | device | None = None) → T

Moves all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. It should therefore be called before constructing the optimizer if the module will live on the GPU while being optimized.

Note

This method modifies the module in-place.

Parameters:

device (int, optional) – if specified, all parameters will be copied to that device

Returns:

self

Return type:

Module

delete_adapter(*args, **kwargs) → None[source]

If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft

Delete an adapter’s LoRA layers from the underlying model.

Parameters:

property device: torch.device

Get torch.device from module, assuming that the whole module has one device. In case there are no PyTorch parameters, fall back to CPU.

disable_adapters() → None[source]

Disable all adapters that are attached to the model. This leads to inferring with the base model only.

double() → T

Casts all floating point parameters and buffers to double datatype.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

property dtype: torch.dtype | None

The dtype of the module (assuming that all the module parameters have the same dtype).

Type:

torch.dtype

enable_adapters() → None[source]

Enable adapters that are attached to the model. The model will use self.active_adapter().

encode(inputs: SingleInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: Literal[False] = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → Tensor[source]

encode(inputs: list[SingleInput] | SingleInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: Literal[True] = True, convert_to_tensor: Literal[False] = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → np.ndarray

encode(inputs: list[SingleInput] | SingleInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: Literal[True] = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → Tensor

encode(inputs: list[SingleInput], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → list[Tensor]

encode(inputs: list[SingleInput], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → list[dict[str, Tensor]]

encode(inputs: SingleInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → dict[str, Tensor]

encode(inputs: SingleInput, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → Tensor

Computes embeddings for the given inputs.

Tip

If you are unsure whether you should use encode(), encode_query(), or encode_document(), your best bet is to use encode_query() and encode_document() for Information Retrieval tasks with clear query and document/passage distinction, and use encode() for all other tasks.

Note that encode() is the most general method and can be used for any task, including Information Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.

Tip

Adjusting batch_size can significantly improve processing speed. The optimal value depends on your hardware, model size, precision, and input length. Benchmark a few batch sizes on a small subset of your data to find the best value.

Parameters:

Returns:

By default, a 2d numpy array with shape [num_inputs, output_dimension] is returned. If output_value is None, a list of dicts (or a single dict for singular input) is returned.

Return type:

Union[List[Tensor], ndarray, Tensor, dict[str, Tensor], list[dict[str, Tensor]]]

encode_document(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] | None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → list[Tensor] | ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]][source]

Computes embeddings specifically optimized for document/passage representation.

This method is a specialized version of encode() that differs in exactly two ways:

  1. If no prompt_name or prompt is provided, it uses the first available prompt from the following candidates: "document", "passage", "corpus" (checked in that order).
  2. It sets the task to “document”. If the model has a Router module, it will use the “document” task type to route the input through the appropriate submodules.
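The prompt fallback in step 1 can be sketched as a small helper (the function name and the prompts dictionary are illustrative, not library API):

```python
def pick_document_prompt_name(prompts: dict, prompt_name=None, prompt=None):
    """Sketch of the documented fallback: an explicit prompt or prompt_name wins;
    otherwise use the first of "document", "passage", "corpus" found in the
    model's prompts dictionary, or no prompt at all."""
    if prompt is not None or prompt_name is not None:
        return prompt_name  # explicit arguments take precedence
    for candidate in ("document", "passage", "corpus"):
        if candidate in prompts:
            return candidate
    return None

pick_document_prompt_name({"query": "query: ", "passage": "passage: "})
# returns "passage"
```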

Tip

Adjusting batch_size can significantly improve processing speed. The optimal value depends on your hardware, model size, precision, and input length. Benchmark a few batch sizes on a small subset of your data to find the best value.

All other parameters are identical to encode(). See encode() for the full parameter documentation.

encode_multi_process(sentences: list[str], pool: dict[Literal['input', 'output', 'processes'], Any], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, chunk_size: int | None = None, show_progress_bar: bool | None = None, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize_embeddings: bool = False, truncate_dim: int | None = None) → ndarray[source]

Warning

This method is deprecated. You can now call SentenceTransformer.encode() with the same parameters instead, which will automatically handle multi-process encoding using the provided pool.

encode_query(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] | None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) → list[Tensor] | ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]][source]

Computes embeddings specifically optimized for query representation.

This method is a specialized version of encode() that differs in exactly two ways:

  1. If no prompt_name or prompt is provided, it uses a predefined “query” prompt, if available in the model’s prompts dictionary.
  2. It sets the task to “query”. If the model has a Router module, it will use the “query” task type to route the input through the appropriate submodules.
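The routing in step 2 can be pictured as a dictionary dispatch on the task type (a simplified sketch; the real Router is a module, and these names are illustrative):

```python
def route_by_task(task: str, submodules: dict):
    """Pick the submodule registered for a task ("query" or "document");
    fall back to a "default" entry when the task has no dedicated route."""
    return submodules.get(task, submodules.get("default"))

submodules = {"query": "query-encoder", "document": "document-encoder"}
route_by_task("query", submodules)
# returns "query-encoder"
```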

Tip

Adjusting batch_size can significantly improve processing speed. The optimal value depends on your hardware, model size, precision, and input length. Benchmark a few batch sizes on a small subset of your data to find the best value.

All other parameters are identical to encode(). See encode() for the full parameter documentation.

eval() → T

Sets the module in evaluation mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

This is equivalent to self.train(False).

See Locally disabling gradient computation for a comparison between .eval() and several similar mechanisms that may be confused with it.

Returns:

self

Return type:

Module

evaluate(evaluator: BaseEvaluator, output_path: str | None = None) → dict[str, float] | float[source]

Evaluate the model based on an evaluator

Parameters:

Returns:

The evaluation results.

fit(train_objectives: ~collections.abc.Iterable[tuple[~torch.utils.data.dataloader.DataLoader, ~torch.nn.modules.module.Module]], evaluator: ~sentence_transformers.base.evaluation.evaluator.BaseEvaluator | None = None, epochs: int = 1, steps_per_epoch=None, scheduler: str = 'WarmupLinear', warmup_steps: int = 10000, optimizer_class: type[~torch.optim.optimizer.Optimizer] = <class 'torch.optim.adamw.AdamW'>, optimizer_params: dict[str, object] = {'lr': 2e-05}, weight_decay: float = 0.01, evaluation_steps: int = 0, output_path: str | None = None, save_best_model: bool = True, max_grad_norm: float = 1, use_amp: bool = False, callback: ~collections.abc.Callable[[float, int, int], None] | None = None, show_progress_bar: bool = True, checkpoint_path: str | None = None, checkpoint_save_steps: int = 500, checkpoint_save_total_limit: int = 0, resume_from_checkpoint: bool = False) → None[source]

Deprecated training method from before Sentence Transformers v3.0; it is recommended to use SentenceTransformerTrainer instead. This method uses SentenceTransformerTrainer behind the scenes, but does not provide as much flexibility as the Trainer itself.

This training approach uses a list of DataLoaders and Loss functions to train the model. Each DataLoader is sampled in turn for one batch. We sample only as many batches from each DataLoader as there are in the smallest one to make sure of equal training with each dataset, i.e. round robin sampling.

This method should produce equivalent results in v3.0+ as before v3.0, but if you encounter any issues with your existing training scripts, you may wish to use SentenceTransformer.old_fit instead, which uses the old training method from before v3.0.
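The round-robin sampling described above can be sketched with plain lists standing in for DataLoaders (the helper is illustrative, not library API):

```python
def round_robin_batches(dataloaders):
    """Yield one batch from each loader in turn; stop once as many batches have
    been drawn from every loader as the smallest loader contains."""
    iterators = [iter(dl) for dl in dataloaders]
    steps = min(len(dl) for dl in dataloaders)
    batches = []
    for _ in range(steps):
        for it in iterators:
            batches.append(next(it))
    return batches

round_robin_batches([[1, 2, 3], ["a", "b"]])
# returns [1, "a", 2, "b"]  -- the third batch of the first loader is never drawn
```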

Parameters:

float() → T

Casts all floating point parameters and buffers to float datatype.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

get_adapter_state_dict(*args, **kwargs) → dict[source]

If you are not familiar with adapters and PEFT methods, we invite you to read more about them on the PEFT official documentation: https://huggingface.co/docs/peft

Gets the adapter state dict, which should only contain the weight tensors of the specified adapter_name adapter. If no adapter_name is passed, the active adapter is used.

Parameters:

get_backend() → Literal['torch', 'onnx', 'openvino'][source]

Return the backend used for inference, which can be one of “torch”, “onnx”, or “openvino”.

Returns:

The backend used for inference.

Return type:

str

get_embedding_dimension() → int | None[source]

Returns the number of dimensions in the output of SentenceTransformer.encode().

Returns:

The number of dimensions in the output of encode. If it’s not known, it’s None.

Return type:

Optional[int]

get_max_seq_length() → int | None[source]

Deprecated: use the max_seq_length property instead.

Returns the maximal sequence length that the first module of the model accepts. Longer inputs will be truncated.

Returns:

The maximal sequence length that the model accepts, or None if it is not defined.

Return type:

Optional[int]

get_model_kwargs() → list[str][source]

Get the keyword arguments specific to this model for inference methods like encode or predict.

Example

```python
from sentence_transformers import SentenceTransformer, SparseEncoder

SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").get_model_kwargs()
# []
SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
# ['task', 'truncate_dim']
SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
# ['task']
```

Returns:

A list of keyword arguments for the forward pass.

Return type:

list[str]

gradient_checkpointing_enable(gradient_checkpointing_kwargs: dict[str, Any] | None = None) → None[source]

Enable gradient checkpointing for the model.

half() → T

Casts all floating point parameters and buffers to half datatype.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

is_singular_input(inputs: Any) → bool[source]

Check if the input represents a single example or a batch of examples.

Parameters:

inputs – The input to check.

Returns:

True if the input is a single example, False if it is a batch.

Return type:

bool

static load(input_path: str) → SentenceTransformer[source]

Deprecated: Use SentenceTransformer(input_path) instead.

load_adapter(*args, **kwargs) → None[source]

Load adapter weights from a file or a remote Hub folder. If you are not familiar with adapters and PEFT methods, we invite you to read more about them in the PEFT official documentation: https://huggingface.co/docs/peft

Requires peft as a backend to load the adapter weights and the underlying model to be compatible with PEFT.

Parameters:

property max_seq_length: int | None

Returns the maximal input sequence length for the model. Longer inputs will be truncated.

Returns:

The maximal input sequence length, or None if not defined.

Return type:

Optional[int]

property modalities: list[Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]]

Return the list of modalities supported by this model, e.g. ["text"] or ["text", "image", "message"].

model_card_data_class[source]

alias of SentenceTransformerModelCardData

old_fit(train_objectives: ~collections.abc.Iterable[tuple[~torch.utils.data.dataloader.DataLoader, ~torch.nn.modules.module.Module]], evaluator: ~sentence_transformers.base.evaluation.evaluator.BaseEvaluator | None = None, epochs: int = 1, steps_per_epoch=None, scheduler: str = 'WarmupLinear', warmup_steps: int = 10000, optimizer_class: type[~torch.optim.optimizer.Optimizer] = <class 'torch.optim.adamw.AdamW'>, optimizer_params: dict[str, object] = {'lr': 2e-05}, weight_decay: float = 0.01, evaluation_steps: int = 0, output_path: str | None = None, save_best_model: bool = True, max_grad_norm: float = 1, use_amp: bool = False, callback: ~collections.abc.Callable[[float, int, int], None] | None = None, show_progress_bar: bool = True, checkpoint_path: str | None = None, checkpoint_save_steps: int = 500, checkpoint_save_total_limit: int = 0) → None[source]

Deprecated training method from before Sentence Transformers v3.0; it is recommended to use sentence_transformers.sentence_transformer.trainer.SentenceTransformerTrainer instead. This method should only be used if you encounter issues with your existing training scripts after upgrading to v3.0+.

This training approach uses a list of DataLoaders and Loss functions to train the model. Each DataLoader is sampled in turn for one batch. We sample only as many batches from each DataLoader as there are in the smallest one to make sure of equal training with each dataset, i.e. round robin sampling.

Parameters:

preprocess(inputs: list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | MessageDict | list[MessageDict] | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict] | tuple[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict], str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]] | list[str | Image | ndarray | Tensor | AudioDict | None | VideoDict | dict[Literal['text', 'image', 'audio', 'video'], str | Image | ndarray | Tensor | AudioDict | None | VideoDict]]], prompt: str | None = None, **kwargs) → dict[str, Tensor | Any][source]

Preprocesses the inputs for the model.

Parameters:

Returns:

A dictionary of tensors with the preprocessed inputs.

Return type:

dict[str, Tensor | Any]

property processor: Any

Property to get the processor used by this model.

push_to_hub(repo_id: str, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str | None = None, local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None, revision: str | None = None, create_pr: bool = False) → str[source]

Uploads all elements of this model to a HuggingFace Hub repository, creating it if it doesn’t exist.

Parameters:

Returns:

The url of the commit of your model in the repository on the Hugging Face Hub.

Return type:

str

save_pretrained(path: str, model_name: str | None = None, create_model_card: bool = True, train_datasets: list[str] | None = None, safe_serialization: bool = True) → None[source]

Saves a model and its configuration files to a directory, so that it can be loaded again.

Parameters:

set_adapter(*args, **kwargs) → None[source]

Sets a specific adapter by forcing the model to use that adapter and disable the other adapters.

Parameters:

set_pooling_include_prompt(include_prompt: bool) → None[source]

Sets the include_prompt attribute in the pooling layer in the model, if there is one.

This is useful for INSTRUCTOR-style models, as the prompt should be excluded from the pooling strategy for these models.
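What excluding the prompt from pooling means can be sketched with a toy mean-pooling function (an illustration under simplified assumptions, not the library's Pooling module):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask, prompt_length=0, include_prompt=True):
    """Mean pooling over token embeddings; optionally mask out the leading
    prompt tokens, as set_pooling_include_prompt(False) arranges."""
    mask = np.asarray(attention_mask, dtype=float)
    if not include_prompt:
        mask = mask.copy()
        mask[:, :prompt_length] = 0.0  # prompt tokens no longer count
    mask = mask[:, :, None]
    embeddings = np.asarray(token_embeddings, dtype=float)
    return (embeddings * mask).sum(axis=1) / mask.sum(axis=1)

tokens = [[[2.0], [4.0], [6.0]]]  # one sequence, three 1-D token embeddings
attention = [[1, 1, 1]]
mean_pool(tokens, attention)
# returns array([[4.]])  -- mean of all three tokens
mean_pool(tokens, attention, prompt_length=1, include_prompt=False)
# returns array([[5.]])  -- mean of the two non-prompt tokens
```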

property similarity: Callable[[Tensor | ndarray[Any, dtype[float32]], Tensor | ndarray[Any, dtype[float32]]], Tensor]

Return a function that computes the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings from the first parameter and all embeddings from the second parameter.

property similarity_fn_name: Literal['cosine', 'dot', 'euclidean', 'manhattan']

Return the name of the similarity function.

If not previously set, accessing this property defaults it to "cosine".
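The four named similarity functions can be sketched for a pair of 1-D vectors as follows (a simplified illustration; the helper name is not library API, and in the library the distance-based scores are negated so that higher always means more similar):

```python
import numpy as np

def similarity_score(a, b, fn_name: str = "cosine") -> float:
    """Sketch of the four documented similarity functions for two 1-D vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if fn_name == "cosine":
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    if fn_name == "dot":
        return float(a @ b)
    if fn_name == "euclidean":
        return -float(np.linalg.norm(a - b))  # negated distance
    if fn_name == "manhattan":
        return -float(np.abs(a - b).sum())  # negated distance
    raise ValueError(f"Unknown similarity function: {fn_name}")

similarity_score([1.0, 0.0], [1.0, 0.0])
# returns 1.0
```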

property similarity_pairwise: Callable[[Tensor | ndarray[Any, dtype[float32]], Tensor | ndarray[Any, dtype[float32]]], Tensor]

Return a function that computes the pairwise similarity between two collections of embeddings.

smart_batching_collate(batch: list[InputExample]) → tuple[list[dict[str, Tensor]], Tensor][source]

Transforms a batch from a SmartBatchingDataset to a batch of tensors for the model. Here, batch is a list of InputExample instances: [InputExample(…), …].

Parameters:

batch – a batch from a SmartBatchingDataset

Returns:

a batch of tensors for the model

start_multi_process_pool(target_devices: list[str] | None = None) → dict[Literal['input', 'output', 'processes'], Any][source]

Starts a multi-process pool to infer with several independent processes.

This method is recommended if you want to encode on multiple GPUs or CPUs. It is advised to start only one process per GPU. This method works together with encode and stop_multi_process_pool.

Parameters:

target_devices (List[str], optional) – PyTorch target devices, e.g. [“cuda:0”, “cuda:1”, …], [“npu:0”, “npu:1”, …], or [“cpu”, “cpu”, “cpu”, “cpu”]. If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used.

Returns:

A dictionary with the target processes, an input queue, and an output queue.

Return type:

Dict[str, Any]

static stop_multi_process_pool(pool: dict[Literal['input', 'output', 'processes'], Any]) → None[source]

Stops all processes started with start_multi_process_pool.

Parameters:

pool (Dict[str, object]) – A dictionary containing the input queue, output queue, and process list.

Returns:

None

supports(modality: Literal['text', 'image', 'audio', 'video', 'message'] | tuple[Literal['text', 'image', 'audio', 'video'], ...]) → bool[source]

Check if the model supports the given modality.

A modality is supported if:

  1. It is directly listed in modalities (including tuple modalities that are explicitly listed), or
  2. It is a tuple of modalities (e.g. ("image", "text")) where each part is individually supported and the model also supports "message" format, which is used to combine multiple modalities into a single input.
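The two rules above can be sketched as a small predicate over a plain list of modalities (an illustration, not the library implementation):

```python
def supports(modalities: list, modality) -> bool:
    """Rule 1: the modality is directly listed (including explicit tuples).
    Rule 2: a tuple is supported when every part is individually supported and
    the model also handles the "message" format used to combine modalities."""
    if modality in modalities:
        return True
    if isinstance(modality, tuple):
        return "message" in modalities and all(supports(modalities, m) for m in modality)
    return False

supports(["text", "image", "message"], ("image", "text"))
# returns True
```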

Parameters:

modality – A single modality string (e.g. "text", "image") or a tuple of modality strings (e.g. ("image", "text")).

Returns:

Whether the model supports the given modality.

Return type:

bool

Example:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
model.supports("text")
# True
model.supports("image")
# False
```

to(*args, **kwargs)

Moves and/or casts the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)

to(dtype, non_blocking=False)

to(tensor, non_blocking=False)

to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:

Returns:

self

Return type:

Module

Examples:

```python
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)

>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
```

tokenize(texts: list[str] | list[dict] | list[tuple[str, str]], **kwargs) → dict[str, Tensor][source]

Deprecated: tokenize is deprecated. Use preprocess instead.

property tokenizer: Any

Property to get the tokenizer used by this model.

train(mode: bool = True) → T

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

property transformers_model: PreTrainedModel | None

Property to get the underlying transformers PreTrainedModel instance, if it exists. Note that it’s possible for a model to have multiple underlying transformers models, but this property will return the first one it finds in the module hierarchy.

Note

This property can also return e.g. ORTModelForFeatureExtraction or OVModelForFeatureExtraction instances from the optimum-intel and optimum-onnx libraries, if the model is loaded using backend="onnx" or backend="openvino".

Returns:

The underlying transformers model or None if not found.

Return type:

PreTrainedModel or None

truncate_embeddings(truncate_dim: int | None) → Iterator[None][source]

A context manager; within it, SentenceTransformer.encode() outputs embeddings truncated at dimension truncate_dim.

This may be useful when you are using the same model for different applications where different dimensions are needed.

Parameters:

truncate_dim (int, optional) – The dimension to truncate embeddings to. None does no truncation.

Example

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

with model.truncate_embeddings(truncate_dim=16):
    embeddings_truncated = model.encode(["hello there", "hiya"])
    assert embeddings_truncated.shape[-1] == 16
```

SentenceTransformerModelCardData

class sentence_transformers.sentence_transformer.model_card.SentenceTransformerModelCardData(language: str | list[str] | None = <factory>, license: str | None = None, model_name: str | None = None, model_id: str | None = None, train_datasets: list[dict[str, str]] = <factory>, eval_datasets: list[dict[str, str]] = <factory>, task_name: str | None = 'retrieval', tags: list[str] = <factory>, local_files_only: bool = False, generate_widget_examples: bool = True)[source]

A dataclass storing data used in the model card.

Parameters:

Tip

Install codecarbon to automatically track carbon emission usage and include it in your model cards.

Example:

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData

model = SentenceTransformer(
    "microsoft/mpnet-base",
    model_card_data=SentenceTransformerModelCardData(
        model_id="tomaarsen/sbert-mpnet-base-allnli",
        train_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
        eval_datasets=[{"name": "SNLI", "id": "stanfordnlp/snli"}, {"name": "MultiNLI", "id": "nyu-mll/multi_nli"}],
        license="apache-2.0",
        language="en",
    ),
)
```