Modules — Sentence Transformers documentation
sentence_transformers.sparse_encoder.modules defines different building blocks that can be used to create SparseEncoder networks from scratch. For more details, see Training Overview.
See also the modules from sentence_transformers.base.modules in Base > Modules.
SPLADE Pooling
class sentence_transformers.sparse_encoder.modules.SpladePooling(pooling_strategy: Literal['max', 'sum'] = 'max', activation_function: Literal['relu', 'log1p_relu'] = 'relu', embedding_dimension: int | None = None, chunk_size: int | None = None)[source]
SPLADE Pooling module for creating the sparse embeddings.
This module implements the SPLADE pooling mechanism that:
- Takes token logits from a masked language model (MLM).
- Applies a sparse transformation using an activation function followed by log1p (i.e., log(1 + activation(MLM_logits))).
- Applies a max or sum pooling strategy over the token dimension to produce sparse embeddings.
The resulting embeddings are highly sparse and capture lexical information, making them suitable for efficient information retrieval.
Parameters:
- pooling_strategy (str) – Pooling method across the token dimension. Choices:
  - sum: Sum pooling (used in the original SPLADE, see https://huggingface.co/papers/2107.05720).
  - max: Max pooling (used in SPLADEv2 and later models, see https://huggingface.co/papers/2109.10086 or https://huggingface.co/papers/2205.04733).
- activation_function (str) – Activation function applied before the log1p transformation. Choices:
  - relu: ReLU activation (standard in all SPLADE models).
  - log1p_relu: log(1 + ReLU(x)) variant used in the OpenSearch SPLADE models, see https://huggingface.co/papers/2504.14839.
- embedding_dimension (int , optional) – Dimensionality of the output embeddings (if needed).
- chunk_size (int , optional) – Chunk size along the sequence length dimension (i.e., number of tokens per chunk). If None, the entire sequence is processed at once. Smaller chunks reduce memory usage but may lower training and inference speed. Default is None.
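For illustration, the following minimal sketch assembles a SPLADE-style SparseEncoder from these modules. It assumes an MLM-based transformer module (MLMTransformer) that returns per-token vocabulary logits; that module is not documented on this page, and import paths may differ between releases (some expose these classes under sentence_transformers.sparse_encoder.models).

```python
from sentence_transformers import SparseEncoder

# In some releases these classes live under sentence_transformers.sparse_encoder.models instead.
from sentence_transformers.sparse_encoder.modules import MLMTransformer, SpladePooling

# MLM backbone that returns per-token vocabulary logits (a "fill-mask" model).
mlm = MLMTransformer("distilbert/distilbert-base-uncased")

# SPLADE pooling: log(1 + ReLU(logits)), max-pooled over the token dimension.
pooling = SpladePooling(pooling_strategy="max", activation_function="relu")

model = SparseEncoder(modules=[mlm, pooling])
embeddings = model.encode(["SPLADE produces sparse lexical embeddings"])
print(embeddings.shape)  # (1, vocab_size); most entries are zero
```

Untrained, this merely wires the modules together; the sparsity pattern only becomes useful after fine-tuning (see Training Overview).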
SparseAutoEncoder
class sentence_transformers.sparse_encoder.modules.SparseAutoEncoder(input_dim: int, hidden_dim: int = 512, k: int = 8, k_aux: int = 512, normalize: bool = False, dead_threshold: int = 30)[source]
This module implements the Sparse AutoEncoder architecture based on the paper: Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation, https://huggingface.co/papers/2503.01776
This module transforms dense embeddings into sparse representations by:
- Applying a multi-layer feed-forward network
- Applying top-k sparsification to keep only the largest values
- Supporting auxiliary losses for training stability (via k_aux parameter)
Parameters:
- input_dim – Dimension of the input embeddings.
- hidden_dim – Dimension of the hidden layers. Defaults to 512.
- k – Number of top values to keep in the final sparse representation. Defaults to 8.
- k_aux – Number of top values to keep for auxiliary loss calculation. Defaults to 512.
- normalize – Whether to apply layer normalization to the input embeddings. Defaults to False.
- dead_threshold – Threshold for dead neurons. Neurons whose number of non-zero activations falls below this threshold are considered dead. Defaults to 30.
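As an illustrative sketch (not taken from this page), a model of this kind can be built by stacking a dense Transformer, a Pooling module, and a SparseAutoEncoder head. The Transformer and Pooling classes are assumed to come from sentence_transformers.models, and the hyperparameters below are placeholders.

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.models import Pooling, Transformer

# In some releases this class lives under sentence_transformers.sparse_encoder.models instead.
from sentence_transformers.sparse_encoder.modules import SparseAutoEncoder

# Dense backbone + mean pooling produce an ordinary dense sentence embedding.
transformer = Transformer("sentence-transformers/all-MiniLM-L6-v2")
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")

# The autoencoder lifts that embedding into a wider space and keeps only the
# top-k activations, yielding a sparse representation.
sae = SparseAutoEncoder(
    input_dim=transformer.get_word_embedding_dimension(),
    hidden_dim=4 * transformer.get_word_embedding_dimension(),  # illustrative
    k=256,      # non-zero values kept in the final sparse embedding
    k_aux=512,  # extra values used only for the auxiliary loss
)

model = SparseEncoder(modules=[transformer, pooling, sae])
```

Because the autoencoder is randomly initialized, such a model needs to be trained before its sparse embeddings become meaningful.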
SparseStaticEmbedding
class sentence_transformers.sparse_encoder.modules.SparseStaticEmbedding(tokenizer: PreTrainedTokenizer, weight: torch.Tensor | None = None, frozen: bool = False)[source]
SparseStaticEmbedding module for efficient sparse representations.
This lightweight module computes sparse representations by mapping input tokens to static weights, such as IDF (Inverse Document Frequency) weights. It is designed to encode queries or documents into fixed-size embeddings based on the presence of tokens in the input.
A common scenario is to use this module for encoding queries and a heavier module like SPLADE (a Transformer with a “fill-mask” head + SpladePooling) for encoding documents.
Parameters:
- tokenizer (PreTrainedTokenizer) – PreTrainedTokenizer to tokenize input texts into input IDs.
- weight (torch.Tensor | None) – Static weights for vocabulary tokens (e.g., IDF weights), shape should be (vocab_size,). If None, initializes weights to a vector of ones. Default is None.
- frozen (bool) – Whether the weights should be frozen (not trainable). Default is False.
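The asymmetric scenario described above can be sketched as follows. The Router module and its for_query_document helper, the MLMTransformer module, and the ones-initialized weights are assumptions not covered on this page; in practice the weights would be real IDF values, and import paths may differ between releases.

```python
import torch

from sentence_transformers import SparseEncoder
from sentence_transformers.models import Router

# In some releases these classes live under sentence_transformers.sparse_encoder.models instead.
from sentence_transformers.sparse_encoder.modules import (
    MLMTransformer,
    SparseStaticEmbedding,
    SpladePooling,
)

# Document branch: full SPLADE (MLM logits + SPLADE pooling).
doc_mlm = MLMTransformer("distilbert/distilbert-base-uncased")
doc_pooling = SpladePooling(pooling_strategy="max")

# Query branch: static per-token weights (ones as a placeholder for IDF weights).
query_embedding = SparseStaticEmbedding(
    tokenizer=doc_mlm.tokenizer,
    weight=torch.ones(doc_mlm.tokenizer.vocab_size),
    frozen=True,
)

# Route queries to the lightweight branch and documents to the SPLADE branch.
router = Router.for_query_document(
    query_modules=[query_embedding],
    document_modules=[doc_mlm, doc_pooling],
)
model = SparseEncoder(modules=[router])
```

With this setup, queries are encoded essentially for free (a tokenizer lookup plus a weight lookup), while documents still pass through the full SPLADE stack.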