Usage — Sentence Transformers documentation
Characteristics of Sparse Encoder models:
- Calculates sparse vector representations where most dimensions are zero
- Provides efficiency benefits for large-scale retrieval systems due to the sparse nature of embeddings
- Often more interpretable than dense embeddings, with non-zero dimensions corresponding to specific tokens
- Complementary to dense embeddings, enabling hybrid search systems that combine the strengths of both approaches
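To make the last point concrete, here is a minimal sketch of hybrid score fusion: normalizing sparse and dense retrieval scores to a common scale and combining them with a weighted sum. The scores and the `alpha` weight are illustrative assumptions, not output from any real model, and real systems often use other fusion schemes (e.g., reciprocal rank fusion).

```python
import numpy as np

# Toy retrieval scores for the same 4 documents (illustrative values only).
sparse_scores = np.array([12.3, 0.0, 4.1, 8.7])    # e.g., SPLADE-style dot products
dense_scores = np.array([0.82, 0.11, 0.45, 0.77])  # e.g., cosine similarities

def min_max(x):
    # Rescale scores to [0, 1] so the two scales are comparable.
    return (x - x.min()) / (x.max() - x.min())

# Weighted sum; alpha tunes the sparse/dense balance.
alpha = 0.5
hybrid = alpha * min_max(sparse_scores) + (1 - alpha) * min_max(dense_scores)
ranking = np.argsort(-hybrid)  # best document first
print(ranking.tolist())  # [0, 3, 2, 1]
```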
Once you have installed Sentence Transformers, you can easily use Sparse Encoder models:
```python
from sentence_transformers import SparseEncoder

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# The sentences to encode
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]

# 2. Calculate sparse embeddings by calling model.encode()
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 30522] - sparse representation with vocabulary-size dimensions

# 3. Calculate the embedding similarities (using dot product by default)
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[35.629,  9.154,  0.098],
#         [ 9.154, 27.478,  0.019],
#         [ 0.098,  0.019, 29.553]])

# 4. Check sparsity statistics
stats = SparseEncoder.sparsity(embeddings)
print(f"Sparsity: {stats['sparsity_ratio']:.2%}")  # Typically >99% zeros
print(f"Avg non-zero dimensions per embedding: {stats['active_dims']:.2f}")
```
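The sparsity statistics above reduce to simple counts of zero entries. As a sanity check, this self-contained NumPy sketch computes the same two quantities on a toy dense array (the values are illustrative, not model output):

```python
import numpy as np

# Three toy "embeddings" with mostly-zero entries (illustrative only).
embeddings = np.zeros((3, 10))
embeddings[0, [1, 4]] = [0.5, 1.2]
embeddings[1, [4]] = [0.9]
embeddings[2, [0, 4, 7]] = [0.3, 0.6, 0.1]

sparsity_ratio = (embeddings == 0).mean()           # fraction of zero entries
active_dims = (embeddings != 0).sum(axis=1).mean()  # avg non-zeros per row
print(f"Sparsity: {sparsity_ratio:.2%}")  # 80.00%
print(f"Avg non-zero dimensions per embedding: {active_dims:.2f}")  # 2.00
```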
Prompts
Some Sparse Encoder models are trained with specific prompts for different use cases (e.g., queries vs. documents). You can use SparseEncoder.encode with the prompt_name parameter, or the convenience methods encode_query() and encode_document():
```python
from sentence_transformers import SparseEncoder

model = SparseEncoder("model-with-prompts")

# Encode queries and documents with the appropriate prompts
query_embeddings = model.encode_query(["What is the weather like?"])
document_embeddings = model.encode_document(["The weather is lovely today."])

# Equivalent to:
query_embeddings = model.encode(["What is the weather like?"], prompt_name="query")
document_embeddings = model.encode(["The weather is lovely today."], prompt_name="document")
```
You can inspect or set the available prompts via the prompts and default_prompt_name attributes:
```python
print(model.prompts)
# {'query': 'query: ', 'document': 'document: '}
model.default_prompt_name = "query"
```
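Conceptually, applying a named prompt amounts to prepending the registered prompt string to each input before tokenization. The sketch below mimics that behavior with plain strings; it is illustrative only (the `apply_prompt` helper is hypothetical, and the real SparseEncoder handles this internally):

```python
# Prompt strings keyed by name, mirroring the model.prompts example above.
prompts = {"query": "query: ", "document": "document: "}

def apply_prompt(texts, prompt_name, prompts):
    # Hypothetical helper: prepend the named prompt to each input text.
    prefix = prompts.get(prompt_name, "")
    return [prefix + t for t in texts]

print(apply_prompt(["What is the weather like?"], "query", prompts))
# ['query: What is the weather like?']
```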