Semantic Search - Sentence Transformers documentation

Semantic search refers to search techniques that go beyond traditional keyword-based search. Instead of relying solely on exact matches of keywords, semantic search aims to understand the meaning and context of the query and the documents being searched. This allows for more relevant and accurate search results, even when the exact keywords may not match.

Sparse embeddings are a type of representation where most of the values are zero, and only a small number of dimensions contain non-zero (a.k.a. active) values. This is in contrast to dense embeddings, where all dimensions typically have non-zero values. Traditional sparse embedding solutions are often lexically based, meaning they rely on exact matches of terms or phrases. However, modern sparse encoders like SPLADE and other sparse encoder models can generate embeddings that capture semantic meaning while still being sparse.
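To make the contrast with dense embeddings concrete, the sketch below stores a toy embedding as a mapping from dimension index to weight and checks that the sparse dot product matches the dense one. The vectors and the helper names (`to_sparse`, `sparse_dot`) are illustrative assumptions and not part of Sentence Transformers.

```python
def to_sparse(dense):
    """Keep only the non-zero (active) dimensions as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0.0}


def sparse_dot(a, b):
    """Dot product that only visits dimensions stored in the smaller vector."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller mapping
    return sum(v * b[i] for i, v in a.items() if i in b)


# Toy 10-dimensional embeddings; real sparse encoders produce vocabulary-sized vectors.
doc_dense = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.5, 0.0]
query_dense = [0.0, 1.0, 0.0, 3.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]

doc_sparse = to_sparse(doc_dense)
query_sparse = to_sparse(query_dense)

dense_score = sum(q * d for q, d in zip(query_dense, doc_dense))
sparse_score = sparse_dot(query_sparse, doc_sparse)
active_fraction = len(doc_sparse) / len(doc_dense)

print(doc_sparse, sparse_score, dense_score, active_fraction)
```

Only three of the ten dimensions are stored for the document, yet both scoring routes agree.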

These embeddings can allow for extremely efficient semantic search, as long as the search solution takes good advantage of the fact that the large majority of sparse embedding dimensions are 0. This page shows an example demonstrating how to perform semantic search manually, but also how to integrate a SparseEncoder model with popular vector databases/search systems.
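One common way to exploit the zeros is an inverted index, which maps each active dimension to the documents that use it. The following is a minimal sketch under that assumption, with made-up toy vectors; `build_inverted_index` and `search` are hypothetical helpers and do not describe how any particular search system is implemented.

```python
from collections import defaultdict


def build_inverted_index(corpus_vectors):
    """Map each active dimension to the (doc_id, weight) pairs that use it."""
    index = defaultdict(list)
    for doc_id, vector in enumerate(corpus_vectors):
        for dim, weight in vector.items():
            index[dim].append((doc_id, weight))
    return index


def search(index, query_vector, top_k=2):
    """Score documents by dot product, touching only the query's active dimensions."""
    scores = defaultdict(float)
    for dim, query_weight in query_vector.items():
        for doc_id, doc_weight in index.get(dim, []):
            scores[doc_id] += query_weight * doc_weight
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy sparse embeddings as {dimension: weight}; dimensions stand in for vocabulary ids.
corpus_vectors = [
    {3: 2.0, 7: 1.0},  # doc 0
    {7: 0.5, 9: 3.0},  # doc 1
    {1: 1.0, 3: 0.5},  # doc 2
]
index = build_inverted_index(corpus_vectors)
hits = search(index, {3: 1.0, 9: 2.0})
print(hits)
```

Documents that share no active dimension with the query are never visited, which is where the efficiency comes from.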

If you aren't familiar with semantic search, see the Sentence Transformers > Semantic Search page for a broader explanation using dense embedding models.

Manual Search

Manually performing semantic search with sparse encoders is straightforward, and only consists of a few steps:

Load a SparseEncoder model: Load a pretrained sparse encoder model from the Hugging Face Hub or your local directory.
Encode the corpus: Use the model to encode a set of documents (the corpus) into sparse embeddings.
Encode the queries: Encode the user queries into sparse embeddings using the same model.
Compute similarity: Calculate the similarity between the query embeddings and the corpus embeddings using a suitable similarity function (e.g., cosine similarity, dot product).
Retrieve results: Sort the results based on similarity scores and return the most relevant documents.
Analyze results: Optionally, analyze the results to understand which tokens contributed most to the similarity scores.

from sentence_transformers import SparseEncoder, util

# 1. Load a pretrained SparseEncoder model
model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 2. Encode a corpus of texts using the SparseEncoder model
corpus = [
    "Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed.",
    "Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning.",
    "Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains.",
    "Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments.",
    "The James Webb Space Telescope is the largest optical telescope in space, designed to conduct infrared astronomy.",
    "SpaceX's Starship is designed to be a fully reusable transportation system capable of carrying humans to Mars and beyond.",
    "Global warming is the long-term heating of Earth's climate system observed since the pre-industrial period due to human activities.",
    "Renewable energy sources include solar, wind, hydro, and geothermal power that naturally replenish over time.",
    "Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground.",
]

# Use "convert_to_tensor=True" to keep the tensors on GPU (if available)
corpus_embeddings = model.encode_document(corpus, convert_to_tensor=True)

# 3. Encode the user queries using the same SparseEncoder model
queries = [
    "How do artificial neural networks work?",
    "What technology is used for modern space exploration?",
    "How can we address climate change challenges?",
]
query_embeddings = model.encode_query(queries, convert_to_tensor=True)

# 4. Use the similarity function to compute the similarity scores between the query and corpus embeddings
top_k = min(5, len(corpus))  # Find at most 5 sentences of the corpus for each query sentence
results = util.semantic_search(query_embeddings, corpus_embeddings, top_k=top_k, score_function=model.similarity)

# 5. Sort the results and print the top 5 most similar sentences for each query
for query_id, query in enumerate(queries):
    pointwise_scores = model.intersection(query_embeddings[query_id], corpus_embeddings)

    print(f"Query: {query}")
    for res in results[query_id]:
        corpus_id, score = res.values()
        sentence = corpus[corpus_id]

        pointwise_score = model.decode(pointwise_scores[corpus_id], top_k=10)

        token_scores = ", ".join([f'("{token.strip()}", {value:.2f})' for token, value in pointwise_score])

        print(f"Score: {score:.4f} - Sentence: {sentence} - Top influential tokens: {token_scores}")
    print("")

"""
Query: How do artificial neural networks work?
Score: 16.9053 - Sentence: Neural networks are computing systems vaguely inspired by the biological neural networks that constitute animal brains. - Top influential tokens: ("neural", 5.71), ("networks", 3.24), ("network", 2.93), ("brain", 2.10), ("computer", 0.50), ("##uron", 0.32), ("artificial", 0.27), ("technology", 0.27), ("communication", 0.27), ("connection", 0.21)
Score: 13.6119 - Sentence: Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. - Top influential tokens: ("artificial", 3.71), ("neural", 3.15), ("networks", 1.78), ("brain", 1.22), ("network", 1.12), ("ai", 1.07), ("machine", 0.39), ("robot", 0.20), ("technology", 0.20), ("algorithm", 0.18)
Score: 2.7373 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("machine", 0.78), ("computer", 0.50), ("technology", 0.32), ("artificial", 0.22), ("robot", 0.21), ("ai", 0.20), ("process", 0.16), ("theory", 0.11), ("technique", 0.11), ("fuzzy", 0.06)
Score: 2.1430 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("technology", 0.42), ("function", 0.41), ("mechanism", 0.21), ("sensor", 0.21), ("device", 0.18), ("process", 0.18), ("generator", 0.13), ("detection", 0.10), ("technique", 0.10), ("tracking", 0.05)
Score: 2.0195 - Sentence: Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments. - Top influential tokens: ("robot", 0.67), ("function", 0.34), ("technology", 0.29), ("device", 0.23), ("experiment", 0.20), ("machine", 0.10), ("artificial", 0.08), ("design", 0.04), ("useful", 0.03), ("they", 0.02)

Query: What technology is used for modern space exploration?
Score: 10.4748 - Sentence: SpaceX's Starship is designed to be a fully reusable transportation system capable of carrying humans to Mars and beyond. - Top influential tokens: ("space", 4.40), ("technology", 1.15), ("nasa", 1.06), ("mars", 0.63), ("exploration", 0.52), ("spacecraft", 0.44), ("robot", 0.32), ("rocket", 0.28), ("astronomy", 0.27), ("travel", 0.26)
Score: 9.3818 - Sentence: The James Webb Space Telescope is the largest optical telescope in space, designed to conduct infrared astronomy. - Top influential tokens: ("space", 3.89), ("nasa", 1.09), ("astronomy", 0.93), ("discovery", 0.48), ("instrument", 0.47), ("technology", 0.35), ("device", 0.26), ("spacecraft", 0.25), ("invented", 0.22), ("equipment", 0.22)
Score: 8.5147 - Sentence: Mars rovers are robotic vehicles designed to travel on the surface of Mars to collect data and perform experiments. - Top influential tokens: ("technology", 1.39), ("mars", 0.79), ("exploration", 0.78), ("robot", 0.67), ("used", 0.66), ("nasa", 0.52), ("spacecraft", 0.44), ("device", 0.39), ("explore", 0.38), ("travel", 0.25)
Score: 7.6993 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("technology", 1.99), ("tech", 1.76), ("technologies", 1.74), ("equipment", 0.32), ("device", 0.31), ("technological", 0.28), ("mining", 0.22), ("sensor", 0.19), ("tool", 0.18), ("software", 0.11)
Score: 2.5526 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("technology", 1.52), ("machine", 0.27), ("robot", 0.21), ("computer", 0.18), ("engineering", 0.12), ("technique", 0.11), ("science", 0.05), ("technological", 0.05), ("techniques", 0.02), ("innovation", 0.01)

Query: How can we address climate change challenges?
Score: 9.5587 - Sentence: Global warming is the long-term heating of Earth's climate system observed since the pre-industrial period due to human activities. - Top influential tokens: ("climate", 3.21), ("warming", 2.87), ("weather", 1.58), ("change", 0.46), ("global", 0.41), ("environmental", 0.39), ("storm", 0.19), ("pollution", 0.15), ("environment", 0.11), ("adaptation", 0.08)
Score: 1.3191 - Sentence: Carbon capture technologies aim to collect CO2 emissions before they enter the atmosphere and store them underground. - Top influential tokens: ("warming", 0.39), ("pollution", 0.34), ("environmental", 0.15), ("goal", 0.12), ("strategy", 0.07), ("monitoring", 0.07), ("protection", 0.06), ("greenhouse", 0.05), ("safety", 0.02), ("escape", 0.01)
Score: 1.0774 - Sentence: Renewable energy sources include solar, wind, hydro, and geothermal power that naturally replenish over time. - Top influential tokens: ("conservation", 0.39), ("sustainability", 0.18), ("environmental", 0.18), ("sustainable", 0.13), ("agriculture", 0.13), ("alternative", 0.07), ("recycling", 0.00)
Score: 0.2401 - Sentence: Machine learning is a field of study that gives computers the ability to learn without being explicitly programmed. - Top influential tokens: ("strategy", 0.10), ("success", 0.06), ("foster", 0.04), ("engineering", 0.03), ("innovation", 0.00), ("research", 0.00)
Score: 0.1516 - Sentence: Deep learning is part of a broader family of machine learning methods based on artificial neural networks with representation learning. - Top influential tokens: ("strategy", 0.09), ("foster", 0.04), ("research", 0.01), ("approach", 0.01), ("engineering", 0.01)
"""
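The "top influential tokens" in the output can be read as per-token contributions to the score. Below is a minimal sketch of that idea, assuming the similarity is a dot product and that the intersection is the element-wise product of the query and document weights. The token weights are invented for illustration and are not model output, and `token_contributions` is a hypothetical helper.

```python
def token_contributions(query_weights, doc_weights):
    """Element-wise product over the tokens that are active in both embeddings."""
    shared = query_weights.keys() & doc_weights.keys()
    return {token: query_weights[token] * doc_weights[token] for token in shared}


# Hypothetical token weights, keyed by token instead of vocabulary id.
query_weights = {"neural": 2.0, "network": 1.5, "work": 0.5}
doc_weights = {"neural": 2.5, "network": 2.0, "brain": 1.0}

contributions = token_contributions(query_weights, doc_weights)
score = sum(contributions.values())
top_tokens = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

print(f"Score: {score:.4f} - Top influential tokens: {top_tokens}")
```

Tokens that are active on only one side ("work", "brain") contribute nothing, so the score is fully explained by the shared tokens.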

Vector Database Search

Alternatively, some vector databases and search engines can be used to perform semantic search with sparse encoders. These systems are designed to efficiently handle large-scale vector data and provide fast retrieval of relevant documents. They can leverage the sparsity of the embeddings to optimize storage and search operations.

The overall structure is similar to the manual search, but the vector database handles the indexing and retrieval of documents. The steps are approximately as follows:

Encode the corpus: Load your data and encode the documents using a pretrained sparse encoder.
Indexing: The documents and their sparse embeddings are indexed in the vector database.
Encode the query: User queries are encoded with the same sparse encoder.
Retrieval: The vector database performs a similarity search to find the most relevant documents.
Results: Search results are returned with their similarity scores and document content.
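The integration examples on this page share one control flow: the index is built on the first search call and passed back in on later calls, so the corpus is indexed only once. The sketch below mimics that flow in memory. `toy_semantic_search` is a hypothetical stand-in written for this illustration; it is not the signature of any Sentence Transformers helper.

```python
def toy_semantic_search(query_vectors, corpus_index=None, corpus_vectors=None, top_k=5):
    """Build the index on the first call and reuse it afterwards (hypothetical helper)."""
    if corpus_index is None:
        if corpus_vectors is None:
            raise ValueError("Provide corpus_vectors when no index exists yet.")
        # "Indexing": here simply a list of (corpus_id, vector) pairs.
        corpus_index = list(enumerate(corpus_vectors))

    results = []
    for query in query_vectors:
        scored = [
            {
                "corpus_id": corpus_id,
                "score": sum(weight * doc.get(dim, 0.0) for dim, weight in query.items()),
            }
            for corpus_id, doc in corpus_index
        ]
        scored.sort(key=lambda entry: entry["score"], reverse=True)
        results.append(scored[:top_k])
    return results, corpus_index


# Toy sparse embeddings as {dimension: weight}.
corpus_vectors = [{0: 1.0, 2: 2.0}, {1: 3.0}, {2: 0.5}]

# First call: no index yet, so the corpus vectors are passed in.
first_results, corpus_index = toy_semantic_search([{2: 1.0}], corpus_vectors=corpus_vectors, top_k=2)
# Later calls: only the index is needed.
second_results, corpus_index = toy_semantic_search([{1: 1.0}], corpus_index=corpus_index, top_k=1)

print(first_results)
print(second_results)
```

A real vector database replaces the list with a persistent, optimized index, but the encode, index, query, retrieve sequence stays the same.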

The advantages of Sparse Vectors for search are:

Efficiency: Sparse vectors (where most values are zero) can be stored and searched more efficiently than dense vectors.
Interpretability: Non-zero dimensions in sparse embeddings often correspond to specific tokens, allowing you to understand which tokens contributed to the similarity score.
Exact Matching: Sparse vectors can preserve exact term matching signals that might be lost in dense embeddings.
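As a sketch of the interpretability point, assume each embedding dimension corresponds to one vocabulary token. The tiny vocabulary and weights below are invented for illustration, and `decode_embedding` is a hypothetical helper rather than a library function.

```python
def decode_embedding(embedding, vocabulary, top_k=3):
    """Return the top_k (token, weight) pairs among the non-zero dimensions."""
    active = [(vocabulary[i], weight) for i, weight in enumerate(embedding) if weight > 0.0]
    active.sort(key=lambda pair: pair[1], reverse=True)
    return active[:top_k]


# Toy vocabulary: dimension i of the embedding belongs to vocabulary[i].
vocabulary = ["climate", "energy", "mars", "neural", "solar", "space"]
embedding = [0.0, 1.2, 0.0, 0.0, 2.4, 0.1]

decoded = decode_embedding(embedding, vocabulary, top_k=2)
print(decoded)
```

Reading the active tokens this way is what makes it possible to inspect why a document matched a query.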

Qdrant Integration

This example demonstrates how to set up Qdrant for sparse vector search: it encodes and indexes documents with a sparse encoder, formulates search queries as sparse vectors, and provides an interactive query interface. See semantic_search_qdrant.py or below:

Prerequisites:

Qdrant running locally (or accessible), see the Qdrant Quickstart for more details.
The Qdrant Python client must be installed:
pip install qdrant-client

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_qdrant

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)

# Initially, we don't have a qdrant index yet
corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using qdrant
    results, search_time, corpus_index = semantic_search_qdrant(
        query_embeddings,
        corpus_index=corpus_index,
        corpus_embeddings=corpus_embeddings if corpus_index is None else None,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

OpenSearch Integration

This example demonstrates how to set up OpenSearch for sparse vector search: it encodes and indexes documents with a sparse encoder, formulates search queries as sparse vectors, and provides an interactive query interface. See semantic_search_opensearch.py or below:

Prerequisites:

OpenSearch running locally (or accessible), see OpenSearch locally for more details.
Further, the OpenSearch Python client must be installed: https://docs.opensearch.org/docs/latest/clients/python-low-level/, e.g.:
pip install opensearch-py
This script was created for opensearch v2.15.0+.

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.modules import Router, SparseStaticEmbedding, SpladePooling, Transformer
from sentence_transformers.sparse_encoder.search_engines import semantic_search_opensearch

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]
print(f"Finish loading data. Corpus size: {len(corpus)}")

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
model_id = "opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill"
doc_encoder = Transformer(model_id, transformer_task="fill-mask")
router = Router.for_query_document(
    query_modules=[
        SparseStaticEmbedding.from_json(
            model_id,
            tokenizer=doc_encoder.tokenizer,
            frozen=True,
        ),
    ],
    document_modules=[
        doc_encoder,
        SpladePooling("max", activation_function="log1p_relu"),
    ],
)

sparse_model = SparseEncoder(modules=[router], similarity_fn_name="dot")

print("Start encoding corpus...")
start_time = time.time()

# 4. Encode the corpus
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=32, show_progress_bar=True
)
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using inference-free mode
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Query encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using OpenSearch
    results, search_time, corpus_index = semantic_search_opensearch(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

Elasticsearch Integration

This example demonstrates how to set up Elasticsearch for sparse vector search: it encodes and indexes documents with a sparse encoder, formulates search queries as sparse vectors, and provides an interactive query interface. See semantic_search_elasticsearch.py or below:

Prerequisites:

Elasticsearch running locally (or accessible), see Elasticsearch locally for more details.
The Elasticsearch Python client must be installed:
pip install elasticsearch

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_elasticsearch

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus
print("Start encoding corpus...")
start_time = time.time()
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using Elasticsearch
    results, search_time, corpus_index = semantic_search_elasticsearch(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

Seismic Integration

This example demonstrates how to use Seismic for extremely performant sparse vector search. It does not require running a separate client, but instead performs search directly in memory. The Seismic library was introduced in Bruch et al. (2024), where it’s shown to outperform the common inverted file (IVF) approach by an order of magnitude. For more information on building your Seismic Index you can look at the Seismic Guidelines. See semantic_search_seismic.py or below:

Prerequisites:

The Seismic Python package must be installed:
pip install pyseismic-lsr

import time

from datasets import load_dataset

from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder.search_engines import semantic_search_seismic

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("naver/splade-cocondenser-ensembledistil")

# 4. Encode the corpus (same encode-then-decode step as in the Elasticsearch example;
# it defines the corpus_embeddings_decoded variable used in step 6)
print("Start encoding corpus...")
start_time = time.time()
corpus_embeddings = sparse_model.encode_document(
    corpus, convert_to_sparse_tensor=True, batch_size=16, show_progress_bar=True
)
corpus_embeddings_decoded = sparse_model.decode(corpus_embeddings)
print(f"Corpus encoding time: {time.time() - start_time:.6f} seconds")

corpus_index = None
while True:
    # 5. Encode the queries using the full precision
    start_time = time.time()
    query_embeddings = sparse_model.encode_query(queries, convert_to_sparse_tensor=True)
    query_embeddings_decoded = sparse_model.decode(query_embeddings)
    print(f"Encoding time: {time.time() - start_time:.6f} seconds")

    # 6. Perform semantic search using Seismic
    results, search_time, corpus_index = semantic_search_seismic(
        query_embeddings_decoded,
        corpus_embeddings_decoded=corpus_embeddings_decoded if corpus_index is None else None,
        corpus_index=corpus_index,
        top_k=5,
        output_index=True,
    )

    # 7. Output the results
    print(f"Search time: {search_time:.6f} seconds")
    for query, result in zip(queries, results):
        print(f"Query: {query}")
        for entry in result:
            print(f"(Score: {entry['score']:.4f}) {corpus[entry['corpus_id']]}, corpus_id: {entry['corpus_id']}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]

SPLADE-index Integration

This example demonstrates how to use splade-index for very fast sparse vector search powered by SciPy sparse matrices, built on top of the excellent bm25s, a fast BM25 implementation. It does not require running a separate client, but instead performs search directly in memory. See semantic_search_splade_index.py or below:
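For intuition about why sparse matrices make this fast, the sketch below packs toy document vectors into the compressed sparse row (CSR) layout, which stores three arrays (`data`, `indices`, `indptr`), and scores every document against a query. It is a pure-Python illustration of the layout only; `to_csr` and `csr_matvec` are hypothetical helpers and do not reflect the internals of splade-index.

```python
def to_csr(rows, num_cols):
    """Pack a list of {column: value} rows into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in rows:
        for col in sorted(row):
            if not 0 <= col < num_cols:
                raise ValueError(f"column {col} out of range")
            indices.append(col)
            data.append(row[col])
        indptr.append(len(data))  # marks where the next row starts
    return data, indices, indptr


def csr_matvec(data, indices, indptr, dense_query):
    """Score every row against a dense query vector, visiting only stored values."""
    scores = []
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        scores.append(sum(data[k] * dense_query[indices[k]] for k in range(start, end)))
    return scores


# Three toy documents over a 4-token vocabulary; the second one is empty.
rows = [{0: 1.0, 3: 2.0}, {}, {2: 4.0}]
data, indices, indptr = to_csr(rows, num_cols=4)
scores = csr_matvec(data, indices, indptr, dense_query=[1.0, 0.0, 0.5, 2.0])

print(data, indices, indptr, scores)
```

Only the three stored values are multiplied, regardless of how wide the vocabulary is.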

Prerequisites:

The SPLADE-index Python package must be installed:
pip install splade-index

import time

from datasets import load_dataset
from splade_index import SPLADE

from sentence_transformers import SparseEncoder

# 1. Load the natural-questions dataset with 100K answers
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
num_docs = 10_000
corpus = dataset["answer"][:num_docs]

# 2. Come up with some queries
queries = dataset["query"][:2]

# 3. Load the model
sparse_model = SparseEncoder("rasyosef/splade-tiny")

# 4. Encode the corpus & create the index
print("Start encoding corpus and creating index...")
start_time = time.time()
corpus_index = SPLADE()
corpus_index.index(model=sparse_model, documents=corpus, batch_size=16, show_progress=True)
print(f"Encoded corpus and created index in {time.time() - start_time:.6f} seconds")

while True:
    # 5. & 6. Encode the queries and retrieve the top 5 documents from the index in one call
    start_time = time.time()
    all_doc_ids, all_documents, all_scores = corpus_index.retrieve(queries, k=5)
    print(f"Encoding & Search time: {time.time() - start_time:.6f} seconds")

    # 7. Output the results
    for query, doc_ids, documents, scores in zip(queries, all_doc_ids, all_documents, all_scores):
        print(f"Query: {query}")
        for doc_id, document, score in zip(doc_ids, documents, scores):
            print(f"(Score: {score:.4f}) {document}, corpus_id: {doc_id}")
        print("")

    # 8. Prompt for more queries
    queries = [input("Please enter a question: ")]