
SPLADE-Small

This is a SPLADE sparse retrieval model based on BERT-Small (29M parameters) that was trained on the MSMARCO dataset by distilling a cross-encoder, ms-marco-MiniLM-L6-v2.

This SPLADE model is 2x smaller than Naver's official splade-v3-distilbert while retaining 91% of its performance on the MSMARCO benchmark. It is small enough to run without a GPU on a corpus of a few thousand documents.

Performance

The SPLADE models were evaluated on 55,000 queries and 8.84 million documents from the MSMARCO dataset.

| Model | Size (# Params) | MRR@10 (MS MARCO dev) |
|-------|-----------------|-----------------------|
| BM25 | - | 18.0 |
| rasyosef/splade-tiny | 4.4M | 30.9 |
| rasyosef/splade-mini | 11.2M | 34.1 |
| rasyosef/splade-small | 28.8M | 35.4 |
| naver/splade-v3-distilbert | 67.0M | 38.7 |
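MRR@10, the metric in the table above, averages the reciprocal rank of the first relevant document within each query's top 10 results. A minimal sketch (the ranked lists and relevance judgments are made up for illustration):

```python
def mrr_at_10(ranked_lists, relevant_sets):
    """Mean Reciprocal Rank @ 10: average over queries of 1/rank of the
    first relevant document in the top 10 (contributes 0 if none appears)."""
    total = 0.0
    for ranking, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranking[:10], start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

# Two toy queries: first relevant doc at rank 1 and at rank 4.
print(mrr_at_10([["d1", "d2"], ["d9", "d8", "d7", "d3"]],
                [{"d1"}, {"d3"}]))  # (1.0 + 0.25) / 2 = 0.625
```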

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SparseEncoder

# Download from the 🤗 Hub
model = SparseEncoder("rasyosef/splade-small")
# Run inference
queries = [
    "how many tablespoons of garlic powder are in an ounce",
]
documents = [
    '1 Fluid Ounce (fl oz) = 2 tablespoons   16 Tablespoons = 1 cup   16 Fluid Ounce (fl oz) = 2 cup. two ! 9.7 grams of garlic powder will be present in a tablespoon. 1 dry ounce is between 2 and 2.38 tablespoons, 16 tablespoons is incorrect. --------------------- 16 tablespoons per dry ounce. It is approximately 1/2 ounce. Usually 1/4 to 1/2 tsp.',
    'Spices, garlic powder weigh(s) 164 gram per (metric cup) or 5.47 ounce per (US cup)',
    'How many teaspoons of garlic powder equal a clove of garlic? Weigh the garlic clove and then weigh the garlic powder to make sure it is the same weight. That is how much powder equals a clove of garlic.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 30522] [3, 30522]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[26.3104, 20.4381, 15.5539]])
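Because the embeddings are sparse (per the metrics below, roughly 25 of the 30,522 query dimensions and ~172 document dimensions are non-zero on average), the dot-product score only needs the dimensions active in both vectors, which is what makes inverted-index retrieval cheap. A toy sketch with dict-based sparse vectors (the weights are invented for illustration, not real model output):

```python
def sparse_dot(query, doc):
    """Dot product of two sparse vectors stored as {dim: weight} dicts;
    only dimensions active in both vectors contribute."""
    if len(doc) < len(query):
        query, doc = doc, query  # iterate over the smaller vector
    return sum(w * doc[d] for d, w in query.items() if d in doc)

# Toy vocabulary-indexed weights.
q = {101: 1.2, 245: 0.8}
d = {101: 0.5, 512: 2.0, 245: 1.5}
print(sparse_dot(q, d))  # 1.2*0.5 + 0.8*1.5 = 1.8
```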

Model Details

Model Description

Model Sources

Full Model Architecture

SparseEncoder(
  (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
  (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
)
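The SpladePooling settings above ('max' pooling, 'relu' activation) correspond to the standard SPLADE term-importance formula: for each vocabulary term j, the weight is the maximum over input positions i of log(1 + ReLU(logit_ij)). A NumPy sketch over made-up MLM logits:

```python
import numpy as np

def splade_pool(logits):
    """SPLADE pooling: per-vocabulary-term max over sequence positions of
    log(1 + relu(logit)). logits has shape (seq_len, vocab_size)."""
    return np.log1p(np.maximum(logits, 0.0)).max(axis=0)

# Toy MLM logits for a 3-token sequence over a 4-term vocabulary.
logits = np.array([[2.0, -1.0,  0.0,  0.5],
                   [0.0,  3.0, -2.0,  0.1],
                   [1.0,  0.0,  0.0, -0.3]])
emb = splade_pool(logits)
print(emb)  # terms with only non-positive logits get weight 0, so the vector is sparse
```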


Evaluation

Metrics

Sparse Information Retrieval

| Metric | Value |
|--------|-------|
| dot_accuracy@1 | 0.4547 |
| dot_accuracy@3 | 0.7685 |
| dot_accuracy@5 | 0.8786 |
| dot_accuracy@10 | 0.9484 |
| dot_precision@1 | 0.4547 |
| dot_precision@3 | 0.2634 |
| dot_precision@5 | 0.1828 |
| dot_precision@10 | 0.0998 |
| dot_recall@1 | 0.44 |
| dot_recall@3 | 0.7543 |
| dot_recall@5 | 0.8678 |
| dot_recall@10 | 0.9424 |
| dot_ndcg@10 | 0.7031 |
| dot_mrr@10 | 0.6288 |
| dot_map@100 | 0.6253 |
| query_active_dims | 24.9142 |
| query_sparsity_ratio | 0.9992 |
| corpus_active_dims | 171.8592 |
| corpus_sparsity_ratio | 0.9944 |
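The sparsity ratios follow directly from the active-dimension counts and the model's 30,522-term vocabulary: sparsity = 1 - active_dims / 30522. Checking the reported values:

```python
VOCAB_SIZE = 30522  # BERT WordPiece vocabulary size

def sparsity_ratio(active_dims, vocab_size=VOCAB_SIZE):
    """Average fraction of vocabulary dimensions that are zero."""
    return 1.0 - active_dims / vocab_size

print(round(sparsity_ratio(24.9142), 4))   # 0.9992 (query embeddings)
print(round(sparsity_ratio(171.8592), 4))  # 0.9944 (document embeddings)
```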

Training Details

Training Dataset

Unnamed Dataset

{  
    "loss": "SparseMarginMSELoss",  
    "document_regularizer_weight": 0.12,  
    "query_regularizer_weight": 0.2  
}  
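SparseMarginMSELoss distills the cross-encoder by matching score margins: the MSE between the student's (query, positive) minus (query, negative) score gap and the teacher's gap. A plain-Python sketch of the core idea (the scores are invented for illustration):

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """MarginMSE: mean squared error between student and teacher score
    margins over a batch of (positive, negative) document pairs."""
    n = len(student_pos)
    return sum(
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg,
                                  teacher_pos, teacher_neg)
    ) / n

# Toy batch of two triples: student margins 5 and 2, teacher margins 6 and 2.
print(margin_mse([20.0, 12.0], [15.0, 10.0], [9.0, 4.0], [3.0, 2.0]))  # 0.5
```

Note that only the margins are matched, not the absolute scores, so the sparse student can live on a different score scale than the cross-encoder teacher.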

Training Hyperparameters

Non-Default Hyperparameters

All Hyperparameters


Training Logs

| Epoch | Step | Training Loss | dot_ndcg@10 |
|-------|------|---------------|-------------|
| 1.0 | 16667 | 8.363 | 0.6961 |
| 2.0 | 33334 | 6.5021 | 0.7031 |
| 3.0 | 50001 | 5.2209 | 0.7031 |

Framework Versions

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

SpladeLoss

@misc{formal2022distillationhardnegativesampling,
      title={From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective},
      author={Thibault Formal and Carlos Lassance and Benjamin Piwowarski and Stéphane Clinchant},
      year={2022},
      eprint={2205.04733},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2205.04733},
}

SparseMarginMSELoss

@misc{hofstätter2021improving,
    title={Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation},
    author={Sebastian Hofstätter and Sophia Althammer and Michael Schröder and Mete Sertkan and Allan Hanbury},
    year={2021},
    eprint={2010.02666},
    archivePrefix={arXiv},
    primaryClass={cs.IR}
}

FlopsLoss

@article{paria2020minimizing,
    title={Minimizing {FLOPs} to learn efficient sparse representations},
    author={Paria, Biswajit and Yeh, Chih-Kuan and Yen, Ian EH and Xu, Ning and Ravikumar, Pradeep and P{\'o}czos, Barnab{\'a}s},
    journal={arXiv preprint arXiv:2004.05665},
    year={2020}
}

Dataset used to train rasyosef/splade-small

microsoft/ms_marco