## 🍻 Available Models
| Name | Implementation |
|---|---|
| BM25 (Robertson and Zaragoza, 2009) | https://www.elastic.co/ |
| Anserini (Yang et al., 2017) | https://github.com/castorini/anserini |
| SBERT (Reimers and Gurevych, 2019) | https://www.sbert.net/ |
| ANCE (Xiong et al., 2020) | https://github.com/microsoft/ANCE |
| DPR (Karpukhin et al., 2020) | https://github.com/facebookresearch/DPR |
| USE-QA (Yang et al., 2020) | https://tfhub.dev/google/universal-sentence-encoder-qa/3 |
| SPARTA (Zhao et al., 2020) | https://huggingface.co/BeIR |
| ColBERT (Khattab and Zaharia, 2020) | https://github.com/stanford-futuredata/ColBERT |
## How to load different models available in BEIR?
We include different retrieval architectures and evaluate them all in a zero-shot setup.
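The snippets below assume a `corpus`, `queries`, and `qrels` loaded from a BEIR dataset. As a minimal sketch (using SciFact and the standard BEIR download URL), loading looks like this:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip the SciFact dataset (any BEIR dataset name can be substituted)
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title", "text"}; queries: query_id -> text; qrels: relevance judgments
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
```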
### Lexical Retrieval Evaluation using BM25 (Elasticsearch)
```python
from beir.retrieval.search.lexical import BM25Search as BM25

hostname = "your-hostname"      # localhost
index_name = "your-index-name"  # scifact
initialize = True               # True will delete the existing index with the same name and reindex all documents
model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
```
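To run retrieval and score the results, wrap the model in `EvaluateRetrieval`. A minimal sketch, assuming the `corpus`, `queries`, and `qrels` loaded above:

```python
from beir.retrieval.evaluation import EvaluateRetrieval

retriever = EvaluateRetrieval(model)
results = retriever.retrieve(corpus, queries)  # query_id -> {doc_id: score}

# Compute nDCG@k, MAP@k, Recall@k and Precision@k at the default cutoffs
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```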
### Sparse Retrieval using SPARTA
```python
from beir.retrieval import models
from beir.retrieval.search.sparse import SparseSearch

model_path = "BeIR/sparta-msmarco-distilbert-base-v1"
sparse_model = SparseSearch(models.SPARTA(model_path), batch_size=128)
```
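The sparse model plugs into the same `EvaluateRetrieval` wrapper as the other retrievers (a short sketch, reusing `corpus` and `queries` from the dataset loader above):

```python
from beir.retrieval.evaluation import EvaluateRetrieval

retriever = EvaluateRetrieval(sparse_model)
results = retriever.retrieve(corpus, queries)
```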
### Dense Retrieval using SBERT, ANCE, USE-QA or DPR
```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim")  # or "dot" for dot-product
```
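ANCE, USE-QA, and DPR load through the same `models` module. As a hedged sketch, ANCE can also be run through `models.SentenceBERT` with its sentence-transformers checkpoint (the checkpoint name below is an assumption; verify it on the hub before use):

```python
# ANCE via its sentence-transformers checkpoint (name assumed, not confirmed by this README).
# ANCE is trained with dot-product similarity, hence score_function="dot".
ance_model = DRES(models.SentenceBERT("msmarco-roberta-base-ance-firstp"), batch_size=16)
retriever = EvaluateRetrieval(ance_model, score_function="dot")

results = retriever.retrieve(corpus, queries)
```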
### Reranking using a Cross-Encoder model
```python
from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

cross_encoder_model = CrossEncoder("cross-encoder/ms-marco-electra-base")
reranker = Rerank(cross_encoder_model, batch_size=128)
```
```python
# Rerank the top-100 results retrieved by BM25
# (bm25_results: the query_id -> {doc_id: score} dict returned by the BM25 retriever above)
rerank_results = reranker.rerank(corpus, queries, bm25_results, top_k=100)
```
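The reranked run can be scored with the same evaluation helper used for first-stage retrieval (a sketch, assuming `qrels` from the dataset loader above):

```python
from beir.retrieval.evaluation import EvaluateRetrieval

# Score the reranked run at a few standard cutoffs
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, rerank_results, [1, 10, 100])
```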
## Disclaimer
- If you use any one of these implementations, please make sure to include the correct citation.
- If you implemented a model and wish to update any part of it, or do not want the model to be included, feel free to post an issue here or make a pull request!
- If you implemented a model and wish to include it in this library, feel free to post an issue here or make a pull request. Otherwise, if you want to evaluate the model on your own, see the following section.