## 🍻 Available Models
| Name | Implementation |
|---|---|
| BM25 (Robertson and Zaragoza, 2009) | https://www.elastic.co/ |
| Anserini (Yang et al., 2017) | https://github.com/castorini/anserini |
| SBERT (Reimers and Gurevych, 2019) | https://www.sbert.net/ |
| ANCE (Xiong et al., 2020) | https://github.com/microsoft/ANCE |
| DPR (Karpukhin et al., 2020) | https://github.com/facebookresearch/DPR |
| USE-QA (Yang et al., 2020) | https://tfhub.dev/google/universal-sentence-encoder-qa/3 |
| SPARTA (Zhao et al., 2020) | https://huggingface.co/BeIR |
| ColBERT (Khattab and Zaharia, 2020) | https://github.com/stanford-futuredata/ColBERT |
## How to load different models available in BEIR?
We include different retrieval architectures and evaluate them all in a zero-shot setup.
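The snippets below assume a `corpus`, `queries`, and `qrels` loaded from a BEIR dataset. As a minimal sketch (using SciFact and the standard BEIR download URL), loading looks like this:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unzip the SciFact dataset (any BEIR dataset name can be substituted)
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")

# corpus: doc_id -> {"title", "text"}; queries: query_id -> text; qrels: relevance judgments
corpus, queries, qrels = GenericDataLoader(data_path).load(split="test")
```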
### Lexical Retrieval Evaluation using BM25 (Elasticsearch)
```python
from beir.retrieval.search.lexical import BM25Search as BM25

hostname = "your-hostname"      # localhost
index_name = "your-index-name"  # scifact
initialize = True               # True will delete the existing index with the same name and reindex all documents
model = BM25(index_name=index_name, hostname=hostname, initialize=initialize)
```
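To run retrieval and score the results, wrap the model in `EvaluateRetrieval`. A minimal sketch, assuming the `corpus`, `queries`, and `qrels` loaded above:

```python
from beir.retrieval.evaluation import EvaluateRetrieval

retriever = EvaluateRetrieval(model)
results = retriever.retrieve(corpus, queries)  # query_id -> {doc_id: score}

# Compute nDCG@k, MAP@k, Recall@k and Precision@k at the default cutoffs
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```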
### Sparse Retrieval using SPARTA
```python
from beir.retrieval import models
from beir.retrieval.search.sparse import SparseSearch

model_path = "BeIR/sparta-msmarco-distilbert-base-v1"
sparse_model = SparseSearch(models.SPARTA(model_path), batch_size=128)
```
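The sparse model plugs into the same `EvaluateRetrieval` wrapper as the other retrievers (a short sketch, reusing `corpus` and `queries` from the dataset loader above):

```python
from beir.retrieval.evaluation import EvaluateRetrieval

retriever = EvaluateRetrieval(sparse_model)
results = retriever.retrieve(corpus, queries)
```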
### Dense Retrieval using SBERT, ANCE, USE-QA or DPR
```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim")  # or "dot" for dot-product
```
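ANCE, USE-QA, and DPR load through the same `models` module. As a hedged sketch, ANCE can also be run through `models.SentenceBERT` with its sentence-transformers checkpoint (the checkpoint name below is an assumption; verify it on the hub before use):

```python
# ANCE via its sentence-transformers checkpoint (name assumed, not confirmed by this README).
# ANCE is trained with dot-product similarity, hence score_function="dot".
ance_model = DRES(models.SentenceBERT("msmarco-roberta-base-ance-firstp"), batch_size=16)
retriever = EvaluateRetrieval(ance_model, score_function="dot")

results = retriever.retrieve(corpus, queries)
```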
### Reranking using a Cross-Encoder model
```python
from beir.reranking.models import CrossEncoder
from beir.reranking import Rerank

cross_encoder_model = CrossEncoder("cross-encoder/ms-marco-electra-base")
reranker = Rerank(cross_encoder_model, batch_size=128)
```
```python
# Rerank the top-100 results retrieved by BM25
# (bm25_results: the query_id -> {doc_id: score} dict returned by the BM25 retriever above)
rerank_results = reranker.rerank(corpus, queries, bm25_results, top_k=100)
```
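The reranked run can be scored with the same evaluation helper used for first-stage retrieval (a sketch, assuming `qrels` from the dataset loader above):

```python
from beir.retrieval.evaluation import EvaluateRetrieval

# Score the reranked run at a few standard cutoffs
ndcg, _map, recall, precision = EvaluateRetrieval.evaluate(qrels, rerank_results, [1, 10, 100])
```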
## Disclaimer
- If you use any one of these implementations, please make sure to include the correct citation.
- If you implemented a model and wish to update any part of it, or do not want the model to be included, feel free to post an issue here or make a pull request!
- If you implemented a model and wish to include it in this library, feel free to post an issue here or make a pull request. Otherwise, if you want to evaluate the model on your own, see the following section.