# Pyserini Reproductions

This page provides two-click reproductions* for a number of experimental runs on the MIRACL dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:

Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, and Jimmy Lin. MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics, 11:1114–1131, 2023.

All scores below are nDCG@10 on the MIRACL dev queries.

**BM25**

| ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo | avg |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.481 | 0.508 | 0.351 | 0.319 | 0.333 | 0.551 | 0.183 | 0.458 | 0.449 | 0.369 | 0.419 | 0.334 | 0.383 | 0.494 | 0.484 | 0.180 | 0.226 | 0.406 | 0.385 |
The commands to generate and evaluate the BM25 runs are identical across languages except for the language code, so they are written here as a loop. Yoruba (yo) is the one exception: it uses `--pretokenized` instead of `--language`.

```bash
for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh de; do
  # Generate run:
  python -m pyserini.search.lucene \
    --threads 16 --batch-size 128 \
    --language ${lang} \
    --topics miracl-v1.0-${lang}-dev \
    --index miracl-v1.0-${lang} \
    --output run.miracl.bm25.${lang}.dev.txt \
    --bm25 --hits 1000

  # Evaluate:
  python -m pyserini.eval.trec_eval \
    -c -M 100 -m ndcg_cut.10 miracl-v1.0-${lang}-dev \
    run.miracl.bm25.${lang}.dev.txt
done

# Yoruba (yo): pass --pretokenized instead of --language.
python -m pyserini.search.lucene \
  --threads 16 --batch-size 128 --pretokenized \
  --topics miracl-v1.0-yo-dev \
  --index miracl-v1.0-yo \
  --output run.miracl.bm25.yo.dev.txt \
  --bm25 --hits 1000

python -m pyserini.eval.trec_eval \
  -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \
  run.miracl.bm25.yo.dev.txt
```
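In the evaluation command, `-c` averages over all judged topics, `-M 100` truncates each ranking to its first 100 hits before scoring, and `ndcg_cut.10` reports nDCG at depth 10. As a rough sketch of what that metric computes, on toy relevance grades (not MIRACL data; `trec_eval`'s exact variant may differ in minor details):

```python
import math

def dcg(gains):
    # Log-discounted cumulative gain: rank 1 is divided by log2(2) = 1.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_gains, all_gains, k=10):
    # Normalize by the DCG of the best possible ordering of judged docs.
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0

# Toy query: the system retrieved docs with these grades, in rank order,
# out of a judged pool containing two relevant documents.
ranked = [1, 0, 1, 0, 0]
pool = [1, 1, 0, 0, 0]
print(round(ndcg_at_k(ranked, pool, k=10), 4))  # 0.9197
```

Swapping the second relevant document up to rank 2 would give a perfect score of 1.0, which is why nDCG rewards placing relevant documents early.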
**mDPR pFT**

| ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo | avg |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.499 | 0.443 | 0.394 | 0.478 | 0.480 | 0.472 | 0.435 | 0.383 | 0.272 | 0.439 | 0.419 | 0.407 | 0.299 | 0.356 | 0.358 | 0.512 | 0.490 | 0.444 | 0.421 |
All languages, including Yoruba, use the same `castorini/mdpr-tied-pft-msmarco` encoder, so the commands differ only in the language code:

```bash
for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo; do
  # Generate run:
  python -m pyserini.search.faiss \
    --threads 16 --batch-size 512 \
    --encoder-class auto \
    --encoder castorini/mdpr-tied-pft-msmarco \
    --topics miracl-v1.0-${lang}-dev \
    --index miracl-v1.0-${lang}-mdpr-tied-pft-msmarco \
    --output run.miracl.mdpr-tied-pft-msmarco.${lang}.dev.txt \
    --hits 1000

  # Evaluate:
  python -m pyserini.eval.trec_eval \
    -c -M 100 -m ndcg_cut.10 miracl-v1.0-${lang}-dev \
    run.miracl.mdpr-tied-pft-msmarco.${lang}.dev.txt
done
```
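The dense runs replace BM25's lexical scoring with the inner product between mDPR query and passage embeddings, searched over a FAISS index. A toy sketch of that ranking rule, with made-up 4-dimensional vectors standing in for the real 768-dimensional encoder outputs:

```python
def dot(u, v):
    # Inner-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

# Toy stand-ins for encoder outputs; real mDPR vectors are 768-d.
query = [0.1, 0.3, 0.5, 0.2]
passages = {
    "p0": [0.0, 0.4, 0.4, 0.1],
    "p1": [0.5, 0.1, 0.0, 0.0],
    "p2": [0.1, 0.3, 0.5, 0.2],  # identical to the query vector
}

# Exhaustive inner-product search: score every passage, rank descending.
ranked = sorted(passages, key=lambda p: -dot(query, passages[p]))
print(ranked)  # ['p2', 'p0', 'p1']
```

Every passage is embedded once at index time; only the query is encoded at search time, which is what `--encoder` supplies to `pyserini.search.faiss`.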
**BM25+mDPR pFT**

| ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo | avg |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.673 | 0.654 | 0.549 | 0.641 | 0.594 | 0.672 | 0.523 | 0.616 | 0.443 | 0.576 | 0.609 | 0.532 | 0.446 | 0.602 | 0.599 | 0.525 | 0.564 | 0.611 | 0.579 |
The hybrid runs fuse the BM25 and mDPR runs for each language with score interpolation:

```bash
for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo; do
  # Generate fused run:
  python -m pyserini.fusion \
    --runs run.miracl.bm25.${lang}.dev.top1000.txt \
           run.miracl.mdpr-tied-pft-msmarco.${lang}.dev.top1000.txt \
    --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.${lang}.dev.txt \
    --method interpolation --alpha 0.5 --depth 1000 --k 1000

  # Evaluate:
  python -m pyserini.eval.trec_eval \
    -c -M 100 -m ndcg_cut.10 miracl-v1.0-${lang}-dev \
    run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.${lang}.dev.txt
done
```
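With `--method interpolation --alpha 0.5`, a document's fused score is an alpha-weighted combination of its scores in the two input runs, roughly `alpha * sparse + (1 - alpha) * dense`. The sketch below shows only that interpolation idea on toy scores; it deliberately skips Pyserini's exact handling of score normalization and run depths, and the names are illustrative:

```python
def interpolate(run_a, run_b, alpha=0.5, k=3):
    """Fuse two {docid: score} runs; docs absent from one run score 0 there."""
    docs = set(run_a) | set(run_b)
    fused = {d: alpha * run_a.get(d, 0.0) + (1 - alpha) * run_b.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=lambda d: -fused[d])[:k]

bm25 = {"d1": 12.0, "d2": 9.5, "d3": 7.1}   # toy sparse scores
mdpr = {"d2": 0.92, "d4": 0.88, "d1": 0.45}  # toy dense scores
print(interpolate(bm25, mdpr))  # ['d1', 'd2', 'd3']
```

Note how the raw BM25 scores dominate in this toy example; this is why fusion in practice depends on the two runs' scores being on comparable scales.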
**mDPR pFT+FT1**

| ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo | avg |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 0.578 | 0.580 | 0.281 | 0.251 | 0.384 | 0.569 | 0.301 | 0.329 | 0.346 | 0.500 | 0.486 | 0.393 | 0.658 | 0.778 | 0.598 | 0.358 | 0.322 | 0.598 | 0.462 |
This condition uses a single encoder fine-tuned on all MIRACL languages, `castorini/mdpr-tied-pft-msmarco-ft-all`:

```bash
for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo; do
  # Generate run:
  python -m pyserini.search.faiss \
    --threads 16 --batch-size 512 \
    --encoder-class auto \
    --encoder castorini/mdpr-tied-pft-msmarco-ft-all \
    --topics miracl-v1.0-${lang}-dev \
    --index miracl-v1.0-${lang}-mdpr-tied-pft-msmarco-ft-all \
    --output run.miracl.mdpr-tied-pft-msmarco-ft-all.${lang}.dev.txt \
    --hits 1000

  # Evaluate:
  python -m pyserini.eval.trec_eval \
    -c -M 100 -m ndcg_cut.10 miracl-v1.0-${lang}-dev \
    run.miracl.mdpr-tied-pft-msmarco-ft-all.${lang}.dev.txt
done
```
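All of the run files generated above use the standard six-column TREC format, one line per retrieved document (`qid Q0 docid rank score tag`), which is what `trec_eval` consumes. A small illustrative loader (not part of Pyserini):

```python
from collections import defaultdict

def load_run(lines):
    """Parse TREC run lines into {qid: [(docid, score), ...]}, best first."""
    run = defaultdict(list)
    for line in lines:
        qid, _q0, docid, _rank, score, _tag = line.split()
        run[qid].append((docid, float(score)))
    for qid in run:
        run[qid].sort(key=lambda pair: -pair[1])  # re-sort by score
    return dict(run)

sample = [
    "0 Q0 doc7 1 14.2 bm25",
    "0 Q0 doc3 2 11.9 bm25",
    "1 Q0 doc5 1 9.8 bm25",
]
print(load_run(sample)["0"][0])  # ('doc7', 14.2)
```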
mDPR pFT+FT2 0.725 0.684 0.488 0.565 0.593 0.714 0.589 0.516 0.496 0.642 0.590 0.597 0.685 0.804 0.695 0.650 -- -- 0.627
ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo

Commands for each language follow the same pattern; both the encoder and the index are fine-tuned for that language. For example, for Arabic (ar):

Command to generate run:

python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --encoder-class auto \
  --encoder castorini/mdpr-tied-pft-msmarco-ft-miracl-ar \
  --topics miracl-v1.0-ar-dev \
  --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-miracl-ar \
  --output run.miracl.mdpr-tied-pft-msmarco-ft-miracl.ar.dev.txt \
  --hits 1000

Evaluation commands:

python -m pyserini.eval.trec_eval \
  -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \
  run.miracl.mdpr-tied-pft-msmarco-ft-miracl.ar.dev.txt

The same commands apply to bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, and zh, with the language code substituted in the --encoder, --topics, --index, --output, and evaluation arguments. No runs are available for de and yo under this condition.
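Since the encoder, index, topics, and output names all embed the language code, the per-language commands for this condition can be generated mechanically. The sketch below only prints the commands; pipe its output to sh to execute them (which requires Pyserini and the prebuilt indexes):

```shell
# Print the run-generation command for each per-language fine-tuned mDPR
# model. de and yo are omitted: no runs exist for them under this condition.
print_mdpr_ft_cmds() {
  for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh; do
    printf '%s\n' "python -m pyserini.search.faiss \
      --threads 16 --batch-size 512 \
      --encoder-class auto \
      --encoder castorini/mdpr-tied-pft-msmarco-ft-miracl-${lang} \
      --topics miracl-v1.0-${lang}-dev \
      --index miracl-v1.0-${lang}-mdpr-tied-pft-msmarco-ft-miracl-${lang} \
      --output run.miracl.mdpr-tied-pft-msmarco-ft-miracl.${lang}.dev.txt \
      --hits 1000"
  done
}

print_mdpr_ft_cmds
```

The function name `print_mdpr_ft_cmds` is arbitrary; the command templates are taken verbatim from the panels above.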
mContriever 0.525 0.501 0.364 0.418 0.215 0.602 0.314 0.286 0.392 0.424 0.483 0.391 0.560 0.528 0.517 0.410 0.408 0.415 0.431
ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo

Commands for each language use the same facebook/mcontriever-msmarco encoder; only the language code changes. For example, for Arabic (ar):

Command to generate run:

python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --encoder-class contriever \
  --encoder facebook/mcontriever-msmarco \
  --topics miracl-v1.0-ar-dev \
  --index miracl-v1.0-ar-mcontriever-pft-msmarco \
  --output run.miracl.mcontriever-tied-pft-msmarco.ar.dev.txt \
  --hits 1000

Evaluation commands:

python -m pyserini.eval.trec_eval \
  -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \
  run.miracl.mcontriever-tied-pft-msmarco.ar.dev.txt

The same commands apply to all remaining languages (bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, zh, de, yo), with the language code substituted in the --topics, --index, --output, and evaluation arguments.
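The mContriever condition uses a single shared encoder, so only the language code varies across its commands. The sketch below prints all 18 run-generation commands; pipe its output to sh to execute them (which requires Pyserini and the prebuilt indexes):

```shell
# Print the mContriever run-generation command for each MIRACL language.
print_mcontriever_cmds() {
  for lang in ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo; do
    printf '%s\n' "python -m pyserini.search.faiss \
      --threads 16 --batch-size 512 \
      --encoder-class contriever \
      --encoder facebook/mcontriever-msmarco \
      --topics miracl-v1.0-${lang}-dev \
      --index miracl-v1.0-${lang}-mcontriever-pft-msmarco \
      --output run.miracl.mcontriever-tied-pft-msmarco.${lang}.dev.txt \
      --hits 1000"
  done
}

print_mcontriever_cmds
```

The function name `print_mcontriever_cmds` is arbitrary; the command template is taken verbatim from the panels above. The `--directory runs` option of `pyserini.2cr.miracl` (described below) achieves the same end-to-end reproduction without hand-written loops.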

Programmatic Execution

All experimental runs shown in the tables above can be executed programmatically using the instructions below. To list all experimental conditions:

python -m pyserini.2cr.miracl --list-conditions

Run all languages for a specific condition and show commands:

python -m pyserini.2cr.miracl --condition bm25 --display-commands

Run a particular language for a specific condition and show commands:

python -m pyserini.2cr.miracl --condition bm25 --language ko --display-commands

Run all languages for all conditions and show commands:

python -m pyserini.2cr.miracl --all --display-commands

With the above commands, run files will be placed in the current directory. Use the option --directory runs to place the runs in a sub-directory.

To show the commands for a specific condition without running them:

python -m pyserini.2cr.miracl --condition bm25 --display-commands --dry-run

This generates exactly the commands shown for that condition above (corresponding to a row in the table).

To show the commands for a specific condition and language without running them:

python -m pyserini.2cr.miracl --condition bm25 --language ko --display-commands --dry-run

To show the commands for all conditions without running them, additionally skipping evaluation:

python -m pyserini.2cr.miracl --all --display-commands --dry-run --skip-eval

Finally, to generate this page:

python -m pyserini.2cr.miracl --generate-report --output docs/2cr/miracl.html

The output file miracl.html should be identical to this page.