# Pyserini Reproductions
This page provides two-click reproductions for a number of experimental runs on the MIRACL dataset. Instructions for programmatic execution are shown at the bottom of this page. The dataset is described in the following paper:
Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, and Jimmy Lin. MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages. Transactions of the Association for Computational Linguistics, 11:1114–1131, 2023.
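All results below are nDCG@10 as computed by `trec_eval` (`-m ndcg_cut.10`). As a quick reference for how the metric is defined, here is a minimal self-contained sketch using linear gain and a log2 rank discount — one standard formulation, for illustration only, not the `trec_eval` implementation:

```python
import math

def dcg(gains):
    # DCG: sum of gain_i / log2(rank_i + 1), with ranks starting at 1
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_docids, qrels, k=10):
    """nDCG@k for one query.

    ranked_docids: the system ranking (best first).
    qrels: dict mapping docid -> relevance grade (0 = not relevant).
    """
    gains = [qrels.get(d, 0) for d in ranked_docids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Toy example: 3 relevant docs, system finds two of them at ranks 1 and 3.
qrels = {"d1": 1, "d2": 1, "d3": 1}
ranking = ["d1", "dX", "d2", "dY", "dZ"]
print(round(ndcg_at_k(ranking, qrels, k=10), 4))  # prints 0.7039
```

The score is 1.0 only when all relevant documents are ranked at the top; missing a relevant document or ranking it lower reduces the discounted gain relative to the ideal ordering.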
| nDCG@10, dev queries | ar | bn | en | es | fa | fi | fr | hi | id | ja | ko | ru | sw | te | th | zh | de | yo | avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BM25 | 0.481 | 0.508 | 0.351 | 0.319 | 0.333 | 0.551 | 0.183 | 0.458 | 0.449 | 0.369 | 0.419 | 0.334 | 0.383 | 0.494 | 0.484 | 0.180 | 0.226 | 0.406 | 0.385 |
| ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language ar \ --topics miracl-v1.0-ar-dev \ --index miracl-v1.0-ar \ --output run.miracl.bm25.ar.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \ run.miracl.bm25.ar.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language bn \ --topics miracl-v1.0-bn-dev \ --index miracl-v1.0-bn \ --output run.miracl.bm25.bn.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \ run.miracl.bm25.bn.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language en \ --topics miracl-v1.0-en-dev \ --index miracl-v1.0-en \ --output run.miracl.bm25.en.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \ run.miracl.bm25.en.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language es \ --topics miracl-v1.0-es-dev \ --index miracl-v1.0-es \ --output run.miracl.bm25.es.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \ run.miracl.bm25.es.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language fa \ --topics miracl-v1.0-fa-dev \ --index miracl-v1.0-fa \ --output run.miracl.bm25.fa.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \ run.miracl.bm25.fa.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language fi \ --topics miracl-v1.0-fi-dev \ --index miracl-v1.0-fi \ --output 
run.miracl.bm25.fi.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \ run.miracl.bm25.fi.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language fr \ --topics miracl-v1.0-fr-dev \ --index miracl-v1.0-fr \ --output run.miracl.bm25.fr.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \ run.miracl.bm25.fr.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language hi \ --topics miracl-v1.0-hi-dev \ --index miracl-v1.0-hi \ --output run.miracl.bm25.hi.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \ run.miracl.bm25.hi.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language id \ --topics miracl-v1.0-id-dev \ --index miracl-v1.0-id \ --output run.miracl.bm25.id.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-id-dev \ run.miracl.bm25.id.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language ja \ --topics miracl-v1.0-ja-dev \ --index miracl-v1.0-ja \ --output run.miracl.bm25.ja.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \ run.miracl.bm25.ja.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language ko \ --topics miracl-v1.0-ko-dev \ --index miracl-v1.0-ko \ --output run.miracl.bm25.ko.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \ run.miracl.bm25.ko.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 
16 --batch-size 128 \ --language ru \ --topics miracl-v1.0-ru-dev \ --index miracl-v1.0-ru \ --output run.miracl.bm25.ru.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \ run.miracl.bm25.ru.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language sw \ --topics miracl-v1.0-sw-dev \ --index miracl-v1.0-sw \ --output run.miracl.bm25.sw.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \ run.miracl.bm25.sw.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language te \ --topics miracl-v1.0-te-dev \ --index miracl-v1.0-te \ --output run.miracl.bm25.te.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-te-dev \ run.miracl.bm25.te.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language th \ --topics miracl-v1.0-th-dev \ --index miracl-v1.0-th \ --output run.miracl.bm25.th.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \ run.miracl.bm25.th.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language zh \ --topics miracl-v1.0-zh-dev \ --index miracl-v1.0-zh \ --output run.miracl.bm25.zh.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \ run.miracl.bm25.zh.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 \ --language de \ --topics miracl-v1.0-de-dev \ --index miracl-v1.0-de \ --output run.miracl.bm25.de.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 
miracl-v1.0-de-dev \ run.miracl.bm25.de.dev.txt Command to generate run: python -m pyserini.search.lucene \ --threads 16 --batch-size 128 --pretokenized \ --topics miracl-v1.0-yo-dev \ --index miracl-v1.0-yo \ --output run.miracl.bm25.yo.dev.txt \ --bm25 --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \ run.miracl.bm25.yo.dev.txt | |||||||||||||||||||
| mDPR pFT | 0.499 | 0.443 | 0.394 | 0.478 | 0.480 | 0.472 | 0.435 | 0.383 | 0.272 | 0.439 | 0.419 | 0.407 | 0.299 | 0.356 | 0.358 | 0.512 | 0.490 | 0.444 | 0.421 |
| ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-ar-dev \ --index miracl-v1.0-ar-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.ar.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \ run.miracl.mdpr-tied-pft-msmarco.ar.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-bn-dev \ --index miracl-v1.0-bn-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.bn.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \ run.miracl.mdpr-tied-pft-msmarco.bn.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-en-dev \ --index miracl-v1.0-en-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.en.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \ run.miracl.mdpr-tied-pft-msmarco.en.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-es-dev \ --index miracl-v1.0-es-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.es.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \ run.miracl.mdpr-tied-pft-msmarco.es.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco 
\ --topics miracl-v1.0-fa-dev \ --index miracl-v1.0-fa-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.fa.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \ run.miracl.mdpr-tied-pft-msmarco.fa.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-fi-dev \ --index miracl-v1.0-fi-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.fi.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \ run.miracl.mdpr-tied-pft-msmarco.fi.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-fr-dev \ --index miracl-v1.0-fr-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.fr.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \ run.miracl.mdpr-tied-pft-msmarco.fr.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-hi-dev \ --index miracl-v1.0-hi-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.hi.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \ run.miracl.mdpr-tied-pft-msmarco.hi.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-id-dev \ --index miracl-v1.0-id-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.id.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 
-m ndcg_cut.10 miracl-v1.0-id-dev \ run.miracl.mdpr-tied-pft-msmarco.id.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-ja-dev \ --index miracl-v1.0-ja-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.ja.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \ run.miracl.mdpr-tied-pft-msmarco.ja.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-ko-dev \ --index miracl-v1.0-ko-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.ko.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \ run.miracl.mdpr-tied-pft-msmarco.ko.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-ru-dev \ --index miracl-v1.0-ru-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.ru.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \ run.miracl.mdpr-tied-pft-msmarco.ru.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-sw-dev \ --index miracl-v1.0-sw-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.sw.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \ run.miracl.mdpr-tied-pft-msmarco.sw.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder 
castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-te-dev \ --index miracl-v1.0-te-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.te.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-te-dev \ run.miracl.mdpr-tied-pft-msmarco.te.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-th-dev \ --index miracl-v1.0-th-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.th.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \ run.miracl.mdpr-tied-pft-msmarco.th.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-zh-dev \ --index miracl-v1.0-zh-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.zh.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \ run.miracl.mdpr-tied-pft-msmarco.zh.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-de-dev \ --index miracl-v1.0-de-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.de.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-de-dev \ run.miracl.mdpr-tied-pft-msmarco.de.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco \ --topics miracl-v1.0-yo-dev \ --index miracl-v1.0-yo-mdpr-tied-pft-msmarco \ --output run.miracl.mdpr-tied-pft-msmarco.yo.dev.txt --hits 1000 Evaluation commands: python -m 
pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \ run.miracl.mdpr-tied-pft-msmarco.yo.dev.txt | |||||||||||||||||||
| BM25+mDPR pFT | 0.673 | 0.654 | 0.549 | 0.641 | 0.594 | 0.672 | 0.523 | 0.616 | 0.443 | 0.576 | 0.609 | 0.532 | 0.446 | 0.602 | 0.599 | 0.525 | 0.564 | 0.611 | 0.579 |
| ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.ar.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ar.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ar.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.bn.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.bn.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.bn.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.en.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.en.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.en.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.es.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.es.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.es.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.fa.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fa.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt --method 
interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fa.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.fi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fi.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fi.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.fr.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.fr.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.fr.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.hi.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.hi.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.hi.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.id.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.id.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-id-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.id.dev.txt Command to generate run: python -m pyserini.fusion \ --runs 
run.miracl.bm25.ja.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ja.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ja.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.ko.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ko.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ko.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.ru.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.ru.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.ru.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.sw.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.sw.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.sw.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.te.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.te.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 
-m ndcg_cut.10 miracl-v1.0-te-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.te.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.th.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.th.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.th.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.zh.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.zh.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.zh.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.de.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.de.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-de-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.de.dev.txt Command to generate run: python -m pyserini.fusion \ --runs run.miracl.bm25.yo.dev.top1000.txt run.miracl.mdpr-tied-pft-msmarco.yo.dev.top1000.txt \ --output run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt --method interpolation --alpha 0.5 --depth 1000 --k 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \ run.miracl.bm25-mdpr-tied-pft-msmarco-hybrid.yo.dev.txt | |||||||||||||||||||
| mDPR pFT+FT1 | 0.578 | 0.580 | 0.281 | 0.251 | 0.384 | 0.569 | 0.301 | 0.329 | 0.346 | 0.500 | 0.486 | 0.393 | 0.658 | 0.778 | 0.598 | 0.358 | 0.322 | 0.598 | 0.462 |
| ar bn en es fa fi fr hi id ja ko ru sw te th zh de yo Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-ar-dev \ --index miracl-v1.0-ar-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ar-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.ar.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-bn-dev \ --index miracl-v1.0-bn-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-bn-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.bn.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-en-dev \ --index miracl-v1.0-en-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-en-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.en.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-es-dev \ --index miracl-v1.0-es-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-es-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.es.dev.txt Command to generate run: python -m 
pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-fa-dev \ --index miracl-v1.0-fa-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fa-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.fa.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-fi-dev \ --index miracl-v1.0-fi-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fi-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.fi.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-fr-dev \ --index miracl-v1.0-fr-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-fr-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.fr.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-hi-dev \ --index miracl-v1.0-hi-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-hi-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.hi.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder 
castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-id-dev \ --index miracl-v1.0-id-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-id-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.id.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-ja-dev \ --index miracl-v1.0-ja-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ja-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.ja.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-ko-dev \ --index miracl-v1.0-ko-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ko-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.ko.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-ru-dev \ --index miracl-v1.0-ru-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-ru-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.ru.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-sw-dev \ --index 
miracl-v1.0-sw-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-sw-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.sw.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-te-dev \ --index miracl-v1.0-te-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-te-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.te.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-th-dev \ --index miracl-v1.0-th-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-th-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.th.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-zh-dev \ --index miracl-v1.0-zh-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-zh-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.zh.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-de-dev \ --index miracl-v1.0-de-mdpr-tied-pft-msmarco-ft-all \ --output 
run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-de-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.de.dev.txt Command to generate run: python -m pyserini.search.faiss \ --threads 16 --batch-size 512 \ --encoder-class auto \ --encoder castorini/mdpr-tied-pft-msmarco-ft-all \ --topics miracl-v1.0-yo-dev \ --index miracl-v1.0-yo-mdpr-tied-pft-msmarco-ft-all \ --output run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt --hits 1000 Evaluation commands: python -m pyserini.eval.trec_eval \ -c -M 100 -m ndcg_cut.10 miracl-v1.0-yo-dev \ run.miracl.mdpr-tied-pft-msmarco-ft-all.yo.dev.txt | |||||||||||||||||||
| mDPR pFT+FT2 | 0.725 | 0.684 | 0.488 | 0.565 | 0.593 | 0.714 | 0.589 | 0.516 | 0.496 | 0.642 | 0.590 | 0.597 | 0.685 | 0.804 | 0.695 | 0.650 | -- | -- | 0.627 |
Commands for the per-language fine-tuned mDPR condition, for each language code {lang} in ar, bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, zh (this condition is not available for de and yo):

Command to generate run:

python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --encoder-class auto \
  --encoder castorini/mdpr-tied-pft-msmarco-ft-miracl-{lang} \
  --topics miracl-v1.0-{lang}-dev \
  --index miracl-v1.0-{lang}-mdpr-tied-pft-msmarco-ft-miracl-{lang} \
  --output run.miracl.mdpr-tied-pft-msmarco-ft-miracl.{lang}.dev.txt \
  --hits 1000

Evaluation command:

python -m pyserini.eval.trec_eval \
  -c -M 100 -m ndcg_cut.10 miracl-v1.0-{lang}-dev \
  run.miracl.mdpr-tied-pft-msmarco-ft-miracl.{lang}.dev.txt
| mContriever | 0.525 | 0.501 | 0.364 | 0.418 | 0.215 | 0.602 | 0.314 | 0.286 | 0.392 | 0.424 | 0.483 | 0.391 | 0.560 | 0.528 | 0.517 | 0.410 | 0.408 | 0.415 | 0.431 |
Commands for the mContriever condition, for each language code {lang} in ar, bn, en, es, fa, fi, fr, hi, id, ja, ko, ru, sw, te, th, zh, de, yo:

Command to generate run:

python -m pyserini.search.faiss \
  --threads 16 --batch-size 512 \
  --encoder-class contriever \
  --encoder facebook/mcontriever-msmarco \
  --topics miracl-v1.0-{lang}-dev \
  --index miracl-v1.0-{lang}-mcontriever-pft-msmarco \
  --output run.miracl.mcontriever-tied-pft-msmarco.{lang}.dev.txt \
  --hits 1000

Evaluation command:

python -m pyserini.eval.trec_eval \
  -c -M 100 -m ndcg_cut.10 miracl-v1.0-{lang}-dev \
  run.miracl.mcontriever-tied-pft-msmarco.{lang}.dev.txt
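The nDCG@10 figures in the tables above are computed by trec_eval's ndcg_cut.10 metric. As a reference point, here is a minimal stdlib sketch of nDCG with linear gains and a log2 rank discount; the relevance values are toy data, not MIRACL judgments:

```python
import math

def dcg(rels):
    # Discounted cumulative gain: linear gain, log2(rank + 1) discount,
    # with ranks starting at 1 (so the first hit is undiscounted).
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))

def ndcg_at_k(ranked_rels, all_rels, k=10):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg(sorted(all_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal if ideal > 0 else 0.0

# Toy example: ranking the documents in ideal order scores 1.0.
print(ndcg_at_k([3, 2, 0], [3, 2, 0]))  # 1.0
```

Note that this is only a sketch of the metric's shape; for reported numbers, always use trec_eval via pyserini.eval.trec_eval as shown above.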
Programmatic Execution
All experimental runs shown in the table above can be executed programmatically using the instructions below. To list all experimental conditions:
python -m pyserini.2cr.miracl --list-conditions
Run all languages for a specific condition and show commands:
python -m pyserini.2cr.miracl --condition bm25 --display-commands
Run a particular language for a specific condition and show commands:
python -m pyserini.2cr.miracl --condition bm25 --language ko --display-commands
Run all languages for all conditions and show commands:
python -m pyserini.2cr.miracl --all --display-commands
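Internally, the 2cr driver expands a per-language command template for each condition, matching the commands shown in the table. A simplified, hypothetical sketch of that expansion for the bm25 condition (the template string and helper function here are illustrative, not Pyserini's actual code):

```python
# MIRACL v1.0 language codes, in table order.
LANGS = ["ar", "bn", "en", "es", "fa", "fi", "fr", "hi", "id",
         "ja", "ko", "ru", "sw", "te", "th", "zh", "de", "yo"]

BM25_TEMPLATE = (
    "python -m pyserini.search.lucene "
    "--threads 16 --batch-size 128 "
    "--language {lang} "
    "--topics miracl-v1.0-{lang}-dev "
    "--index miracl-v1.0-{lang} "
    "--output run.miracl.bm25.{lang}.dev.txt "
    "--bm25 --hits 1000"
)

def bm25_commands():
    # One retrieval command per MIRACL language.
    return [BM25_TEMPLATE.format(lang=lang) for lang in LANGS]
```

Each generated string corresponds to one cell's command in the BM25 row of the table.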
With the above commands, run files are placed in the current directory. Use the option --directory runs to place them in a sub-directory instead.
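Each run file written by --output uses the standard six-column TREC run format that trec_eval consumes. A small sketch of parsing one line (the query ID, document ID, and score below are made up for illustration):

```python
def parse_run_line(line):
    # TREC run format, one retrieved document per line:
    #   qid Q0 docid rank score runtag
    qid, _q0, docid, rank, score, tag = line.split()
    return qid, docid, int(rank), float(score), tag

# Hypothetical line; MIRACL passage IDs look like "docid#passage".
qid, docid, rank, score, tag = parse_run_line("42 Q0 12345#6 1 14.3700 Anserini")
```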
For a specific condition, just show the commands and do not run:
python -m pyserini.2cr.miracl --condition bm25 --display-commands --dry-run
This generates exactly the commands shown above for the specified condition (corresponding to a row in the table).
For a specific condition and language, just show the commands and do not run:
python -m pyserini.2cr.miracl --condition bm25 --language ko --display-commands --dry-run
For all conditions, show the commands without running them, and additionally skip evaluation:
python -m pyserini.2cr.miracl --all --display-commands --dry-run --skip-eval
Finally, to generate this page:
python -m pyserini.2cr.miracl --generate-report --output docs/2cr/miracl.html
The output file miracl.html should be identical to this page.