Nandan Thakur
My research focuses on three core aspects of evaluation and data in information retrieval:
In my research, I have developed widely used retrieval benchmarks such as BEIR and MIRACL, and trained efficient retrieval models such as GPL and SWIM-IR. These advances accelerate RAG systems, such as TREC-RAG, to produce better language-model answers by (i) leveraging cleaner training data or generating synthetic data (e.g., RLHN, ORBIT), (ii) reducing hallucinations across domains and languages (e.g., NoMIRACL, MIRAGE-Bench), and (iii) enabling evaluation on realistic benchmarks and metrics (e.g., FreshStack).
Before my PhD, I was an NLP research assistant at the UKP Lab at TU Darmstadt, advised by Prof. Iryna Gurevych and Nils Reimers. I also have industry experience as a Data Scientist at KNOLSKAPE. I completed my undergraduate degree at BITS Pilani, KK Birla Goa Campus.
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
N. Thakur, Z. Chen, X. Ma, J. Lin
Preprint 2026
Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track
S. Upadhyay, N. Thakur, R. Pradeep, N. Craswell, D. Campos, J. Lin
Preprint 2026
Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks
N. Kuissi, S. Subrahmanyan, N. Thakur, J. Lin
Preprint 2026
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Z. Chen, X. Ma, ..., N. Thakur, ..., J. Lin
MTI-LLM @ NeurIPS 2025
Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs
N. Thakur*, C. Zhang*, X. Ma, J. Lin
EMNLP 2025 (Findings)
Chatbot Arena Meets Nuggets: Towards Explanations & Diagnostics in the Evaluation of LLM Responses
S. Sharifymoghaddam*, S. Upadhyay*, N. Thakur*, R. Pradeep, J. Lin
Preprint 2025
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
N. Thakur, J. Lin, S. Havens, M. Carbin, O. Khattab, A. Drozdov
NeurIPS 2025 (D&B)
Assessing Support for the TREC 2024 RAG Track: A Large-Scale Comparative Study of LLM and Human Evaluations
N. Thakur, R. Pradeep, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025 (short)
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
R. Pradeep, N. Thakur, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look
S. Upadhyay, R. Pradeep, N. Thakur, D. Campos, N. Craswell, I. Soboroff, H. T. Dang, J. Lin
ICTIR 2025
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
R. Pradeep*, N. Thakur*, S. Sharifymoghaddam, E. Zhang, R. Nguyen, D. Campos, N. Craswell, J. Lin
ECIR 2025 (Findings)
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
N. Thakur, S. Kazi, G. Luo, J. Lin, A. Ahmad
NAACL 2025
MMTEB: Massive Multilingual Text Embedding Benchmark
K. Enevoldsen, I. Chung, …, N. Thakur, …
ICLR 2025
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
S. Upadhyay, R. Pradeep, N. Thakur, N. Craswell, J. Lin
Preprint 2024
“Knowing When You Don’t Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation
N. Thakur, L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
EMNLP 2024 (Findings)
Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
N. Thakur, L. Bonifacio, M. Fröbe, A. Bondarenko, E. Kamalloo, M. Potthast, M. Hagen, J. Lin
SIGIR 2024 (Repro)
Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses
E. Kamalloo, N. Thakur, C. Lassance, X. Ma, J. H. Yang, J. Lin
SIGIR 2024 (Resource)
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
N. Thakur, J. Ni, G. H. Abrego, J. Wieting, J. Lin, D. Cer
NAACL 2024
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
E. Kamalloo, A. Jafari, X. Zhang, N. Thakur, J. Lin
Preprint 2023
Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval
J. Lin, D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, …, N. Thakur, J. H. Yang, X. Zhang
Preprint 2023
MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages
X. Zhang*, N. Thakur*, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
TACL 2023
Evaluating Embedding APIs for Information Retrieval
E. Kamalloo, X. Zhang, O. Ogundepo, N. Thakur, D. A. Hermelo, M. Rezagholizadeh, J. Lin
ACL 2023 (Industry)
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
N. Thakur, K. Wang, I. Gurevych, J. Lin
SIGIR 2023 (Resource)
Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval
N. Thakur, N. Reimers, J. Lin
ReNeuIR 2023
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
K. Wang, N. Thakur, N. Reimers, I. Gurevych
NAACL 2022
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, I. Gurevych
NeurIPS 2021 (D&B)
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
N. Thakur, N. Reimers, J. Daxenberger, I. Gurevych
NAACL 2021