Nandan Thakur
My research focuses on three core aspects of evaluation and data in information retrieval:
In my research, I have developed widely used retrieval benchmarks such as BEIR and MIRACL, and trained efficient retrieval models such as GPL and SWIM-IR. These advances accelerate RAG systems, such as TREC-RAG, to produce better language-model answers by (i) leveraging cleaner training data or generating synthetic data (e.g., RLHN, ORBIT), (ii) reducing hallucinations across domains and languages (e.g., NoMIRACL, MIRAGE-Bench), and (iii) enabling evaluation on realistic benchmarks and metrics (e.g., FreshStack).
Before my PhD, I was an NLP research assistant at the UKP Lab at TU Darmstadt, advised by Prof. Iryna Gurevych and Nils Reimers. I also have industry experience as a Data Scientist at KNOLSKAPE. I completed my undergraduate degree at BITS Pilani, KK Birla Goa Campus.
ORBIT: Scalable and Verifiable Data Generation for Search Agents on a Tight Budget
N. Thakur, Z. Chen, X. Ma, J. Lin
Preprint 2026
Overview of the TREC 2025 Retrieval Augmented Generation (RAG) Track
S. Upadhyay, N. Thakur, R. Pradeep, N. Craswell, D. Campos, J. Lin
Preprint 2026
Still Fresh? Evaluating Temporal Drift in Retrieval Benchmarks
N. Kuissi, S. Subrahmanyan, N. Thakur, J. Lin
Preprint 2026
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent
Z. Chen, X. Ma, ..., N. Thakur, ..., J. Lin
MTI-LLM @ NeurIPS 2025
Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with LLMs
N. Thakur*, C. Zhang*, X. Ma, J. Lin
EMNLP 2025 (Findings)
Chatbot Arena Meets Nuggets: Towards Explanations & Diagnostics in the Evaluation of LLM Responses
S. Sharifymoghaddam*, S. Upadhyay*, N. Thakur*, R. Pradeep, J. Lin
Preprint 2025
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents
N. Thakur, J. Lin, S. Havens, M. Carbin, O. Khattab, A. Drozdov
NeurIPS 2025 (D&B)
Assessing Support for the TREC 2024 RAG Track: A Large-Scale Comparative Study of LLM and Human Evaluations
N. Thakur, R. Pradeep, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025 (short)
The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models
R. Pradeep, N. Thakur, S. Upadhyay, D. Campos, N. Craswell, J. Lin
SIGIR 2025
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look
S. Upadhyay, R. Pradeep, N. Thakur, D. Campos, N. Craswell, I. Soboroff, H. T. Dang, J. Lin
ICTIR 2025
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
R. Pradeep*, N. Thakur*, S. Sharifymoghaddam, E. Zhang, R. Nguyen, D. Campos, N. Craswell, J. Lin
ECIR 2025 (Findings)
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
N. Thakur, S. Kazi, G. Luo, J. Lin, A. Ahmad
NAACL 2025
MMTEB: Massive Multilingual Text Embedding Benchmark
K. Enevoldsen, I. Chung, …, N. Thakur, …
ICLR 2025
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
S. Upadhyay, R. Pradeep, N. Thakur, N. Craswell, J. Lin
Preprint 2024
“Knowing When You Don’t Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation
N. Thakur, L. Bonifacio, X. Zhang, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
EMNLP 2024 (Findings)
Systematic Evaluation of Neural Retrieval Models on the Touché 2020 Argument Retrieval Subset of BEIR
N. Thakur, L. Bonifacio, M. Fröbe, A. Bondarenko, E. Kamalloo, M. Potthast, M. Hagen, J. Lin
SIGIR 2024 (Repro)
Resources for Brewing BEIR: Reproducible Reference Models and Statistical Analyses
E. Kamalloo, N. Thakur, C. Lassance, X. Ma, J. H. Yang, J. Lin
SIGIR 2024 (Resource)
Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
N. Thakur, J. Ni, G. H. Abrego, J. Wieting, J. Lin, D. Cer
NAACL 2024
HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution
E. Kamalloo, A. Jafari, X. Zhang, N. Thakur, J. Lin
Preprint 2023
Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval
J. Lin, D. Alfonso-Hermelo, V. Jeronymo, E. Kamalloo, C. Lassance, …, N. Thakur, J. H. Yang, X. Zhang
Preprint 2023
MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages
X. Zhang*, N. Thakur*, O. Ogundepo, E. Kamalloo, D. A. Hermelo, …, M. Rezagholizadeh, J. Lin
TACL 2023
Evaluating Embedding APIs for Information Retrieval
E. Kamalloo, X. Zhang, O. Ogundepo, N. Thakur, D. A. Hermelo, M. Rezagholizadeh, J. Lin
ACL 2023 (Industry)
SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval
N. Thakur, K. Wang, I. Gurevych, J. Lin
SIGIR 2023 (Resource)
Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval
N. Thakur, N. Reimers, J. Lin
ReNeuIR 2023
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
K. Wang, N. Thakur, N. Reimers, I. Gurevych
NAACL 2022
BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
N. Thakur, N. Reimers, A. Rücklé, A. Srivastava, I. Gurevych
NeurIPS 2021 (D&B)
Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
N. Thakur, N. Reimers, J. Daxenberger, I. Gurevych
NAACL 2021