GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
Published on Dec 14, 2021
Abstract
Generative Pseudo Labeling enhances dense retrieval across domains with limited data, improving performance and robustness over prior methods.
Dense retrieval approaches can overcome the lexical gap and lead to significantly improved search results. However, they require large amounts of training data, which are not available for most domains. As shown in previous work (Thakur et al., 2021b), the performance of dense retrievers severely degrades under a domain shift. This limits the usage of dense retrieval approaches to only a few domains with large training datasets. In this paper, we propose the novel unsupervised domain adaptation method Generative Pseudo Labeling (GPL), which combines a query generator with pseudo labeling from a cross-encoder. On six representative domain-specialized datasets, we find the proposed GPL can outperform an out-of-the-box state-of-the-art dense retrieval approach by up to 9.3 points nDCG@10. GPL requires less (unlabeled) data from the target domain and is more robust in its training than previous methods. We further investigate the role of six recent pre-training methods in the scenario of domain adaptation for retrieval tasks, where only three could yield improved results. The best approach, TSDAE (Wang et al., 2021), can be combined with GPL, yielding another average improvement of 1.4 points nDCG@10 across the six tasks. The code and the models are available at https://github.com/UKPLab/gpl.
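The pseudo-labeling step described in the abstract trains the dense retriever to match the cross-encoder's score *margin* between a positive and a negative passage (a MarginMSE-style objective). Below is a minimal, self-contained sketch of that objective on toy scores; the function name and the toy values are illustrative, not taken from the GPL codebase:

```python
# Sketch of the margin-based pseudo-labeling objective used by GPL-style
# training: the student (bi-encoder) margin between positive and negative
# passages is regressed onto the teacher (cross-encoder) margin.
# All names and numbers here are illustrative assumptions.

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins."""
    diffs = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(diffs) / len(diffs)

# Toy batch: two (query, positive, negative) triples with scalar scores.
student_pos = [3.0, 1.5]   # bi-encoder scores for positive passages
student_neg = [1.0, 1.0]   # bi-encoder scores for mined negatives
teacher_pos = [4.0, 2.0]   # cross-encoder pseudo labels (positives)
teacher_neg = [2.0, 1.5]   # cross-encoder pseudo labels (negatives)

loss = margin_mse(student_pos, student_neg, teacher_pos, teacher_neg)
```

In the full pipeline, the teacher scores come from a cross-encoder applied to queries produced by the query generator, with negatives mined by an existing dense retriever; the loss above is what the bi-encoder is then trained to minimize.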
Models citing this paper 19
doc2query/msmarco-14langs-mt5-base-v1 Updated May 2, 2022 • 11 • 14
doc2query/msmarco-chinese-mt5-base-v1 Updated Apr 29, 2022 • 4 • 14
doc2query/msmarco-portuguese-mt5-base-v1 Updated Apr 29, 2022 • 6 • 10
doc2query/msmarco-russian-mt5-base-v1 Updated Apr 29, 2022 • 7 • 8