Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks
Published on Oct 16, 2020
Abstract
An augmented data strategy leveraging cross-encoders improves bi-encoders' performance on sentence scoring tasks with domain adaptation.
There are two approaches for pairwise sentence scoring: cross-encoders, which perform full attention over the input pair, and bi-encoders, which map each input independently to a dense vector space. While cross-encoders often achieve higher performance, they are too slow for many practical use cases. Bi-encoders, on the other hand, require substantial training data and fine-tuning on the target task to achieve competitive performance. We present a simple yet efficient data augmentation strategy called Augmented SBERT, where we use the cross-encoder to label a larger set of input pairs to augment the training data for the bi-encoder. We show that, in this process, selecting the sentence pairs is non-trivial and crucial for the success of the method. We evaluate our approach on multiple tasks (in-domain) as well as on a domain adaptation task. Augmented SBERT achieves an improvement of up to 6 points for in-domain tasks and of up to 37 points for domain adaptation tasks compared to the original bi-encoder performance.
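The augmentation loop described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the cross-encoder is stubbed with a toy token-overlap scorer so the sketch runs without model downloads, and the pair-sampling step uses naive recombination, whereas the paper stresses that smarter sampling (e.g. BM25 or semantic-search retrieval) is crucial to the method's success.

```python
from itertools import combinations

# Gold training pairs: (sentence_a, sentence_b, similarity label)
gold_pairs = [
    ("A man is playing guitar.", "Someone plays an instrument.", 0.9),
    ("A dog runs in the park.", "The cat sleeps indoors.", 0.1),
]

def cross_encoder_score(a: str, b: str) -> float:
    """Stand-in for a fine-tuned cross-encoder (full attention over the
    concatenated pair). Here: Jaccard token overlap, for illustration only."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Step 1: sample unlabeled candidate pairs from the sentence pool.
# (The paper's key finding: HOW these pairs are selected matters greatly.)
sentences = sorted({s for a, b, _ in gold_pairs for s in (a, b)})
candidates = list(combinations(sentences, 2))

# Step 2: weakly label the candidates with the cross-encoder ("silver" data).
silver_pairs = [(a, b, cross_encoder_score(a, b)) for a, b in candidates]

# Step 3: fine-tune the bi-encoder (e.g. SBERT) on gold + silver pairs.
augmented_training_set = gold_pairs + silver_pairs
print(len(augmented_training_set))
```

In practice the stub would be replaced by a real cross-encoder (e.g. `sentence_transformers.CrossEncoder("cross-encoder/stsb-roberta-base").predict(pairs)`) and step 3 by standard SBERT fine-tuning on the combined set.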
Models citing this paper 9
dangvantuan/vietnamese-embedding Sentence Similarity • 0.1B • Updated Oct 23, 2025 • 243k • 50
Lajavaness/bilingual-embedding-large Sentence Similarity • Updated Aug 6, 2024 • 5.38k • 27
Lajavaness/bilingual-embedding-base Sentence Similarity • 0.3B • Updated Nov 18, 2024 • 3.01k • 9
Lajavaness/bilingual-document-embedding Sentence Similarity • 0.6B • Updated Dec 10, 2024 • 190 • 8
Browse 9 models citing this paper
Datasets citing this paper 0
No datasets link this paper.
Spaces citing this paper 23
Collections including this paper 0
No collections include this paper.