Paper page - The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes

Published on Dec 28, 2020

Abstract

Theoretical and empirical analysis demonstrates that dense representations perform worse than sparse representations as index size increases, due to higher false positives in lower dimensions.

Information Retrieval using dense low-dimensional representations recently became popular and outperformed traditional sparse representations like BM25. However, no previous work investigated how dense representations perform with large index sizes. We show theoretically and empirically that the performance of dense representations decreases more quickly than that of sparse representations as index size increases. In extreme cases, this can even lead to a tipping point: at a certain index size, sparse representations outperform dense representations. We show that this behavior is tightly connected to the number of dimensions of the representations: the lower the dimension, the higher the chance of false positives, i.e., returning irrelevant documents.
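The dimensionality effect described in the abstract can be illustrated with a small simulation (a sketch, not code from the paper): among N random unit vectors, the maximum cosine similarity to a fixed query grows with N, and grows faster in lower dimensions, so a low-dimensional index is more likely to surface an irrelevant document with a deceptively high score. The function name and parameter values below are illustrative choices, not taken from the paper.

```python
# Illustrative sketch: maximum cosine similarity between a random query
# and n random "distractor" documents, for varying dimension d.
# Lower d -> wider similarity distribution -> higher chance that some
# irrelevant document scores very high (a false positive).
import numpy as np

rng = np.random.default_rng(0)

def max_distractor_score(d, n):
    """Max cosine similarity between one random query and n random unit vectors in R^d."""
    q = rng.standard_normal(d)
    q /= np.linalg.norm(q)
    docs = rng.standard_normal((n, d))
    docs /= np.linalg.norm(docs, axis=1, keepdims=True)
    return float(np.max(docs @ q))

for d in (16, 128):
    for n in (1_000, 20_000):
        print(f"d={d:4d}  n={n:6d}  max similarity = {max_distractor_score(d, n):.3f}")
```

Running this shows the two trends the paper describes: for fixed d, the best distractor score climbs as the index grows, and for fixed index size the climb is much steeper at low d.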



Models citing this paper 1

lengocduc195/SentenceTransformer Updated Jun 18, 2023

Datasets citing this paper 0


Spaces citing this paper 0


Collections including this paper 0
