Paper page - Multilingual Universal Sentence Encoder for Semantic Retrieval

Abstract

Multilingual sentence encoding models based on Transformer and CNN architectures achieve performance competitive with the state of the art on semantic retrieval, translation pair bitext retrieval, and retrieval question answering tasks.

We introduce two pre-trained, retrieval-focused multilingual sentence encoding models, based on the Transformer and CNN model architectures respectively. The models embed text from 16 languages into a single semantic space using a multi-task trained dual-encoder that learns tied representations via translation-based bridge tasks (Chidambaram et al., 2018). The models provide performance that is competitive with the state-of-the-art on semantic retrieval (SR), translation pair bitext retrieval (BR), and retrieval question answering (ReQA). On English transfer learning tasks, our sentence-level embeddings approach, and in some cases exceed, the performance of monolingual, English-only sentence embedding models. Our models are made available for download on TensorFlow Hub.
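Because the dual-encoder maps sentences from all 16 languages into one shared semantic space, retrieval reduces to a nearest-neighbor search over embedding vectors. The following is a minimal sketch of that retrieval step using toy vectors in place of real model outputs; the sentence strings in the comments are illustrative, not from the paper.

```python
import numpy as np

def normalize(v):
    """L2-normalize along the last axis so dot product equals cosine similarity."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy 3-d vectors standing in for dual-encoder sentence embeddings.
# In the real models, semantically similar sentences land close together
# regardless of language.
candidates = normalize(np.array([
    [0.9, 0.1, 0.0],   # e.g. "How do I reset my password?"
    [0.1, 0.9, 0.1],   # e.g. "¿Dónde está la estación?"
    [0.0, 0.2, 0.9],   # e.g. "The weather is nice today."
]))
query = normalize(np.array([0.85, 0.15, 0.05]))  # a password-related query

scores = candidates @ query      # cosine similarities (all vectors unit-norm)
best = int(np.argmax(scores))    # index of the semantically closest candidate
print(best)  # → 0
```

With the released models, the candidate embeddings would come from encoding a corpus with the TensorFlow Hub module and the same argmax-over-similarity step performs SR, BR, or ReQA retrieval.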



Models citing this paper 1

osiria/distiluse-base-italian Feature Extraction • 66.6M • Updated Sep 29, 2023 • 1

Datasets citing this paper 9

Patt/RTE_TH Viewer • Updated Jan 11, 2025• 5.77k • 88

Patt/MultiRC_TH Viewer • Updated Jan 11, 2025• 41.8k • 46

Patt/RTE_TH_cleanned Viewer • Updated Jan 11, 2025• 5.72k • 18

Patt/copa_th Viewer • Updated Jan 11, 2025• 1k • 16

Browse 9 datasets citing this paper

Spaces citing this paper 1
