MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset

Abstract: We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human labels for up to 10,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, the titles of the articles from which the sentences were extracted, and the neural MT models used to translate the text.

Submission history

From: Marina Fomicheva
[v1] Fri, 9 Oct 2020 10:12:02 UTC (259 KB)
[v2] Thu, 16 Sep 2021 07:35:16 UTC (531 KB)
[v3] Mon, 11 Oct 2021 09:31:35 UTC (531 KB)