ChID: A Large-scale Chinese IDiom Dataset for Cloze Test


Abstract: Cloze-style reading comprehension in Chinese remains limited by the lack of diverse corpora. In this paper we propose ChID, a large-scale Chinese cloze test dataset that studies the comprehension of idioms, a unique language phenomenon in Chinese. In this corpus, the idioms in a passage are replaced by blank symbols, and the correct answer must be chosen from well-designed candidate idioms. We carefully study how the design of candidate idioms and the representation of idioms affect the performance of state-of-the-art models. Results show that machine accuracy is substantially worse than that of humans, indicating considerable room for further research.
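
The task format described in the abstract (a passage with a blanked-out idiom, plus a set of candidate idioms to choose from) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper or its released data: the `#idiom#` blank token, the `ClozeItem` fields, the toy scorer, and the example passage. A real model would replace `toy_score` with a learned scoring function.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical blank symbol; the token used in the actual corpus may differ.
BLANK = "#idiom#"


@dataclass
class ClozeItem:
    """One illustrative cloze question (field names are assumptions)."""
    passage: str           # text containing one BLANK
    candidates: List[str]  # candidate idioms
    answer: int            # index of the correct candidate


def fill(item: ClozeItem, candidate: str) -> str:
    """Return the passage with the first blank replaced by a candidate."""
    return item.passage.replace(BLANK, candidate, 1)


def predict(item: ClozeItem, score: Callable[[str], float]) -> int:
    """Pick the candidate whose filled-in passage gets the highest score."""
    scores = [score(fill(item, c)) for c in item.candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items: Sequence[ClozeItem], score: Callable[[str], float]) -> float:
    """Fraction of items answered correctly; 0.0 for an empty list."""
    if not items:
        return 0.0
    correct = sum(predict(it, score) == it.answer for it in items)
    return correct / len(items)


def toy_score(text: str) -> float:
    """Stand-in scorer: rewards one specific idiom. Not a real model."""
    return 1.0 if "熟能生巧" in text else 0.0


# Made-up example item, not drawn from the dataset.
item = ClozeItem(
    passage=f"他每天坚持练习，终于{BLANK}。",
    candidates=["熟能生巧", "画蛇添足", "守株待兔"],
    answer=0,
)
```

Calling `predict(item, toy_score)` selects a candidate index, and `accuracy([item], toy_score)` reports the fraction answered correctly, which is the kind of metric the abstract compares between machines and humans.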

Submission history

From: Chujie Zheng
[v1] Tue, 4 Jun 2019 08:29:00 UTC (40 KB)
[v2] Sat, 8 Jun 2019 05:48:04 UTC (40 KB)
[v3] Mon, 27 Jan 2020 16:43:59 UTC (40 KB)