Learning to Select from Multiple Options
Related papers
A large annotated corpus for learning natural language inference
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015
Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude larger than all other resources of its type. This increase in scale allows lexicalized classifiers to outperform some sophisticated existing entailment models, and it allows a neural network-based model to perform competitively on natural language inference benchmarks for the first time.
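The "lexicalized classifiers" this abstract refers to can be illustrated with a minimal sketch (not the paper's implementation): a feature extractor over SNLI-style premise-hypothesis pairs using unigram overlap and cross-unigram features. The function name and feature scheme here are hypothetical simplifications.

```python
# Minimal sketch of lexicalized features for an NLI sentence pair,
# in the spirit of the simple classifiers described above.
# Feature names and the exact scheme are illustrative assumptions.
def pair_features(premise: str, hypothesis: str) -> dict:
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    feats = {}
    # fraction of hypothesis words that also appear in the premise
    feats["overlap_ratio"] = len(p & h) / max(len(h), 1)
    # cross-unigram features: one indicator per (premise word, hypothesis word)
    for pw in p:
        for hw in h:
            feats[f"cross={pw}|{hw}"] = 1.0
    return feats

feats = pair_features("A man plays guitar", "A person makes music")
```

Such sparse features would then feed a standard linear classifier over the three NLI labels (entailment, contradiction, neutral).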
Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts
Proceedings of the AAAI Conference on Artificial Intelligence
Natural language inference (NLI) is a fundamental NLP task, investigating the entailment relationship between two texts. Popular NLI datasets present the task at the sentence level. While adequate for testing semantic representations, they fall short for testing contextual reasoning over long texts, which is a natural part of the human inference process. We introduce ConTRoL, a new dataset for ConTextual Reasoning over Long texts. Consisting of 8,325 expert-designed "context-hypothesis" pairs with gold labels, ConTRoL is a passage-level NLI dataset with a focus on complex contextual reasoning types such as logical reasoning. It is derived from a competitive selection and recruitment test (verbal reasoning test) for police recruitment, with expert-level quality. Compared with previous NLI benchmarks, the materials in ConTRoL are much more challenging, involving a range of reasoning types. Empirical results show that state-of-the-art language models perform far worse than educ...
An Entailment-Based Approach to the QA4MRE Challenge
2012
This paper describes our entry to the 2012 QA4MRE Main Task (English dataset). The QA4MRE task poses a significant challenge because the expression of knowledge in the question and in the answer (in the document) typically differs substantially. Ultimately, one would need a system that can perform full machine reading, creating an internal model of the document's meaning, to achieve high performance. Our approach is a preliminary step toward this, based on estimating the likelihood of textual entailment between sentences in the text and the question Q and each candidate answer A_i. We first treat the question Q and each answer A_i independently, and find sets of sentences SQ, SA that each plausibly entail (the target of) Q or one of the A_i respectively. We then search for the closest (in the document) pair of sentences <S_Q ∈ SQ, S_Ai ∈ SA> in these sets, and conclude that the answer A_i entailed by S_Ai in the closest pair is the answer. This approach assumes coherent discourse, i.e., that sentences close together are usually "talking about the same thing", and thus conveying a single idea (namely an expression of the Q+A_i pair). In QA4MRE it is hard to "prove" entailment, as a candidate answer A may be expressed using substantially different wording in the document, over multiple sentences, and only partially (as some aspects of A may be left implicit in the document, to be filled in by the reader). As a result, we instead estimate the likelihood of entailment (that a sentence S entails A) by looking for evidence, namely entailment relationships between components of S and A such as words, bigrams, trigrams, and parse fragments. To identify these possible entailment relationships we use three knowledge resources, namely WordNet, ParaPara (a large paraphrase database from Johns Hopkins University), and the DIRT paraphrase database. Our best run scored 40% in the evaluation, and around 42% in additional (unsubmitted) runs afterwards.
In ablation studies, we found that the majority of our score (approximately 38%) could be attributed to the basic algorithm, with the knowledge resources adding approximately 4% to this baseline score. Finally, we critique our approach with respect to the broader goal of machine reading, and discuss what is needed to move closer to that goal.
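The closest-pair heuristic described in this abstract can be sketched as a toy procedure, under loudly illustrative assumptions: "plausibly entails" is approximated by bare word overlap, sentence indices stand in for document position, and the threshold and function names are invented for demonstration, not taken from the paper.

```python
# Toy sketch of the closest-pair answer-selection heuristic:
# find sentences that plausibly entail the question and each answer,
# then pick the answer whose supporting sentence lies closest (in the
# document) to a question-supporting sentence. "Entailment" here is
# approximated by word overlap -- an illustrative stand-in only.
def overlap(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wb), 1)

def answer_by_closest_pair(doc_sents, question, answers, thresh=0.3):
    # indices of sentences that plausibly entail (the target of) Q
    sq = [i for i, s in enumerate(doc_sents) if overlap(s, question) >= thresh]
    best = (None, float("inf"))  # (answer index, sentence distance)
    for ai, ans in enumerate(answers):
        sa = [i for i, s in enumerate(doc_sents) if overlap(s, ans) >= thresh]
        for i in sq:
            for j in sa:
                if abs(i - j) < best[1]:
                    best = (ai, abs(i - j))
    return best[0]
```

The coherent-discourse assumption from the abstract is what justifies using raw sentence distance as the tie-breaking signal.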
Recognizing Textual Entailment via Multi-task Knowledge Assisted LSTM
Lecture Notes in Computer Science, 2016
Recognizing Textual Entailment (RTE) plays an important role in NLP applications like question answering and information retrieval. Most previous works either use classifiers with elaborately designed features and lexical similarity, or bring distant supervision and reasoning techniques into the RTE task. However, these approaches are hard to generalize due to the complexity of feature engineering, and are prone to cascading errors and data sparsity problems. To alleviate these problems, some works use an LSTM-based recurrent neural network with word-by-word attention to recognize textual entailment. Nevertheless, these works did not make full use of a knowledge base (KB) to help reasoning. In this paper, we propose a deep neural network architecture called Multi-task Knowledge Assisted LSTM (MKAL), which aims to conduct implicit inference with the assistance of a KB and uses predicate-to-predicate attention to detect entailment between predicates. In addition, our model applies a multi-task architecture to further improve performance. The experimental results show that our proposed method achieves a competitive result compared to previous work.
Recognizing textual entailment with deep-shallow semantic analysis and logical inference
In this paper, the architecture and evaluation of a new system for recognizing textual entailment (RTE) is presented. It is conceived as an adaptable and modular environment allowing for high-coverage syntactic and semantic text analysis combined with logical inference. For the syntactic and semantic analysis it combines an HPSG-based deep semantic analysis with a shallow one supported by statistical models, in order to increase the quality and accuracy of results. For recognizing textual entailment we use first-order logical inference, employing model-theoretic techniques and automated reasoning tools. The inference is supported with problem-relevant background knowledge extracted automatically and on demand from external sources such as WordNet, YAGO, and OpenCyc, or from other, experimental sources with, e.g., manually defined presupposition resolutions, or with general and common-sense knowledge. The system comes with a graphical user interface for control and presentation p...
A Knowledge-Based Textual Entailment Approach Applied to the AVE Task
Lecture Notes in Computer Science, 2007
The Answer Validation Exercise (AVE) is a pilot track within the Cross-Language Evaluation Forum (CLEF) 2006. The AVE competition provides an evaluation framework for answer validation in Question Answering (QA). In our participation in AVE, we propose a system that was initially used for another task, Recognising Textual Entailment (RTE). The aim of our participation is to evaluate the improvement our system brings to QA. Moreover, because these two tasks (AVE and RTE) share the same main idea, which is to find semantic implications between two fragments of text, our system could be applied directly to the AVE competition. Our system is based on the representation of the texts by means of logic forms and the computation of semantic comparisons between them. This comparison is carried out using two different approaches: the first is guided by a deeper study of the WordNet relations, and the second uses the measure defined by Lin to compute the semantic similarity between the logic form predicates. Moreover, we have also designed a voting strategy between our system and the MLEnt system, also presented by the University of Alicante, with the aim of obtaining a joint execution of the two systems developed at the University of Alicante. Although the results obtained have not been very high, we consider them quite promising, and this supports the view that there is still much research to be done on textual entailment.
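The Lin measure mentioned in this abstract is defined as sim(c1, c2) = 2·IC(lcs(c1, c2)) / (IC(c1) + IC(c2)), where IC(c) = -log p(c) and lcs is the lowest common subsumer. A toy sketch follows; the miniature taxonomy and concept probabilities are invented for demonstration, whereas a real system would derive them from WordNet and corpus counts.

```python
import math

# Toy sketch of Lin's similarity over a tiny hand-built taxonomy.
# P gives invented concept probabilities; HYPERNYMS lists each
# concept's ancestors from most to least specific. Both are
# illustrative assumptions, not real WordNet data.
P = {"entity": 1.0, "animal": 0.2, "dog": 0.05, "cat": 0.04}
HYPERNYMS = {"dog": ["animal", "entity"], "cat": ["animal", "entity"],
             "animal": ["entity"], "entity": []}

def ic(c):
    # information content: rarer concepts are more informative
    return -math.log(P[c])

def lcs(c1, c2):
    # lowest common subsumer: most specific ancestor shared by both
    shared = [a for a in [c1] + HYPERNYMS[c1]
              if a == c2 or a in HYPERNYMS[c2]]
    return min(shared, key=lambda a: P[a])  # lowest probability = most specific

def lin_similarity(c1, c2):
    return 2 * ic(lcs(c1, c2)) / (ic(c1) + ic(c2))
```

By construction the measure is 1.0 for identical concepts and decreases as the shared ancestor becomes more general.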
A survey on Recognizing Textual Entailment as an NLP Evaluation
Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, 2020
Recognizing Textual Entailment (RTE) was proposed as a unified evaluation framework to compare semantic understanding of different NLP systems. In this survey paper, we provide an overview of different approaches for evaluating and understanding the reasoning capabilities of NLP systems. We then focus our discussion on RTE by highlighting prominent RTE datasets as well as advances in RTE datasets that focus on specific linguistic phenomena that can be used to evaluate NLP systems on a fine-grained level. We conclude by arguing that when evaluating NLP systems, the community should utilize newly introduced RTE datasets that focus on specific linguistic phenomena.
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets
2021
Transformers represent the state-of-the-art in Natural Language Processing (NLP) in recent years, proving effective even in tasks done in low-resource languages. While pretrained transformers for these languages can be made, it is challenging to measure their true performance and capacity due to the lack of hard benchmark datasets, as well as the difficulty and cost of producing them. In this paper, we present three contributions: First, we propose a methodology for automatically producing Natural Language Inference (NLI) benchmark datasets for low-resource languages using published news articles. Through this, we create and release NewsPH-NLI, the first sentence entailment benchmark dataset in the low-resource Filipino language. Second, we produce new pretrained transformers based on the ELECTRA technique to further alleviate the resource scarcity in Filipino, benchmarking them on our dataset against other commonly-used transfer learning techniques. Lastly, we perform analyses on t...