Methodology and Results for the Competition on Semantic Similarity Evaluation and Entailment Recognition for PROPOR 2016

A machine learning approach for recognizing textual entailment in Spanish

Proceedings of the NAACL HLT 2010 Young …, 2010

This paper presents a system that uses machine learning algorithms for the task of recognizing textual entailment in the Spanish language. The datasets used include the SPARTE Corpus and Spanish translations of the RTE3, RTE4 and RTE5 datasets. The chosen features quantify lexical, syntactic and semantic matching between the text and hypothesis sentences. We analyze how different dataset sizes and classifiers affect the overall performance of two-way RTE classification in Spanish. The RTE system yields 60.83% accuracy, and a competitive result of 66.50% accuracy is reported with training and test sets taken from the SPARTE Corpus using a 70% split.
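As a rough illustration of the setup this abstract describes (a minimal sketch with invented feature values, not the authors' system), a two-way RTE classifier can be trained on precomputed text/hypothesis matching scores with a 70% split:

```python
# Minimal sketch (invented feature values, not the authors' system): two-way RTE
# classification from precomputed lexical/syntactic/semantic matching scores,
# with an illustrative 70% train / 30% test split.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# X: one row per text-hypothesis pair; y: 1 = entailment, 0 = no entailment
X = [[0.82, 0.61, 0.70], [0.15, 0.22, 0.08], [0.74, 0.55, 0.63],
     [0.21, 0.30, 0.12], [0.66, 0.48, 0.52], [0.18, 0.26, 0.10]]
y = [1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, stratify=y, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```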

Benchmarking Natural Language Inference and Semantic Textual Similarity for Portuguese

Information, 2020

Two sentences can be related in many different ways. Distinct tasks in natural language processing aim to identify different semantic relations between sentences. We developed several models for natural language inference and semantic textual similarity for the Portuguese language. We took advantage of pre-trained models (BERT); additionally, we studied the role of lexical features. We tested our models on several datasets (ASSIN, SICK-BR and ASSIN2), and the best results were usually achieved with ptBERT-Large, pre-trained on a Brazilian corpus and fine-tuned on the aforementioned datasets. Besides obtaining state-of-the-art results, this is, to the best of our knowledge, the most comprehensive study of natural language inference and semantic textual similarity for the Portuguese language.
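A hedged sketch of the BERT-based setup (the BERTimbau checkpoint name and the three-way label head are assumptions here, and the head would still need fine-tuning on ASSIN/ASSIN2-style data before its predictions mean anything):

```python
# Minimal sketch, not the authors' code: scoring a Portuguese premise-hypothesis
# pair with a BERT sequence-classification head. The checkpoint name and the
# 3-label head are assumptions; the head must be fine-tuned on NLI data first.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "neuralmind/bert-large-portuguese-cased"  # assumed Portuguese BERT checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

premise = "O gato dorme no sofá."
hypothesis = "Um animal está dormindo."
inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)
print(probs)  # e.g. [entailment, neutral, contradiction] once fine-tuned
```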

BUAP: N-gram based Feature Evaluation for the Cross-Lingual Textual Entailment Task

This paper describes the evaluation of different kinds of textual features for the Cross-Lingual Textual Entailment Task of SemEval 2013. We count the number of n-grams of three types of textual entities (characters, words and PoS tags) that occur in the pair of sentences for which we want to determine the textual entailment judgment. N-gram differences, intersections and distances (Euclidean, Manhattan and Jaccard) were used to construct a feature vector, which is then fed to a support vector machine classifier to build a classification model. Five different runs were submitted, one of them a voting combination of the previous four approaches. The results obtained show performance below the median of the six teams that participated in the competition.
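A minimal sketch of this kind of feature extraction (word bigrams only, toy sentences; not the BUAP code); the resulting vectors would then be passed to an SVM such as scikit-learn's SVC:

```python
# Sketch of n-gram feature extraction for a text-hypothesis pair: counts of
# shared/differing n-grams plus Euclidean, Manhattan and Jaccard distances.
from collections import Counter
import math

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def pair_features(text, hyp, n=2):
    t, h = Counter(ngrams(text.split(), n)), Counter(ngrams(hyp.split(), n))
    keys = set(t) | set(h)
    inter = sum(min(t[k], h[k]) for k in keys)
    manh = sum(abs(t[k] - h[k]) for k in keys)           # Manhattan distance
    eucl = math.sqrt(sum((t[k] - h[k]) ** 2 for k in keys))
    jacc = 1 - inter / sum(max(t[k], h[k]) for k in keys) if keys else 0.0
    return [inter, manh, eucl, jacc]

print(pair_features("a cat sat on the mat", "a cat is on the mat"))
```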

ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese

2018

Semantic Textual Similarity (STS) aims to compute the proximity in meaning between two sentences. In 2016, the ASSIN shared task targeted STS in Portuguese and released training and test collections. This paper describes the development of ASAPP, a system that participated in ASSIN but has been improved since then, and now achieves the best results in this task. ASAPP learns an STS function from a broad range of lexical, syntactic, semantic and distributional features. This paper describes the features used in the current version of ASAPP and how they are exploited in a regression algorithm to achieve the best published results for ASSIN to date, in both European and Brazilian Portuguese.
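As an illustrative sketch of this learning setup (invented feature values and gold scores, and an off-the-shelf regressor rather than ASAPP's actual feature set or algorithm), an STS function over such features can be fit as follows:

```python
# Sketch: learning an ASSIN-style STS score in [1, 5] from precomputed
# lexical/syntactic/semantic/distributional features (values are placeholders).
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

X_train = [[0.9, 0.8, 0.85], [0.2, 0.1, 0.15], [0.6, 0.5, 0.55]]
y_train = [4.5, 1.2, 3.0]                 # gold similarity scores
X_test, y_test = [[0.7, 0.6, 0.6]], [3.4]

reg = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
pred = reg.predict(X_test)
print(pred, mean_squared_error(y_test, pred))
```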

IKOMA at TAC2011: A Method for Recognizing Textual Entailment using Lexical-level and Sentence Structure-level features

2011

This paper describes the Recognizing Textual Entailment (RTE) system that our team developed for TAC 2011. Our system combines an entailment score calculated by lexical-level matching with a machine-learning-based filtering mechanism that uses various features obtained from lexical-level, chunk-level and predicate-argument structure-level information. In the filtering mechanism, we try to discard T-H pairs that have a high entailment score but are not actually entailments. That is, to filter out false-positive T-H pairs produced by our lexical-level matching, we use additional information such as features from word chunks and predicate-argument structures.
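A hedged sketch of that two-stage idea (toy data and thresholds, not the IKOMA implementation): a lexical overlap score proposes entailment, and a learned filter then rejects likely false positives using extra structural features:

```python
# Sketch: stage 1 scores T-H pairs by lexical overlap; stage 2 filters accepted
# pairs with a classifier over structural features (placeholder values here).
from sklearn.linear_model import LogisticRegression

def lexical_score(text, hyp):
    t, h = set(text.lower().split()), set(hyp.lower().split())
    return len(t & h) / len(h) if h else 0.0

# filter trained on [chunk overlap, predicate-argument overlap]; 1 = keep
filter_clf = LogisticRegression().fit(
    [[0.9, 0.1], [0.8, 0.9], [0.7, 0.2], [0.95, 0.85]],
    [0, 1, 0, 1],
)

def entails(text, hyp, structural_feats, threshold=0.6):
    if lexical_score(text, hyp) < threshold:
        return False
    return bool(filter_clf.predict([structural_feats])[0])

print(entails("the cat sat on the mat", "the cat is on the mat", [0.9, 0.8]))
```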

Experiments of UNED at the third recognising textual entailment challenge

Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing - RTE '07, 2007

Recognizing and generating textual entailment and paraphrases are regarded as important technologies in a broad range of NLP applications, including information extraction, summarization, question answering, information retrieval, machine translation and text generation. Both textual entailment and paraphrasing address relevant aspects of natural language semantics. Entailment is a directional relation between two expressions in which one of them implies the other, whereas paraphrase is a relation in which two expressions convey essentially the same meaning. Indeed, paraphrase can be defined as bi-directional entailment. While it may be debatable how such semantic definitions can be made well-founded, in practice we have already seen evidence that such knowledge is essential for many applications.

Textual Entailment Using Lexical And Syntactic Similarity

International Journal, 2011

A two-way Textual Entailment (TE) recognition system that uses lexical and syntactic features is described in this paper. The TE system is rule-based and uses lexical and syntactic similarities. The important lexical similarity features used in the present system are: WordNet-based uni-gram match, bi-gram match, longest common sub-sequence, skip-gram and stemming. In the syntactic TE system, the important features used are: subject-subject comparison, subject-verb comparison, object-verb comparison and cross subject-verb comparison. The system has been separately trained on each development corpus released as part of the Recognising Textual Entailment (RTE) competitions, starting with RTE-1, and tested on the respective RTE test sets; no separate development data was released for RTE-4. The evaluation results on each test set are compared with the RTE systems that participated in the respective RTE competitions using lexical and syntactic approaches.
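Two of the lexical features named above can be sketched as follows (a simplified illustration without the WordNet lookup or stemming used by the described system):

```python
# Sketch: unigram overlap and longest common subsequence between
# text (T) and hypothesis (H) token sequences.
def unigram_match(t_tokens, h_tokens):
    t = set(t_tokens)
    return sum(1 for w in h_tokens if w in t) / len(h_tokens) if h_tokens else 0.0

def lcs_length(a, b):
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

T = "a dog is running in the park".split()
H = "a dog runs in the park".split()
print(unigram_match(T, H), lcs_length(T, H) / len(H))
```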

Semi-Automatic Construction of a Textual Entailment Dataset: Selecting Candidates with Vector Space Models

2015

Recognizing Textual Entailment (RTE) is an NLP task aimed at detecting whether the meaning of a given piece of text entails the meaning of another one. Despite its relevance to many NLP areas, it has been scarcely explored in Portuguese, mainly due to the lack of labeled data. A dataset for RTE must contain both positive and negative examples of entailment, and neither should be obvious: negative examples should not be completely unrelated texts, and positive examples should not be too similar. We report here ongoing work to address this difficulty using Vector Space Models (VSMs) to select candidate pairs from news clusters. We compare three different VSMs and show that Latent Dirichlet Allocation achieves promising results, yielding both good positive and negative examples.
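A rough sketch of the candidate-selection idea (scikit-learn's LDA as a stand-in for the paper's vector space models, with invented sentences and thresholds): sentences from a news cluster are mapped to topic distributions, and only pairs of intermediate similarity are kept, discarding both trivial negatives and near-duplicates:

```python
# Sketch: LDA topic vectors + cosine similarity to pre-select candidate T-H pairs.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "O presidente anunciou um novo pacote econômico.",
    "Novo pacote econômico foi anunciado pelo governo.",
    "O time venceu a partida por dois a zero.",
]
counts = CountVectorizer().fit_transform(sentences)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
sim = cosine_similarity(topics)

candidates = [(i, j) for i in range(len(sentences)) for j in range(i + 1, len(sentences))
              if 0.3 < sim[i, j] < 0.95]   # illustrative thresholds
print(candidates)
```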

A Language Independent Approach for Recognizing Textual Entailment

Textual Entailment Recognition (RTE) was proposed as a generic task, aimed at building modules capable of capturing the semantic variability of texts and performing natural language inferences. These modules can then be included in any NLP system, improving its performance in fine-grained semantic differentiation. The first part of the article describes our approach to building a generic, language-independent TE system that would eventually be used as a module within a QA system. We evaluated the accuracy of this system by building two instances of it, for English and Romanian, and testing them on the data from the RTE3 competition. In the second part we show how we applied the steps described in [1] and adapted this system in order to include it as a module in a QA system architecture. Lastly, we show the results obtained, which point to a significant increase in precision.

Fourth Recognising Textual Entailment

2012

This paper describes our experiments on Textual Entailment in the context of the Fourth Recognising Textual Entailment (RTE-4) Evaluation Challenge at TAC 2008. Our system uses a machine learning approach with AdaBoost to address the RTE challenge. We perform a lexical, syntactic and semantic analysis of the entailment pairs. From this information we compute a set of semantic-based distances between sentences. We improved our baseline system from the RTE-3 challenge with more language processing techniques, a hypothesis classifier and new semantic features. The results show no general improvement with respect to the baseline.
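A minimal sketch of the AdaBoost setup described in the abstract (feature values are invented placeholders, not the paper's semantic distances):

```python
# Sketch: AdaBoost over per-pair semantic-distance features for two-way RTE.
from sklearn.ensemble import AdaBoostClassifier

X = [[0.12, 0.30], [0.85, 0.90], [0.20, 0.25],
     [0.70, 0.80], [0.15, 0.35], [0.90, 0.75]]   # placeholder distances
y = [1, 0, 1, 0, 1, 0]                           # 1 = entailment, 0 = no entailment

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[0.18, 0.28]]))
```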