EMNLP 2013

Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data

Interpreting anaphoric shell nouns (ASNs) such as this issue and this fact is essential to understanding virtually any substantial natural language text. One obstacle in developing methods for automatically interpreting ASNs is the lack of annotated data. We tackle this challenge by exploiting cataphoric shell nouns (CSNs) whose construction makes them particularly easy to interpret (e.g., the fact that X). We propose an approach that uses automatically extracted antecedents of CSNs as training data to interpret ASNs. We achieve precisions in the range of 0.35 (baseline = 0.21) to 0.72 (baseline = 0.44), depending upon the shell noun.
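
The abstract gives "the fact that X" as an example of a cataphoric construction whose content is easy to read off. As a rough illustration only, the sketch below pulls out (shell noun, content clause) pairs with a surface regular expression. The shell-noun list, the pattern, and the function name `extract_csn_pairs` are assumptions made for this example; the abstract does not describe the authors' actual extraction procedure, which may well rely on syntactic parses rather than string patterns.

```python
import re

# Hypothetical shell-noun list, chosen only for illustration.
SHELL_NOUNS = ("fact", "issue", "idea", "decision")

# Surface pattern "the <shell noun> that <clause>", where the clause is
# assumed to run up to the next sentence-final punctuation mark.
CSN_PATTERN = re.compile(
    r"\bthe\s+(?P<noun>" + "|".join(SHELL_NOUNS) + r")\s+that\s+(?P<content>[^.;!?]+)",
    re.IGNORECASE,
)


def extract_csn_pairs(text):
    """Return (shell noun, content clause) pairs matched by the surface pattern."""
    return [
        (m.group("noun").lower(), m.group("content").strip())
        for m in CSN_PATTERN.finditer(text)
    ]


pairs = extract_csn_pairs("Nobody disputes the fact that the data were incomplete.")
print(pairs)
```

Pairs harvested this way could, in principle, serve as training instances of the form (shell noun, antecedent), which is the role the abstract assigns to CSN antecedents. A string pattern like this one will also match relative clauses ("the idea that she proposed"), so it should be read as a starting point rather than a reliable extractor.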

Annotating anaphoric shell nouns with their antecedents

Anaphoric shell nouns such as this issue and this fact conceptually encapsulate complex pieces of information. We examine the feasibility of annotating such anaphoric nouns using crowdsourcing. In particular, we present our methodology for reliably annotating antecedents of such anaphoric nouns and the challenges we faced in doing so. We also evaluated the quality of crowd annotation using experts. The results suggest that most of the crowd annotations were good enough to use as training data for resolving such anaphoric nouns.

Towards the Automatic Resolution of Anaphora with Non-nominal Antecedents: Insights from Annotation

ISBN, 2018

This paper deals with a particular form of anaphora in which the anaphors refer to non-nominal antecedents. We investigate two existing datasets, annotated with pronominal and nominal anaphors (shell nouns) respectively, and attempt to determine to what degree the different types of anaphors provide useful hints as to the form and location of their antecedents. To this end, we look at the distribution of the antecedents, their syntactic form, and their semantic content. In particular, as the difficulty of annotating the phenomenon constitutes a major hurdle to the development of larger datasets, we take a close look at the agreement between annotators and relate this to the different types of anaphors.

ARRAU: Linguistically-Motivated Annotation of Anaphoric Descriptions

2016

This paper presents a second release of the ARRAU dataset: a multi-domain corpus with thorough linguistically motivated annotation of anaphora and related phenomena. Building upon the first release almost a decade ago, considerable effort has been invested in improving the data both quantitatively and qualitatively. Thus, we have doubled the corpus size, expanded the selection of covered phenomena to include referentiality and genericity, and designed and implemented a methodology for enforcing the consistency of the manual annotation. We believe that the new release of ARRAU provides valuable material for ongoing research in complex cases of coreference as well as for a variety of related tasks. The corpus is publicly available through LDC.

Anaphora With Non-nominal Antecedents in Computational Linguistics: a Survey

Computational Linguistics, 2018

This article provides an extensive overview of the literature related to the phenomenon of non-nominal-antecedent anaphora (also known as abstract anaphora or discourse deixis), a type of anaphora in which an anaphor like “that” refers to an antecedent (marked in boldface) that is syntactically non-nominal, such as the first sentence in “It’s way too hot here. That’s why I’m moving to Alaska.” Annotating and automatically resolving these cases of anaphora is interesting in its own right because of the complexities involved in identifying non-nominal antecedents, which typically represent abstract objects such as events, facts, and propositions. There is also practical value in the resolution of non-nominal-antecedent anaphora, as this would help computational systems in machine translation, summarization, and question answering, as well as, conceivably, any other task dependent on some measure of text understanding. Most of the existing approaches to anaphora annotation and resoluti...

Formal, syntactic, semantic and textual features of English shell nouns: A manual corpus-driven approach

Diachrony and Synchrony in English Corpus Linguistics (Edited by Alejandro Alcaraz-Sintes & Salvador Valera-Hernandez), Linguistic Insights Series, 181, Peter Lang, 2014

This paper analyses a group of shell nouns from a small but representative sample of the English language, i.e., the BNC Sampler Corpus. This class of nouns comprises abstract units (e.g., lie, idea, issue) that help to condense long stretches of discourse into smaller discourse entities. The motivation for their use lies in their text-organising and evaluative functions. The automation and genre-specific nature of most research on these units provides only a partial account of their use in natural discourse. This article therefore offers a tentative profile of shell-noun behaviour based on a manual and context-sensitive analysis of nine variables covering all levels of linguistic analysis (i.e., formal, syntactic, semantico-pragmatic and textual). The evidence (42 shell nouns: 1,110 concordance lines) reveals certain similarities with the findings in (fully and semi-) automated analyses of the literature (e.g., genre: academic and journalistic; deictic: specific), but also various differences (e.g., syntactic pattern: noun complement clauses vs. prepositional phrases; rhetorical function: more anaphoric vs. more cataphoric).

Acquiring lexical knowledge for anaphora resolution

Proceedings of the 3rd …, 2002

The lack of adequate bases of commonsense or even lexical knowledge is perhaps the main obstacle to the development of high-performance, robust tools for semantic interpretation. It is also generally accepted that, notwithstanding the increasing availability in recent years of substantial hand-coded lexical resources such as WordNet and EuroWordNet, addressing the commonsense knowledge bottleneck will eventually require the development of effective techniques for acquiring such information automatically, e.g., from corpora. We discuss research aimed at improving the performance of anaphora resolution systems by acquiring the commonsense knowledge required to resolve the more complex cases of anaphora, such as bridging references. We focus in particular on the problem of acquiring information about part-of relations.

Use of Domain Knowledge in Resolving Pronominal Anaphora

Belgian Journal of Linguistics, 1996

The research reported here has been conducted in the context of the Plinius project, which aims at semi-automatic knowledge acquisition from short natural-language texts. In this framework, a system has been developed for finding the antecedents of pronominal anaphora, in particular 'it'- and 'its'-anaphora. The anaphora resolution module operates on parser output and can make use of information generated by the parser; the lexicon gives the conceptual representations corresponding to the words. The algorithm for anaphora resolution involves three steps: (i) Assemble: construct a list of discourse entities (DEs); (ii) Identify: identify anaphoric DEs; (iii) Select: select, for each anaphoric DE, another DE from the list of DEs as its antecedent. The third step applies four constraints, i.e. rules to which a DE must conform in order to be a valid candidate: (a) semantic type agreement; (b) number agreement; (c) projection constraint; (d) conceptual compatibility. Constraints (a, b, c) are linguistic, while (d) is domain-related. The algorithm has been tested on three texts. It turns out that applying (d) before (a, b, c) considerably improves efficiency.
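
The three steps and four constraints above can be sketched roughly as follows. This is a minimal illustration, not the Plinius implementation: the `DE` fields, the equality-based checks, the nearest-first candidate order, and the pass-through `projection_constraint` are all assumptions, since the abstract names the constraints without defining them. The Assemble step is taken as given (the list of DEs is supplied by hand rather than built from parser output).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DE:
    """A discourse entity; every field here is an illustrative assumption."""
    text: str
    semantic_type: str   # e.g. "object"
    number: str          # "sg" or "pl"
    concept: str         # domain concept, assumed to come from the lexicon or context
    anaphoric: bool = False


def semantic_type_agreement(anaphor: DE, cand: DE) -> bool:   # constraint (a)
    return anaphor.semantic_type == cand.semantic_type


def number_agreement(anaphor: DE, cand: DE) -> bool:          # constraint (b)
    return anaphor.number == cand.number


def projection_constraint(anaphor: DE, cand: DE) -> bool:     # constraint (c)
    # Placeholder: the abstract does not define this constraint, so it always passes.
    return True


def conceptual_compatibility(anaphor: DE, cand: DE) -> bool:  # constraint (d)
    # Assumed here to be a simple match on the domain concept.
    return anaphor.concept == cand.concept


# The domain constraint (d) is checked first, following the abstract's
# observation that this ordering improves efficiency.
CONSTRAINTS = [
    conceptual_compatibility,
    semantic_type_agreement,
    number_agreement,
    projection_constraint,
]


def identify(entities: List[DE]) -> List[int]:
    """Step (ii): indices of the anaphoric DEs."""
    return [i for i, de in enumerate(entities) if de.anaphoric]


def select(entities: List[DE], idx: int) -> Optional[DE]:
    """Step (iii): nearest preceding DE that satisfies every constraint."""
    anaphor = entities[idx]
    for cand in reversed(entities[:idx]):
        # all() short-circuits, so a failed early constraint skips the rest.
        if all(check(anaphor, cand) for check in CONSTRAINTS):
            return cand
    return None


def resolve(entities: List[DE]) -> Dict[int, Optional[DE]]:
    return {i: select(entities, i) for i in identify(entities)}


# Step (i), Assemble, is replaced by a hand-built list for this example.
entities = [
    DE("the alloys", "object", "pl", "material"),
    DE("the furnace", "object", "sg", "equipment"),
    DE("it", "object", "sg", "equipment", anaphoric=True),
]
print(resolve(entities)[2].text)
```

Because `all()` stops at the first failing check, placing the most selective constraint first reduces the number of checks per candidate, which is one plausible reading of why the ordering matters for efficiency.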

Nominal anaphora. Can we tame the beasts?

2009

In this paper we present a filter-based approach to nominal anaphora resolution for German. GermaNet is used to define hard filters that pass only licensed candidate pairs to TiMBL, a memory-based learner. Through this restrictive filtering, 13.5% of the true nominal anaphora get lost, thereby defining an upper bound for recall. In our experiments, we found that less restrictive filters improve recall only slightly, if at all: too many negative pairs were generated, leaving the machine learning classifier with the burden of finding the few (positive) needles in the (mostly negative) haystack. Could corpus-based methods help to dig out the remaining true nominal anaphora? About [57% of the filtered out pairs]_k are idiosyncratic; they form ['hapax legomena anaphors']_k and, thus, cannot be found by statistical approaches. We are thus talking about the remaining 43%, which is only 5.8% of all true nominal anaphora. Among these remaining cases are instances of logical metonymy, as well as cases where a Wikipedia lookup (and some NLP) could easily do the job. Admittedly, [statistics]_l might act as a last resort to catch [the rest]_m. But this is nothing but the use of [a sledgehammer]_l to crack [a nut]_m.
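
The percentages quoted above fit together as a short piece of arithmetic, reproduced here as a check. The variable names are introduced for this example only; the three input figures are the ones stated in the abstract.

```python
lost_by_filters = 0.135      # share of true nominal anaphora removed by the hard filters
idiosyncratic_share = 0.57   # share of the filtered-out pairs that are idiosyncratic

# Recall cannot exceed the share of true anaphora that survive filtering.
recall_upper_bound = 1 - lost_by_filters

# Non-idiosyncratic share of the filtered-out pairs (the "remaining 43%").
remaining_share = 1 - idiosyncratic_share

# The same cases expressed as a share of all true nominal anaphora.
recoverable_of_all = remaining_share * lost_by_filters

print(round(recall_upper_bound, 3))   # 0.865
print(round(remaining_share, 2))      # 0.43
print(round(recoverable_of_all, 3))   # 0.058
```

So the cases that corpus-based methods could still hope to recover amount to roughly 5.8% of all true nominal anaphora, matching the figure in the abstract.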

How far are we from (semi-) automatic annotation of anaphoric links in corpora?

Proceedings of a Workshop on Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts - ANARESOLUTION '97, 1997

The paper raises for discussion a proposal for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction comments on the fact that automatic identification of referential links in corpora has lagged behind in comparison with similar lexical, syntactical and even semantic tasks. The second section of the paper outlines the author's practical and robust knowledge-based approach to pronoun resolution which will subsequently be put forward as the core of a larger architecture proposed for the automatic tagging of referential links. Section 3 briefly presents other related knowledge-poor approaches, while section 4 discusses the limitations and advantages of the practical approach. The main argument of the paper is to be found in section 5, where we present the idea of developing a semi-automatic environment for annotating anaphoric links and outline the components of such a program. Finally, the conclusion looks at the anticipated success rate of the approach.