SUTime: A library for recognizing and normalizing time expressions

The DANTE temporal expression tagger

2009

In this paper we present the DANTE system, a tagger for temporal expressions in English documents. DANTE performs both recognition and normalization of these expressions in accordance with the TIMEX2 annotation standard. The system is built on modular principles, with a clear separation between the recognition and normalisation components. The interface between these components is based on our novel approach to representing the local semantics of temporal expressions. DANTE has been developed in two phases: first on the basis of the TIMEX2 guidelines only, and then on the ACE 2005 development data. The system has been evaluated on the ACE 2005 and ACE 2007 data. Although this is still work in progress, we already achieve highly satisfactory results, both for the recognition of temporal expressions and their interpretation (normalisation).

A Comparison of Statistical and Rule-Induction Learners for Automatic Tagging of Time Expressions in English

2007

Proper recognition and handling of temporal information contained in a text is key to understanding the flow of events depicted in the text and their accompanying circumstances. Consequently, time expression recognition and representation of the time information they convey in a suitable normalized form is an important task relevant to several problems in Natural Language Processing. In particular, such an analysis is largely significant for Information Extraction (IE), Question Answering (QA) and Automatic Summarization (AS). The most common approach to time expression recognition in the past has been the use of handmade extraction rules (grammars), which also served as the basis for normalization. Our aim is to explore the possibilities afforded by applying machine learning techniques to the recognition of time expressions. We focus on recognizing the appearances of time expressions in text (not normalization) and transform the problem into one of chunking, where the aim is to correctly assign Begin, Inside or Outside (BIO) tags to tokens. In this paper, we explain the knowledge representation used and compare the results obtained in our experiments with two different methods, one statistical (support vector machines) and one of rule induction (FOIL). Our empirical analysis shows that SVMs are superior.
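
The chunking formulation described above can be illustrated with a minimal sketch. It assumes time expressions are already given as token-index spans; the function name `bio_tags`, the span convention (start inclusive, end exclusive), and the example sentence are all hypothetical and not taken from the paper. It only shows how the Begin/Inside/Outside target labels are laid out, not how the SVM or FOIL learners predict them.

```python
def bio_tags(tokens, spans):
    """Return one B/I/O tag per token.

    `spans` is a list of (start, end) token indices, start inclusive and
    end exclusive, each marking one time expression (assumed convention).
    """
    tags = ["O"] * len(tokens)          # every token starts Outside
    for start, end in spans:
        tags[start] = "B"               # first token of the expression
        for i in range(start + 1, end):
            tags[i] = "I"               # remaining tokens of the expression
    return tags


# Hypothetical example sentence with one time expression, "3 May 2007".
tokens = ["The", "meeting", "is", "on", "3", "May", "2007", "."]
tags = bio_tags(tokens, [(4, 7)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A learner is then trained to predict one of these three tags per token from token-level features, which is what turns recognition into a sequence of classification decisions.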

The TempEval challenge: identifying temporal relations in text

Language Resources and Evaluation, 2009

TempEval is a framework for evaluating systems that automatically annotate texts with temporal relations. It was created in the context of the SemEval 2007 workshop and uses the TimeML annotation language. The evaluation consists of three subtasks of temporal annotation: anchoring an event to a time expression in the same sentence, anchoring an event to the document creation time, and ordering main events in consecutive sentences. In this paper we describe the TempEval task and the systems that participated in the evaluation. In addition, we describe how further task decomposition can bring even more structure to the evaluation of temporal relations.

TimeML: Robust Specification of Event and Temporal Expressions in Text

2003

In this paper we provide a description of TimeML, a rich specification language for event and temporal expressions in natural language text, developed in the context of the AQUAINT program on Question Answering Systems. Unlike most previous work on event annotation, TimeML captures three distinct phenomena in temporal markup: (1) it systematically anchors event predicates to a broad range of temporally denotating expressions; (2) it orders event expressions in text relative to one another, both intrasententially and in discourse; and (3) it allows for a delayed (underspecified) interpretation of partially determined temporal expressions. We demonstrate the expressiveness of TimeML for a broad range of syntactic and semantic contexts, including aspectual predication, modal subordination, and an initial treatment of lexical and constructional causation in text.

Recognising and Interpreting Named Temporal Expressions

Proceedings of Recent Advances in Natural Language Processing (RANLP 2013), 2013

This paper introduces a new class of temporal expression (named temporal expressions) and methods for recognising and interpreting its members. The commonest temporal expressions typically contain date and time words, like April or hours. Research into recognising and interpreting these typical expressions is mature in many languages. However, there is a class of expressions that are less typical, very varied, and difficult to automatically interpret. These indicate dates and times, but are harder to detect because they often do not contain time words and are not used frequently enough to appear in conventional temporally-annotated corpora, for example Michaelmas or Vasant Panchami.

Recognition and Normalization of Time Expressions: ITC-irst at TERN 2004

This paper presents the Chronos system developed at ITC-irst to participate in the English Full Task organized within the TERN 2004 evaluation. Chronos extends the capabilities of a rule-based multilingual (English/Italian) Named Entity Recognition System, allowing for the recognition and normalization of temporal expressions within an input text. To this aim, the system is designed to provide the automatic annotation of textual data with the TIMEX2 tag, which includes attributes for expressing the normalized, intended meaning or value of a broad range of temporal expressions.
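
As a rough illustration of what rule-based recognition followed by normalization into a TIMEX2 value looks like, here is a minimal sketch. It is not the Chronos rule set: the single pattern (day, month name, four-digit year), the helper names `recognize`, `normalize`, and `annotate`, and the example sentence are assumptions made for illustration. The only detail taken from the annotation standard is that the normalized value is carried in the tag's `VAL` attribute in an ISO 8601-style form.

```python
import re

# Month names mapped to their numbers (1-12).
MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# One illustrative rule: "<day> <month name> <4-digit year>".
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")


def recognize(text):
    """Recognition step: find the extents of matching expressions."""
    return list(DATE_RE.finditer(text))


def normalize(match):
    """Normalization step: map a recognized extent to a YYYY-MM-DD value."""
    day, month, year = match.group(1), match.group(2), match.group(3)
    return f"{int(year):04d}-{MONTHS[month]:02d}-{int(day):02d}"


def annotate(text):
    """Wrap each recognized expression in a TIMEX2 tag carrying its value."""
    pieces, last = [], 0
    for match in recognize(text):
        pieces.append(text[last:match.start()])
        pieces.append(
            f'<TIMEX2 VAL="{normalize(match)}">{match.group(0)}</TIMEX2>')
        last = match.end()
    pieces.append(text[last:])
    return "".join(pieces)


annotated = annotate("The workshop opened on 3 May 2007 in Prague.")
print(annotated)
```

Keeping `recognize` and `normalize` as separate functions mirrors the two-stage view of the task; a real tagger would need many more patterns, plus context-dependent resolution for expressions such as "next week".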

SUTime: Evaluation in TempEval-3

2013

We analyze the performance of SUTIME, a temporal tagger for recognizing and normalizing temporal expressions, on TempEval-3 Task A for English. SUTIME is available as part of the Stanford CoreNLP pipeline and can be used to annotate documents with temporal information. Testing on the TempEval-3 evaluation corpus showed that this system is competitive with state-of-the-art techniques.

Enriching TimeBank: Towards a More Precise Annotation of Temporal Relations in a Text

Proceedings of LREC 10, 2016

We propose a way of enriching the TimeML annotations of TimeBank by adding information about the Topic Time in terms of Klein (1994). The annotations are partly automatic, partly inferential and partly manual. The corpus was converted into the native format of the annotation software GraphAnno and POS-tagged using the Stanford bidirectional dependency network tagger. On top of each finite verb, a FIN-node with tense information was created, and on top of any FIN-node, a TOPICTIME-node, in accordance with Klein's (1994) treatment of finiteness as the linguistic correlate of the Topic Time. Each TOPICTIME-node is linked to a MAKEINSTANCE-node representing an (instantiated) event in TimeML (Pustejovsky et al., 2005), the markup language used for the annotation of TimeBank. For such links we introduce a new category, ELINK. ELINKs capture the relationship between the Topic Time (TT) and the Time of Situation (TSit) and have an aspectual interpretation in Klein's (1994) theory. In addition to these automatic and inferential annotations, some TLINKs were added manually. Using an example from the corpus, we show that the inclusion of the Topic Time in the annotations allows for a richer representation of the temporal structure than does TimeML. A way of representing this structure in a diagrammatic form similar to the T-Box format (Verhagen, 2007) is proposed.

Multilingual Extension of a Temporal Expression Normalizer using annotated corpora

2006

This paper presents the automatic extension to other languages of TERSEO, a knowledge-based system for the recognition and normalization of temporal expressions originally developed for Spanish. TERSEO was first extended to English through the automatic translation of the temporal expressions. Then, an improved porting process was applied to Italian, where the automatic translation of the temporal expressions from English and from Spanish was combined with the extraction of new expressions from an Italian annotated corpus. Experimental results demonstrate how, while still adhering to the rule-based paradigm, the development of automatic rule translation procedures allowed us to minimize the effort required for porting to new languages. Relying on such procedures, and without any manual effort or previous knowledge of the target language, TERSEO recognizes and normalizes temporal expressions in Italian with good results (72% precision and 83% recall for recognition).
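
For readers who want a single figure that combines the two recognition scores, the balanced F-measure (harmonic mean of precision and recall) can be derived from them. The abstract does not report this value; the sketch below only shows the arithmetic, and the function name `f1_score` is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0                      # avoid division by zero
    return 2 * precision * recall / (precision + recall)


# Recognition scores quoted in the abstract for Italian.
f1 = f1_score(0.72, 0.83)
print(f"F1 = {f1:.3f}")
```

With these inputs the result is about 0.771, a derived figure rather than one stated by the authors.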

Wikiwars: A new corpus for research on temporal expressions

2010

The reliable extraction of knowledge from text requires an appropriate treatment of the time at which reported events take place. Unfortunately, there are very few annotated data sets that support the development of techniques for event time-stamping and tracking the progression of time through a narrative. In this paper, we present a new corpus of temporally-rich documents sourced from English Wikipedia, which we have annotated with TIMEX2 tags. The corpus contains around 120,000 tokens and 2,600 TIMEX2 expressions, thus comparing favourably in size to other existing corpora used in these areas. We describe the preparation of the corpus, and compare the profile of the data with other existing temporally annotated corpora. We also report the results obtained when we use DANTE, our temporal expression tagger, to process this corpus, and point to where further work is required. The corpus is publicly available for research purposes.