Italian Content Annotation Bank (I-CAB): Named Entities
Related papers
I-CAB: the Italian Content Annotation Bank
Proceedings of …, 2006
In this paper we present work in progress on the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second by different types of entities (i.e. persons, organizations, locations and geo-political entities), and the third by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines were originally developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.
Italian Content Annotation Bank (I-CAB): Temporal Expressions
2007
This document reports on Temporal Expression annotation for the Italian Content Annotation Bank (I-CAB) being developed at ITC-irst in conjunction with CELCT. We describe the extensions to the English annotation guidelines that are required for Italian and provide a large number of examples and a detailed description of the benchmark.
Italian Content Annotation Bank (I-CAB): Temporal Expressions (V. 1.0.)
2005
This document reports on Temporal Expression annotation for the Italian Content Annotation Bank (I-CAB) being developed at ITC-irst. We describe the extensions to the English annotation guidelines that are required for Italian and provide a large number of examples and a detailed description of the benchmark.
The lexico-semantic annotation of an Italian Treebank
2002
Corpora annotated at the semantic level play a crucial role both in research and in applied contexts in which natural language processing systems are studied and developed. In this paper we present the lexico-semantic annotation of an Italian treebank, a first attempt to remedy the lack of such a resource for Italian. We describe the annotation carried out, focusing on the methodology followed, the results achieved, and possible further work and applications.
Creating a gold standard for person cross-document coreference resolution in Italian news
2008
This paper presents work aimed at the realization of a gold standard for cross-document coreference resolution of person entities in a corpus of Italian news. The gold standard was created by selecting a number of person names occurring in Adige-500K, a corpus composed of all the news stories published by the local newspaper "L'Adige" from 1999 to 2006. The corpus consists of 535,000 news stories, for a total of around 200 million tokens.
Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Task
EVALITA. Evaluation of NLP and Speech Tools for Italian
This report describes the main outcomes of the 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Challenge, the first such task for the Italian language. The goal of the challenge is to provide a benchmark corpus for the evaluation of entity recognition and linking algorithms specifically designed for noisy and short texts, like tweets, written in Italian. The task requires the correct identification of entity mentions in a text and their linking to the proper named entities in a knowledge base. To this end, we chose to use the canonicalized dataset of DBpedia 2015-10. The task attracted five participants, for a total of 15 submitted runs.