Overview of the EVALITA 2009 Part-of-Speech Tagging Task

EVALITA 2007 THE ITALIAN PART-OF-SPEECH TAGGING EVALUATION - TASK GUIDELINES

2007

The PoS-tagging task involves two different tagsets, which are used to classify the DS data and are to be used to annotate the TS data. We believe that the structure and principles underlying the tagset design are crucial, both for a coherent approach to lexical classification and for obtaining better performance with automatic techniques, and that they deserve further discussion.

Introducing a Gold Standard Corpus from Young Multilinguals for the Evaluation of Automatic UD-PoS Taggers for Italian

2021

Part-of-speech (PoS) tagging constitutes a common task in Natural Language Processing (NLP) given its widespread applicability. However, with the advance of new information technologies and language variation, the contents and methods for PoS-tagging have changed. The majority of existing Italian data for this task originates from standard texts, where language use is far from the multifaceted, informal language of real-life situations. Automatic PoS-tagging models trained on such data do not perform reliably on non-standard language, such as social media content or language learners’ texts. Our aim is to provide additional training and evaluation data from language learners tagged in Universal Dependencies (UD), as well as to test current automatic PoS-tagging systems and evaluate their performance on such data. We use Italian texts from a multilingual corpus of young language learners, LEONIDE, to create a tagged gold standard for evaluating UD PoS-tagging performance on non-standard language. With ...

A multistage PoS-tagger at the EVALITA 2009 PoS-tagging Task

Abstract. This paper presents an experimental system architecture for Part-Of-Speech Tagging for the Italian language, able to manage a large tagset to provide both lexical and morphological information. The tagger was built as a cascade of four classifiers where each classifier in the cascade accepts data from an initial input or the guesses of the previous one, executes its annotation, and sends the resulting data to the next stage, or to the output of the cascade.
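
The cascade described in this abstract can be illustrated with a minimal sketch. It shows only the control flow (each stage receives either the initial input or the previous stage's guesses and passes its annotation on); the stage functions, rules, and tag names below are hypothetical and are not taken from the paper, and three toy stages stand in for the four classifiers it mentions.

```python
from typing import Callable, List, Optional, Tuple

# A token paired with its current tag guess (None means "not yet tagged").
Tagged = List[Tuple[str, Optional[str]]]
Stage = Callable[[Tagged], Tagged]


def run_cascade(tokens: List[str], stages: List[Stage]) -> Tagged:
    """Feed the initial input through each stage in order."""
    data: Tagged = [(tok, None) for tok in tokens]
    for stage in stages:
        # Each stage sees the guesses produced by the previous one.
        data = stage(data)
    return data


# Toy stages, purely illustrative (hypothetical rules and tag names).
def lexicon_stage(data: Tagged) -> Tagged:
    lexicon = {"il": "DET", "gatto": "NOUN", "dorme": "VERB"}
    return [(tok, tag or lexicon.get(tok.lower())) for tok, tag in data]


def suffix_stage(data: Tagged) -> Tagged:
    # Guess adverbs from the "-mente" suffix when no tag is set yet.
    return [(tok, tag or ("ADV" if tok.endswith("mente") else None))
            for tok, tag in data]


def default_stage(data: Tagged) -> Tagged:
    # Last resort: assign a default tag to anything still untagged.
    return [(tok, tag or "NOUN") for tok, tag in data]


result = run_cascade(
    ["Il", "gatto", "dorme", "tranquillamente"],
    [lexicon_stage, suffix_stage, default_stage],
)
print(result)
```

In this sketch a later stage only fills in tokens that earlier stages left untagged; a real cascade could equally let later stages revise earlier guesses.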

EVALITA 2007: Valutazione di sistemi per l'annotazione delle parti del discorso / EVALITA 2007: The Part-of-Speech Tagging Task

corpora.dslo.unibo.it

This paper reports on the EVALITA 2007 PoS-tagging task, an initiative for the evaluation of automatic PoS-taggers for Italian. A notable number of scholars and teams across Europe participated, testing their systems on the data provided by the task organisers. The results are very interesting and overall performance is very high compared with tagging accuracy for other, more studied languages. In particular, the best scores are very close to the state-of-the-art performance obtained for English.

UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)

2020

English. Part-of-speech tagging is becoming increasingly important as it represents the starting point for other high-level operations such as Speech Recognition, Machine Translation, Parsing and Information Retrieval. Although state-of-the-art POS-taggers reach a high level of accuracy (around 96-97%), the task cannot yet be considered a solved problem because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The solution proposed in this work is based on a hybrid tagger that does not use any prior lexical knowledge and consists of two different types of POS-taggers used sequentially: an HMM tagger and RDRPOSTagger (Nguyen et al., 2014; Nguyen et al., 2016). We trained the hybrid model using the Development set and the combination of the Development and Silver sets. The results show an accuracy of 0.8114 and 0.8100 respectively for the main task. Italiano. L’opera...

PoS-tagging Italian texts with CORISTagger

This paper presents an evolution of CORISTagger [1], a high-performance PoS-tagger for Italian developed at the University of Bologna. The system is composed of a second-order Hidden Markov Model tagger followed by a Transformation-Based tagger. The use of such a stacked structure, paired with a powerful morphological analyser based on a large lexicon of 120,000 lemmas, allowed the tagger to perform well in the EVALITA 2009 PoS-tagging task. The performance of the tagger and the most common classification errors are discussed in detail.

Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource

2018

Most work on part-of-speech (POS) tagging is focused on high-resource languages, or examines low-resource and active learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and provides gold annotations for the test set. Based on a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project cross-lingual POS tags. We show that the combination of a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an applied active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.

Using a Morphological Database to Increase the Accuracy in POS tagging

We experiment with extending the dictionaries used by three open-source part-of-speech taggers, by using data from a large Icelandic morphological database. We show that the accuracy of the taggers can be improved significantly by using the database. The reason is that the unknown word ratio reduces dramatically when adding data from the database to the taggers’ dictionaries. For the best performing tagger, the overall tagging accuracy increases from the base tagging result of 92.73% to 93.32%, when the unknown word ratio decreases from 6.8% to 1.1%. When we add reliable frequency information to the tag profiles for some of the words originating from the database, we are able to increase the accuracy further to 93.48% - this is equivalent to 10.3% error reduction compared to the base tagger.
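
The 10.3% error reduction quoted above can be checked from the two accuracies in the abstract. The short sketch below assumes the usual definition of relative error reduction (the drop in error rate divided by the baseline error rate); the function name is chosen here for illustration.

```python
def relative_error_reduction(base_acc: float, new_acc: float) -> float:
    """Relative error reduction in percent, given accuracies in percent."""
    base_err = 100.0 - base_acc  # baseline error rate
    new_err = 100.0 - new_acc    # error rate after the improvement
    return (base_err - new_err) / base_err * 100.0


# Accuracies reported in the abstract: 92.73% (base) and 93.48% (final).
reduction = relative_error_reduction(92.73, 93.48)
print(f"{reduction:.1f}%")  # prints "10.3%"
```

The error rate falls from 7.27% to 6.52%, and 0.75 / 7.27 is about 0.103, which matches the reported figure.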

Using a morphological analyzer in high precision POS tagging of Hungarian

Proceedings of LREC, 2006

The paper presents an evaluation of maxent POS disambiguation systems that incorporate an open source morphological analyzer to constrain the probabilistic models. The experiments show that the best proposed architecture, which is the first application of the maximum entropy framework in a Hungarian NLP task, outperforms comparable state of the art tagging methods and is able to handle out of vocabulary items robustly, allowing for efficient analysis of large (web-based) corpora.
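
One way to picture a morphological analyzer constraining a probabilistic model is to restrict the model's choice to the tags the analyzer licenses for a word. The sketch below illustrates only that general idea and is not the paper's maxent architecture; the function name, the tag names, and the fallback behaviour for out-of-vocabulary items are assumptions.

```python
from typing import Dict, Optional, Set


def constrained_tag(scores: Dict[str, float],
                    allowed: Optional[Set[str]]) -> str:
    """Pick the best-scoring tag among those the analyzer allows.

    scores:  tag -> probability from some probabilistic model (hypothetical).
    allowed: tags licensed by the analyzer, or None if the word is unknown.
    """
    candidates = {t: p for t, p in scores.items()
                  if allowed is None or t in allowed}
    if not candidates:
        # The analyzer and the model share no tag: fall back to the model.
        candidates = scores
    return max(candidates, key=candidates.get)


model_scores = {"NOUN": 0.5, "VERB": 0.3, "ADJ": 0.2}

# The analyzer rules out NOUN, so the best remaining tag wins.
known = constrained_tag(model_scores, {"VERB", "ADJ"})
# Unknown word: no constraint, the model decides alone.
unknown = constrained_tag(model_scores, None)
print(known, unknown)
```

Filtering before taking the maximum keeps the analyzer as a hard constraint; a softer design could instead feed the analyzer's output to the model as features.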