Giovanni Izzi - Academia.edu

Papers by Giovanni Izzi

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it). This volume includes the reports of both task organisers and participants in all of the EVALITA 2020 challenges. In the 2020 edition, we coordinated the organization of 14 different tasks belonging to five research areas, namely: (i) Affect, Hate, and Stance, (ii) Creativity and Style, (iii) New Challenges in Long-standing Tasks, (iv) Semantics and Multimodality, and (v) Time and Diachrony. The volume opens with an overview of the EVALITA 2020 campaign, in which we describe the tasks, provide statistics on the participants and task organisers, and acknowledge our supporting sponsors. The abstract of the keynote speech given by Preslav Nakov, titled "Flattening the Curve of the COVID-19 Infodemic: These Evaluation Campaigns Can Help!", is also included in this collection. Due to the COVID-19 pandemic, the workshop, traditionally held in person, took place online, where several members of the Italian NLP community presented the results of their research. Despite the circumstances, the workshop was an occasion for all participants, from both academic institutions and private companies, to disseminate their work and results and to share ideas through online sessions dedicated to each task and a general discussion during the plenary event. We carried on the tradition of the "Best system across tasks" award. As in 2018, it served as an incentive for students, IT developers, and researchers to push the boundaries of the state of the art by tackling tasks in new ways, even without winning.

Automatic Stopwords Identification from Very Small Corpora

Natural Language Processing tools use language-specific linguistic resources that might be unavailable for many languages. Since manually building them is complex, it would be desirable to learn these resources automatically from sample texts. In this paper we focus on stopwords, i.e., terms that are not relevant to understanding the topic and content of a document. Specifically, we compare the performance of different techniques proposed in the literature when applied to very small corpora (even single documents), as may be the case for very local languages lacking a wide literature. Experiments show that simple term frequency is an extremely reliable indicator that outperforms other, more complex approaches. While the study is conducted on Italian, the approach is generic and applicable to other languages.
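
As an illustration of the term-frequency criterion highlighted in the abstract (a minimal sketch, not the paper's exact procedure), one can rank the terms of even a single document by raw frequency and take the top-k as stopword candidates. The tokenisation, the cut-off k, and the sample text below are assumptions made for the example.

```python
from collections import Counter
import re

def candidate_stopwords(text, k=20):
    # Rank terms by raw frequency and return the top-k as stopword candidates.
    # Tokenisation and the cut-off k are illustrative assumptions, not the
    # paper's exact setup.
    tokens = re.findall(r"\w+", text.lower())  # naive word tokenisation
    counts = Counter(tokens)
    return [term for term, _ in counts.most_common(k)]

# Usage: even a very short "corpus" tends to surface function words first.
sample = "Il gatto dorme sul divano e il cane dorme sul tappeto."
print(candidate_stopwords(sample, k=4))  # e.g. ['il', 'dorme', 'sul', ...]
```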

UniBA @ KIPoS: A Hybrid Approach for Part-of-Speech Tagging (short paper)

Part-of-speech (POS) tagging is becoming increasingly important as it represents the starting point for other high-level tasks such as Speech Recognition, Machine Translation, Parsing, and Information Retrieval. Although state-of-the-art POS taggers reach a high level of accuracy (around 96-97%), the problem cannot yet be considered solved, because there are many variables to take into account. For example, most of these systems use lexical knowledge to assign a tag to unknown words. The solution proposed in this work is based on a hybrid tagger that does not use any prior lexical knowledge and consists of two different POS taggers used sequentially: an HMM tagger and RDRPOSTagger (Nguyen et al., 2014; Nguyen et al., 2016). We trained the hybrid model on the Development set and on the combination of the Development and Silver sets. The results show an accuracy of 0.8114 and 0.8100, respectively, for the main task.
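
To make the sequential design concrete, below is a minimal sketch (not the submitted system): a supervised HMM tagger produces an initial tagging, and a second rule-based pass revises it. The toy training sentences and the single correction rule are assumptions; in the paper the second stage is RDRPOSTagger rather than the hand-written stand-in used here.

```python
# Minimal sketch of an "HMM first, rule-based correction second" pipeline,
# assuming NLTK is available. The toy data and the correction rule stand in
# for the paper's actual training sets and the RDRPOSTagger stage.
from nltk.tag.hmm import HiddenMarkovModelTrainer

# Stage 1: supervised HMM tagger trained on a (tiny, illustrative) tagged corpus.
train_sents = [
    [("il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB")],
    [("la", "DET"), ("ragazza", "NOUN"), ("legge", "VERB")],
]
hmm_tagger = HiddenMarkovModelTrainer().train_supervised(train_sents)

# Stage 2: a rule-based correction pass standing in for RDRPOSTagger,
# which would revise the HMM output with ripple-down rules.
def correct(tagged):
    fixed = []
    for word, tag in tagged:
        if word in {"il", "la", "lo"}:  # example rule: these articles are DET
            tag = "DET"
        fixed.append((word, tag))
    return fixed

print(correct(hmm_tagger.tag(["la", "gatto", "dorme"])))
```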
