Adriana Roventini - Academia.edu
Papers by Adriana Roventini
LEXeter '83: proceedings, 1984
Two theoretical hypotheses are in the background of the present work, and of DELIS work in general: a corpus-based lexicographical approach and a frame-based semantic analysis. We assume that only a careful and detailed analysis of corpus data can constitute a sound basis for a realistic approach to lexicon building, be it a human-oriented dictionary or a computationally oriented lexicon. Obviously, corpus data cannot be used in a simplistic way. To become usable, they must be analysed according to some theoretical hypothesis that models and structures what would otherwise be an unstructured set of data. The best mixture of empirical and theoretical approaches is one in which the theoretical hypothesis itself emerges from and is guided by successive analyses of the data, and is cyclically refined and adjusted to textual evidence...
EURALEX94 Proceedings, 1994
The study described in this paper was carried out within the framework of the LRE-DELIS project, which proposes a corpus-based lexicographical approach and frame-based semantic theory in dictionary construction. In particular, this method is applied here to a subset of the Italian perception verbs sentire, udire, and ascoltare representing the lexical field of audition. Frame semantics and a particular descriptive strategy are used to structure the data and to correlate syntactic/morphological and semantic properties. This has allowed us to consistently describe differences and similarities in meaning between the three verbs and to integrate the lexical information already contained in dictionaries, in order to contribute to the design of more complete and detailed lexical entries, for both human-oriented dictionaries and computational lexicons.
Meeting of the Association for Computational Linguistics, 2007
This paper describes work in progress aimed at linking the two largest Italian lexical-semantic databases, ItalWordNet and PAROLE-SIMPLE-CLIPS. The adopted linking methodology, the software tool devised and implemented for this purpose, and the results of the first mapping phase regarding 1stOrderEntities are illustrated here.
ItalWordNet (IWN) is a lexical-semantic database developed in the framework of two different research projects: EuroWordNet (EWN) and Sistema Integrato per il Trattamento Automatico del Linguaggio (SI-TAL). IWN is structured in the same way as the Princeton WordNet, namely around the notion of synset. Following the model designed in EWN, IWN encodes a rich set of semantic relations. In addition to the internal language relations, equivalence relations were also encoded between Italian synsets and the closest concepts in an Inter-Lingual Index (ILI), a separate language-independent module containing all WN1.5 synsets but not the relations among them. IWN now contains information about Italian nouns, verbs, adjectives and adverbs. This SQL version of IWN v2.0 contains a corrected and revised version of the original IWN: 49350 synsets (of which: 3459 proper nouns, 32073 nominal, 8903 verbal, 4374 adjectival, 541 adverbial); 48416 lemmas (of which: 3918 proper nouns, 29527 nouns, 8015 ve...
WordNet, the on-line English thesaurus and lexical database developed at Princeton University by George Miller and his colleagues (Fellbaum 1998), has proved to be an extremely important resource used in much research in computational linguistics where lexical knowledge of English is required. The goal of the EuroWordNet project is to create similar wordnets for other languages of Europe. The initial four languages are Dutch (at the University of Amsterdam), Italian (CNR, Pisa), Spanish (Fundacion Universidad Empresa), and English (University of Sheffield, adapting the original WordNet); later Czech, Estonian, German, and French will be added. The results of the project will be publicly available. Like the original Princeton WordNet, the new wordnets (that's now a generic term) are hierarchies in which each node is a synset: a word sense, with which one or more synonymous words or phrases is associated. The synsets are connected by relations such as hyponymy, meronymy, and an...
This paper presents a few initial results of a study which is being carried out at the Institute of Computational Linguistics in Pisa, fitting into the framework of computer-based approaches to literary text analysis. The object of this investigation is Palomar, a work of the well-known Italian writer Italo Calvino. This book, published in 1983, is one of Calvino's later works and is very representative of the author's stylistic and thematic aspects. The paper focuses on the lexical structure and the particular use of the different categories on the syntagmatic level (i.e. the way they combine), as well as the computer tools used to exploit this information in a consistent and exhaustive way.
LREC Proceedings, CDROM, 2006
In the field of Computational Linguistics, many lexical resources have been developed which aim at encoding complex lexical semantic information according to different linguistic models (WordNet, Frame Semantics, Generative Lexicon, etc.). However, these resources are often neither easily accessible nor available in their entirety. Yet, given the continuous growth of technology (Semantic Web), their visibility, availability and integration are becoming of utmost importance. ItalWordNet and PAROLE/SIMPLE/CLIPS are two resources which, tackling lexical semantics from different perspectives and being at least partially complementary, can profit from being linked to each other. In this paper we address the issue of linking these resources, focusing on the most problematic part of the lexicon: the second order entities. In particular, after a brief description of the two resources, their different approaches to verb semantics are described; an accurate comparison of a set of verbal entries belonging to the Speech Act semantic class is carried out with the aim of evaluating the possibilities and the advantages of a semiautomatic link.
Piek Vossen, University of Amsterdam; Salvador Climent, Maria Antonia Marti, Mariona Taule, Universitat de Barcelona; Julio Gonzalo, Irina Chugur, M. Felisa Verdejo, UNED; Gerard Escudero, German Rigau, Horacio Rodriguez, Universitat Politecnica de Catalunya; Antonietta Alonge, Francesca Bertagna, Rita Marinelli, Adriana Roventini, Luca Tarasi, Istituto di Linguistica del CNR, Pisa ... Deliverable D029, D030, WP3, WP4 EuroWordNet, LE2-4003 ... Title: Comparison of the Final Wordnets Dutch, Spanish and Italian ... Authors: Piek Vossen, Laura ...
This deliverable describes the First Subset for Nouns and Verbs in Dutch, Italian, Spanish and English. These First Subsets represent the cores of the wordnets, including the most important meanings on which the other meanings depend. The data are described in terms of tables that specify the synsets, entries, senses and relations, and by comparison with the top ontology distribution and the Parole lexicons. Furthermore, we have carried out two comparisons of the fragments. An in-depth comparison has been carried out for 18 semantic clusters, using the Polaris tool. An overall comparison has been carried out using a graph-matching toolkit developed by FUE. Finally, this deliverable describes the work done for updating the Inter-Lingual-Index (ILI) that interconnects the different wordnets. The conclusions of the overviews and comparison are being used to guide the final building phase in EuroWordNet.
In this paper we describe the creation, currently under way, of a specialized lexicon belonging to the maritime domain (including the technical and commercial/maritime transport domain) and the linking of this lexicon to the generic one of the ItalWordNet lexical database. The main characteristics of the lexical semantic database and the specific features of the specialized language are described together with the coding performed according to the ItalWordNet semantic relations model and the approach adopted to connect the terminological database to the generic one. Some of the problems encountered and a few expected advantages are also considered.
In recent years, at the Institute for Computational Linguistics in Pisa, a few lexical resources have been developed aiming at encoding complex lexical semantic information. ItalWordNet and SIMPLE are two of these resources which, tackling semantics in the lexicon from different points of view, and being at least partially complementary, could certainly profit from being linked to each other. These resources in fact highlight different aspects of the lexical information: in SIMPLE, which adds a semantic layer to the morphological and syntactic ones developed in PAROLE, the connections between semantics and syntax are preeminent; ItalWordNet (like the Princeton WordNet and then EuroWordNet) is built around the basic notion of a synset, and various semantic relations are encoded between synsets while syntactic aspects are not taken into consideration. In the paper we describe an experiment we carried out, aimed at exploring the feasibility of linking these lexical resources, being convinced t...
This paper refers to a study carried out at the Institute for Computational Linguistics in Pisa as part of a project concerning the Italian Machine Dictionary (DMI) as a lexical database. Within the field of computational linguistics there is an increasing need to represent the lexical entry in a consistent and exhaustive way, so that all the syntactic and semantic implications of the lemma are made explicit. In particular a lexical entry, for example of a verb, can be considered exhaustive if it is represented not only with its syntactic but also semantic features, which must be given a unitary representation whenever possible. Over the last years a systematic study of the lexicon has shown that classes of verbs with homogeneous meanings often have a similar behaviour at the syntactic level. Furthermore, the morphological links which exist in a language are all strictly connected with syntactic and semantic attributes. They represent different structuring levels within...
The present paper fits into the framework of methodologies for corpus-based dictionary building, describing some of the results acquired within the LRE-DELIS Project from the application of an explicitly defined standardized procedure of corpus exploration. Emphasis is given to those aspects which seem to play a crucial role in mono- and multi-lingual lexicography, since they are linked to sense disambiguation. In particular, the general characteristics associated with the two broad classes of perception and speech act verbs and the specific properties related to individual verbs are dealt with, together with considerations on their representation in a hierarchically structured formal language. The relevance of phenomena such as the interaction between different levels of information (morphosyntactic, syntactic, semantic), coindexation, symmetry, and typical modifiers is discussed not only from a monolingual point of view, but also from a multi-lingual and contrastive perspective, as ...
The approach being followed to build a semantic database or wordnet for Italian within the framework of the EuroWordNet project is discussed. The emphasis is on the strategies employed to ensure that the monolingual database is linguistically coherent while, at the same time, guaranteeing compatibility with the other components of the project. The paper is divided into two main sections in which we deal with the monolingual and multilingual aspects of the work respectively. In the first part we describe the construction of the core entities of the Italian wordnet, the synsets, and the difficulties encountered when building coherent linguistic/semantic taxonomies. The second part will briefly present the problems faced and the methodology being adopted for a semi-automatic mapping of the Italian lexical data to the Interlingual Index of EuroWordNet.
In this paper we discuss how the ItalWordNet semantic database, being built by extending the Italian wordnet developed within the EuroWordNet project, is being exploited for the lexical semantic annotation of a corpus of Italian.