Nicoletta Calzolari Zamorani | Consiglio Nazionale delle Ricerche (CNR)
Papers by Nicoletta Calzolari Zamorani
Proceedings of the Workshop on Multilingual Language Resources and Interoperability - MLRI '06, 2006
Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that the production of a consensual specification on multilingual lexicons can be a useful aid for the various NLP actors. Within ISO, one purpose of LMF (ISO-24613) is to define a standard for lexicons that covers multilingual data.
Proceedings of the Third Linguistic Annotation Workshop on - ACL-IJCNLP '09, 2009
Two major projects in the U.S. and Europe have joined in a collaboration to work toward achieving interoperability among language resources. In the U.S., a project entitled "Sustainable Interoperability for Language Technology" (SILT) has been funded by the National Science Foundation under the INTEROP program, and in Europe, FLaReNet (Fostering Language Resources Network) has been funded by the European Commission under the eContentPlus framework. This international collaborative effort involves members of the language processing community and others working in related areas to build consensus regarding the sharing of data and technologies for language resources and applications, to work towards interoperability of existing data, and, where possible, to promote standards for annotation and resource building. In addition to broad-based US and European participation, we are seeking the participation of colleagues in Asia. This presentation describing the projects and their goals will, we hope, serve to involve members of the community who may not have been aware of the effort before, in particular colleagues in Asia.
Proceedings of the Workshop on Multilingual Language Resources and Interoperability - MLRI '06, 2006
In this paper we present an application fostering the integration and interoperability of computational lexicons, focusing on the particular case of mutual linking and cross-lingual enrichment of two wordnets, the ItalWordNet and Sinica BOW lexicons. This is intended as a case study investigating the needs and requirements of semi-automatic integration and interoperability of lexical resources.
Proceedings of the COLING/ACL on Main conference poster sessions -, 2006
As an area of great linguistic and cultural diversity, Asian language resources have received much less attention than their western counterparts. Creating a common standard for Asian language resources that is compatible with an international standard has at least three strong advantages: to increase the competitive edge of Asian countries, to bring Asian countries closer to their western counterparts, and to bring more cohesion among Asian countries. To achieve this goal, we have launched a two-year project to create a common standard for Asian language resources. The project comprises four research items: (1) building a description framework of lexical entries, (2) building sample lexicons, (3) building an upper-layer ontology and (4) evaluating the proposed framework through an application. This paper outlines the project in terms of its aim and approach.
Handbook on Ontologies, 2004
Molecular biology offers a large, complex and volatile domain that tests knowledge representation techniques to the limit of their fidelity, precision, expressivity and adaptability. The discipline of molecular biology and bioinformatics relies greatly on the use of community knowledge, rather than laws and axioms, to further understanding and knowledge generation. This knowledge has traditionally been kept as natural language. Given the exponential growth of already large quantities of data and associated knowledge, this is an unsustainable form of representation. This knowledge needs to be stored in a computationally amenable form, and ontologies offer a mechanism for creating a shared understanding of a community for both humans and computers. Ontologies have been built and used for many domains and this chapter explores their role within bioinformatics. Structured classifications have a long history in biology, not least in the Linnaean description of species. The explicit use of ontologies, however, is more recent. This chapter provides a survey of the need for ontologies; the nature of the domain and the knowledge tasks involved; and then an overview of ontology work in the discipline. The widest use of ontologies within biology is for conceptual annotation: a representation of stored knowledge more computationally amenable than natural language. An ontology also offers a means to create the illusion of a common query interface over diverse, distributed information sources; here an ontology creates a shared understanding for the user and also a means to computationally reconcile heterogeneities between the resources. Ontologies also provide a means for a schema definition suitable for the complexity and precision required for biology's knowledge bases.
Coming right up to date, bioinformatics is well set as an exemplar of the Semantic Web, offering both web-accessible content and services conceptually marked up as a means for computational exploitation of its resources; this theme is explored through the myGRID services ontology. Ontologies in bioinformatics cover a wide range of usages and representation styles. Bioinformatics offers an exciting application area in which the community can see a real need for ontology-based technology to work and deliver its promise.
Lexical Markup Framework (LMF) is a model that provides a common standardized framework for Natural Language Processing (NLP) lexicons. The goals of LMF are to provide a common model for the creation and use of such lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of a large number of individual resources to form extensive global electronic resources.
Text, Speech and Language Technology, 1999
Lexicons, as described in the previous chapter, are a valuable resource, not only for wordclass tagging but also for many other applications in the broad area of language engineering (LE), which encompasses fields such as computational linguistics and Natural Language Processing (NLP). Furthermore, the last decade in particular has seen an increasing use of corpora for computational lexicography, other corpus-based research and development of applications, all of which has led to the general recognition of the value of ‘authentic’ data.
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010
In this paper we analyse the role of Language Resources (LR) and Language Technologies (LT) in today's Human Language Technology field and try to speculate on some of the priorities for the next years, from the particular perspective of the FLaReNet project, which has been asked to act as an observatory to assess the current status of the field of Language Resources and Technology and to indicate priorities of action for the future.
EURALEX'94 Proceedings, 1994
Carol Peters, Istituto di Elaborazione dell'Informazione, CNR, Pisa; Stefano Federici, Istituto di Linguistica Computazionale, CNR, Pisa; Simonetta Montemagni, Istituto di Linguistica Computazionale, CNR, Pisa; Nicoletta ...
... word (as in the case of cliticized words, e.g. dammelo 'give+to_me+it'); ii) the case of more than one orthographic word which make up a single morphological word not otherwise decomposable (as in the case of multi-word expressions such as ad_hoc, al_di_là 'beyond', fino_a ...
This report describes the different evaluation procedures adopted for testing the effectiveness and validity of the semantic classes acquired by CLASS through analogy-based semantic similarity measures...
This paper presents a case study concerning the challenges and requirements posed by next-generation language resources, realized as an overall model of an open, distributed and collaborative language infrastructure. If a sort of "new paradigm" for language resource sharing is required, we think that the emerging and still evolving technology connected to Grid computing is a very interesting and suitable one for a concrete realization of this vision. Given the current limitations of Grid computing, it is very important to test the new environment on basic language analysis tools, in order to get a sense of the potential and the possible limitations connected to its use in NLP. For this reason, we have done some experiments on a module of the Linguistic Miner, i.e. the extraction of linguistic patterns from restricted-domain corpora. The Grid environment has produced the expected results (reduction of the processing time, huge storage capacity, data redundancy) without any additional cost for the final user.
This paper describes a Web service for accessing WordNet-type semantic lexicons. The central idea behind the service design is: given a query, the primary functionality of lexicon access is to present a partial lexicon by extracting the relevant part of the target lexicon. Based on this idea, we implemented the system as a RESTful Web service whose input query is specified by the access URI and whose output is presented in a standardized XML data format. LMF, an ISO standard for modeling lexicons, plays the most prominent role: the access URI pattern basically reflects the lexicon structure as defined by LMF; the access results are rendered based on Wordnet-LMF, which is a version of LMF XML-serialization. The Web service currently provides access to Princeton WordNet, Japanese WordNet, as well as the EDR Electronic Dictionary as a trial. To accommodate the EDR dictionary within the same framework, we modeled it also as a WordNet-type semantic lexicon. This paper thus discusses possible alternatives for modeling innately bilingual/multilingual lexicons like EDR with LMF, and proposes possible revisions to Wordnet-LMF.
This paper discusses ontologization of lexicon access functions in the context of a service-oriented language infrastructure, such as the Language Grid. In such a language infrastructure, an access function to a lexical resource, embodied as an atomic Web service, plays a crucially important role in composing a composite Web service tailored to a user's specific requirement. To facilitate the composition process involving service discovery, planning and invocation, the language infrastructure should be ontology-based; hence the ontologization of a range of lexicon functions is strongly required. In a service-oriented environment, however, lexical resources can be classified from a service-oriented perspective rather than from a lexicographically motivated standard. Hence, to address the issue of interoperability, the taxonomy for lexical resources should be grounded in a principled and shared lexicon ontology. To do this, we have ontologized the standardized lexicon modeling framework LMF, and utilized it as a foundation to stipulate the service-oriented lexicon taxonomy and the corresponding ontology for lexicon access functions. This paper also examines a possible solution to fill the gap between the ontological descriptions and the actual Web service API by adopting the W3C recommendation SAWSDL, with which Web service descriptions can be linked with the domain ontology.
The goal of this paper is to describe how the EuroWordNet framework for representing lexical meaning is being modified within an Italian National Project in order to include information on adjectives. The focus is on the 'new' semantic relations being encoded and on the revisions we have made to the EuroWordNet Top Ontology structure. We also briefly discuss the utility of the information which is being encoded for computational applications.
Proceedings of the …, 2008
This paper describes specifications which have been (or are being) developed within the Architecture Domain of the World Wide Web Consortium. This Domain is responsible for many of the core technologies for the World Wide Web, including XML. We will describe XML-related technologies in five areas: validation, full-text analysis, declarative descriptions of XML processing, layout, and Internationalization, focusing on how they are particularly suited for the representation and processing of language resources. The paper also includes a broad overview of the standardization process which underlies the development of these and other W3C technologies.
This paper describes the design, implementation and population of a lexical resource for biology and bioinformatics (the BioLexicon) developed within an ongoing European project. The aim of this project is text-based knowledge harvesting for support to information extraction and ...
… . Challenges of the …, Jan 1, 2009
The present paper describes a large-scale lexical resource for the biology domain designed both for human and for machine use. This lexicon aims at semantic interoperability and extendability, through the adoption of the ISO-LMF standard for lexical representation and ...