Cristina Giannone - Academia.edu
Papers by Cristina Giannone
Springer eBooks, 2013
In this paper, we present a Semantic Role Labeling tool for the Italian language, developed for the FLaIT competition at Evalita 2011. The tool takes a hybrid approach to the different sub-tasks that compose the SRL task. We apply a discriminative model based on lexical and syntactic features to the boundary detection task. For the Argument Classification sub-task, we instead apply a distributional approach to modeling lexical semantic information in a semi-supervised setting. The combination of these models achieved interesting results in the FLaIT competition.
The connection is indispensable to the expression of thought. Without the connection, we would not be able to express any contiguous thought; we could only list a succession of images and ideas isolated from each other, with no link between them.
This paper describes, from an industrial perspective, the experience of delivering conversational agents through the development of Iride, a platform able to deploy multi-language task-oriented dialog systems. A set of functionalities has been implemented that can be aggregated in different ways to build domain-independent conversational systems able to satisfy the needs of real business cases. Along with algorithms and techniques for end-to-end dialog management, such as Natural Language Understanding (NLU), Question Answering (QA), and dialog state tracking and policy management, the technical insights built into the platform are described by outlining the requirements and constraints that emerged from these field experiences.
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020, 2020
Transfer learning has been proven to be effective, especially when data for the target domain/task is scarce. Sometimes data for a similar task is only available in another language because it may be very specific. In this paper, we explore the use of machine-translated data to transfer models on a related domain. Specifically, we transfer models from the question duplication task (QDT) to similar FAQ selection tasks. The source domain is the well-known English Quora dataset, while the target domain is a collection of small Italian datasets for real case scenarios consisting of FAQ groups retrieved by pivoting on common answers. Our results show great improvements in the zero-shot learning setting and modest improvements using the standard transfer approach for direct in-domain adaptation.
International Journal on Document Analysis and Recognition (IJDAR), 2010
Lecture Notes in Computer Science, 2009
In a specific process of business intelligence, i.e. the investigation of organized crime, empirical language processing technologies can play a crucial role. Analyzing transcriptions of investigative activities, such as police interrogations, in order to recognize and store complex relations among people and locations is a very difficult and time-consuming task, ultimately based on pools of experts. We discuss here an inductive relation extraction platform that opens the way to much cheaper and more consistent workflows. The empirical investigation presented shows that accurate results, comparable to those of the expert teams, can be achieved, and that parametrization allows the system behavior to be fine-tuned to fit domain-specific requirements.
disi.unitn.it
Abstract. In a specific process of business intelligence, i.e. investigation on organised crime, empirical language processing technologies can play a crucial role. The analysis of transcriptions on investigative activities, such as police interrogatory, for the recognition and storage of ...
Poster and Demo
Frame-based Ontology Learning for Information Extraction. Diego De Cao, Cristina Giannone, and Roberto Basili, University of Roma Tor Vergata, Italy, email: {decao, giannone, basili}@info.uniroma2.it. Abstract. In this paper, an ontology learning platform, called ...
Lecture Notes in Computer Science, 2013
In this paper, we present a Semantic Role Labeling tool for the Italian language, developed for the FLaIT competition at Evalita 2011. The tool takes a hybrid approach to the different sub-tasks that compose the SRL task. We apply a discriminative model based on lexical and syntactic features to the boundary detection task. For the Argument Classification sub-task, we instead apply a distributional approach to modeling lexical semantic information in a semi-supervised setting. The combination of these models achieved interesting results in the FLaIT competition.
In this paper, we present a QA system enabling NL questions against Linked Data, designed and adopted by the Tor Vergata University AI group in the QALD-3 evaluation. The system integrates lexical semantic modeling and statistical inference within a complex architecture that decomposes the NL interpretation task into a cascade of three stages: (1) the selection of key ontological information from the question (i.e. predicate, arguments and properties), (2) the location of such salient information in the ontology through the joint disambiguation of the different candidates, and (3) the compilation of the final SPARQL query. This architecture characterizes a novel approach to the task and exploits a graphical model (i.e. a Hidden Markov Model) to select the proper ontological triples according to the graph nature of RDF. In particular, for each query an HMM is produced whose Viterbi solution is the comprehensive joint disambiguation across the sentence elements. The combination of these approaches achieved interesting results in the QALD competition. The RTV system is in fact within the group of participants performing slightly below the best system, but with smaller requirements and on significantly poorer input information.
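The joint disambiguation step described in this abstract can be illustrated with a generic Viterbi decoder. This is only a minimal sketch, not the RTV system's implementation: the `ex:` candidate names, the two-token question and every probability below are hypothetical values chosen for illustration, and the abstract does not say how the real model estimates its parameters.

```python
from typing import Dict, List, Tuple

Probs = Dict[str, Dict[str, float]]


def viterbi(observations: List[str], states: List[str],
            start_p: Dict[str, float], trans_p: Probs,
            emit_p: Probs) -> Tuple[List[str], float]:
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               predecessor state on that path)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability.
            p, arg = max((prev[r][0] * trans_p[r].get(s, 0.0), r)
                         for r in states)
            column[s] = (p * emit_p[s].get(obs, 0.0), arg)
        best.append(column)
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hypothetical toy model: hidden states are candidate ontology elements,
# observations are question tokens. All numbers are made up.
STATES = ["ex:capitalOf", "ex:CapitalRecords", "ex:Italy"]
START = {"ex:capitalOf": 0.4, "ex:CapitalRecords": 0.4, "ex:Italy": 0.2}
EMIT = {
    "ex:capitalOf": {"capital": 0.9, "Italy": 0.1},
    "ex:CapitalRecords": {"capital": 1.0},
    "ex:Italy": {"Italy": 1.0},
}
TRANS = {
    "ex:capitalOf": {"ex:capitalOf": 0.1, "ex:CapitalRecords": 0.1, "ex:Italy": 0.8},
    "ex:CapitalRecords": {"ex:capitalOf": 0.1, "ex:CapitalRecords": 0.8, "ex:Italy": 0.1},
    "ex:Italy": {"ex:capitalOf": 0.4, "ex:CapitalRecords": 0.2, "ex:Italy": 0.4},
}

if __name__ == "__main__":
    path, prob = viterbi(["capital", "Italy"], STATES, START, TRANS, EMIT)
    print(path, prob)
```

In this toy setting, "capital" taken in isolation slightly favors `ex:CapitalRecords`, but the decoder selects `ex:capitalOf` because it combines better with `ex:Italy`. That is the sense in which the Viterbi solution disambiguates all sentence elements jointly rather than one at a time.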
Sequence-to-sequence neural networks are redesigning dialog managers for Conversational AI in industry. However, industrial applications impose two important constraints: training data are often scarce, and the behavior of dialog managers should be strictly controlled and certified. In this paper, we propose the Conversational Logic Injected Neural Network (CLINN). This novel network merges dialog managers "programmed" using logical rules with a Sequence-to-Sequence Neural Network. We experimented with the Restaurant topic of the MultiWOZ dataset. Results show that injected rules are effective both when training data are scarce and when more data are available.
ArXiv, 2021
Incorporating explicit domain knowledge into neural-based task-oriented dialogue systems is an effective way to reduce the need for large sets of annotated dialogues. In this paper, we investigate how the use of explicit domain knowledge of conversational designers affects the performance of neural-based dialogue systems. To support this investigation, we propose the Conversational-Logic-Injection-in-Neural-Network system (CLINN), where explicit knowledge is coded in semi-logical rules. By using CLINN, we evaluated semi-logical rules produced by a team of differently-skilled conversational designers. We experimented with the Restaurant topic of the MultiWOZ dataset. Results show that external knowledge is extremely important for reducing the need for annotated examples for conversational systems. In fact, rules from conversational designers used in CLINN significantly outperform a state-of-the-art neural-based dialogue system.
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data - AND '09, 2009
In this paper, we present a system for the generation of cultural itineraries that exploits conversational agents to implicitly build formal user profiles. The key idea is that the preferences for user profiling are not obtained directly, but acquired during a natural language conversation between the tourists and the system. When the user profile is ready, it becomes the input for the generation of the customized cultural itinerary. The proposed system, called DiEM System, is designed for dialogues in the domain of cultural heritage, but its flexible architecture allows the dialogues to be customized for different application domains (cinema, finance, medicine, etc.).
Topic Models like Latent Dirichlet Allocation have been widely used for their robustness in estimating text models through mixtures of latent topics. Although LDA has mostly been used as a strictly lexicalized approach, it can be effectively applied to a much richer set of linguistic structures. We present a novel application of LDA that acquires suitable grammatical generalizations for semantic tasks tightly dependent on NL syntax. We show how the resulting topics represent suitable generalizations over syntactic structures as well as lexical information. The evaluation on two different classification tasks, predicate recognition and question classification, shows that state-of-the-art results are obtained.
Museums need to find innovative ways of communicating if they want to survive in the new era and play their active role as educators. In this paper, we present our idea of living artworks. Using conversational agents, we want to give artworks the capability of talking to visitors. A living artwork attracts attention, being a fun and novel combination of art and technology. The mix of experience and action has a beneficial effect on learning new concepts or facts. We then present our methodology for building living artworks, the enabling technologies, and a case study.
International Journal on Document Analysis and Recognition (IJDAR), 2010
In this paper, we present models for mining text relations between named entities, which can deal with data highly affected by linguistic noise. Our models are made robust by: (a) the exploitation of state-of-the-art statistical algorithms such as support vector machines (SVMs) along with effective and versatile pattern mining methods, e.g. word sequence kernels; (b) the design of specific features capable of capturing long-distance relationships; and (c) the use of domain prior knowledge in the form of ontological constraints, e.g. bounds on the type of relation arguments given by the semantic categories of the involved entities. This property allows the training data required by SVMs to be kept small, consequently lowering the system design costs. We empirically tested our hybrid model in the very complex domain of business intelligence, where the textual data consist of reports on investigations into criminal enterprises based on police interrogation reports, electronic eavesdropping and wiretaps. The target relations are typically established between entities, as they are mentioned in these information sources. The experiments on mining such relations show that our approach, with small training data, is robust to non-conventional language such as dialects, jargon expressions or coded words typically contained in such text.
Lecture Notes in Computer Science, 2007
This paper proposes a model for ontological representation supporting task-oriented dialog. The adoption of our ontology representation allows an interactive Question Answering (iQA) task to be mapped into a knowledge-based process. It supports dialog control, speech act recognition, planning and natural language generation through a unified knowledge model. A platform for developing iQA systems in specific domains, called REQUIRE (Robust Empirical QUestion answering for Intelligent Retrieval), has been entirely developed over this model. The first prototype, developed for medical consulting in the sexual health domain, has recently been deployed and is currently under testing. This will serve as a basis for exemplifying the model and discussing its benefits.