Lorenza Romano - Academia.edu
Papers by Lorenza Romano
Proceedings of the 4th International Workshop on Semantic Evaluations - SemEval '07, 2007
We present an approach for semantic relation extraction between nominals that combines shallow and deep syntactic processing and semantic information using kernel methods. Two information sources are considered: (i) the whole sentence where the relation appears, and (ii) WordNet synsets and hypernymy relations of the candidate nominals. Each source of information is represented by kernel functions. In particular, five basic kernel functions are linearly combined and weighted under different conditions. The experiments were carried out using support vector machines as the classifier. The system achieves an overall F1 of 71.8% on the Classification of Semantic Relations between Nominals task at SemEval-2007.
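The linear combination of kernels mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two toy kernels (`bow_kernel` over sentence tokens, `set_kernel` over hypernym-like labels), the example instances, and the equal weights are all assumptions made for illustration. The only property relied on is the standard fact that a non-negative weighted sum of valid kernels is itself a valid kernel.

```python
from collections import Counter
from math import sqrt


def bow_kernel(a, b):
    """Dot product between the bag-of-words count vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(ca[t] * cb[t] for t in ca))


def set_kernel(a, b):
    """Number of labels shared by two label collections (hypothetical hypernym sets)."""
    return float(len(set(a) & set(b)))


def normalize(kernel):
    """Wrap a kernel so that k(x, x) == 1 for any non-empty input (cosine normalization)."""
    def k(a, b):
        denom = sqrt(kernel(a, a) * kernel(b, b))
        return kernel(a, b) / denom if denom else 0.0
    return k


def combine(kernels, weights):
    """Weighted sum of kernels; the i-th kernel reads the i-th view of each instance."""
    def k(x, y):
        return sum(w * kern(x[i], y[i])
                   for i, (kern, w) in enumerate(zip(kernels, weights)))
    return k


# Each instance is a pair: (sentence tokens, labels for the candidate nominal).
# Both instances are invented toy data.
x = (["the", "key", "opens", "the", "door"], {"device", "artifact"})
y = (["the", "door", "has", "a", "key"], {"artifact", "structure"})

combined = combine([normalize(bow_kernel), normalize(set_kernel)], [1.0, 1.0])
print(combined(x, y))
```

Normalizing each kernel before summing keeps one information source from dominating the other merely because its raw values are larger; the weights then express the intended relative importance.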
This paper describes SIE (Simple Information Extraction), a modular information extraction system designed with the goal of being easily and quickly portable across tasks and domains. SIE is composed of a general-purpose machine learning algorithm (SVM) combined with several customizable modules. A crucial role in the architecture is played by Instance Filtering, which increases efficiency without reducing effectiveness. The results obtained by SIE on several standard data sets, representative of different tasks and domains, are reported. The experiments show that SIE achieves performance close to the best systems in all tasks, without using domain-specific knowledge.
Most work on ontology learning from text relies on unsupervised methods for relation extraction inspired by Hearst's work, and attempts to extract relations identified in work in formal linguistics and ontology. In this paper we present work aiming at extracting from text the set of concept attributes actually associated with concepts according to psychological research, using state-of-the-art supervised relation extraction techniques.
This report describes SIE (Simple Information Extraction), an information extraction system designed and developed in the context of the IST-Dot.Kom project (http://www.dot-kom.org), sponsored by the European Commission as part of Framework V (grant IST-2001-34038). SIE is based on SVMs and was designed with the goal of being easily and quickly portable across tasks and domains. The results obtained by SIE on a few standard datasets, representative of different tasks and domains, are reported.
In this paper we present an approach to person name disambiguation that clusters documents on the basis of textual features using cosine similarity and a machine-learned meta-similarity measure. The approach achieves an F-measure of B-Cubed Precision and Recall of 0.74 on the Clustering Subtask of WePS-2. This task consists of clustering a set of documents that mention an ambiguous person name according to the actual entities the name refers to.
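For readers unfamiliar with the metric named in this abstract, the sketch below computes B-Cubed precision, recall, and their harmonic mean. It assumes a hard clustering in which every document belongs to exactly one predicted cluster and one gold entity, which is a simplification; the function name `bcubed` and the toy document labels are hypothetical and do not come from the paper or from the WePS-2 scorer.

```python
def bcubed(predicted, gold):
    """B-Cubed precision, recall and F for a hard (non-overlapping) clustering.

    predicted, gold: dicts mapping each document id to a cluster / entity label.
    """
    docs = list(gold)
    p_sum = r_sum = 0.0
    for d in docs:
        same_cluster = {e for e in docs if predicted[e] == predicted[d]}
        same_entity = {e for e in docs if gold[e] == gold[d]}
        correct = len(same_cluster & same_entity)
        p_sum += correct / len(same_cluster)   # per-document precision
        r_sum += correct / len(same_entity)    # per-document recall
    precision = p_sum / len(docs)
    recall = r_sum / len(docs)
    # Each document always matches itself, so precision + recall > 0.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy example: four documents mentioning the same ambiguous name.
gold = {"d1": "A", "d2": "A", "d3": "A", "d4": "B"}
predicted = {"d1": 0, "d2": 0, "d3": 1, "d4": 1}

precision, recall, f_measure = bcubed(predicted, gold)
print(precision, recall, f_measure)
```

Because the scores are averaged per document rather than per cluster, splitting a large entity across clusters lowers recall, while merging distinct entities lowers precision.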
ACM Transactions on Speech and Language Processing, 2007
... For example, given the sentence "Kennedy's assassin, Sirhan Bishara Sirhan, was immediately arrested.", it is required that Sirhan Bishara Sirhan and Kennedy be identified as named entities of type person and the kill relation in which the former is the first argument (agent) of ...
We survey the evaluation methodology adopted in Information Extraction (IE), as defined in the MUC conferences and in later independent efforts applying machine learning to IE. We point out a number of problematic issues that may hamper the comparison between results obtained by different researchers. Some of them are common to other NLP tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter settings. Issues specific to IE evaluation include: how leniently to assess inexact identification of filler boundaries, the possibility of multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an information extraction task, a number of characteristics should be clearly defined. However, in published papers only a few of them are usually specified explicitly. Our aim is to elaborate a clear and detailed experimental methodology and propose it to the IE community. The goal is to reach widespread agreement on this proposal so that future IE evaluations will adopt the proposed methodology, making comparisons between algorithms fair and reliable. In order to achieve this goal, we will develop and make available to the community a set of tools and resources that incorporate a standardized IE methodology.
Language Resources and Evaluation, 2008
We survey the evaluation methodology adopted in information extraction (IE), as defined in a few different efforts applying machine learning (ML) to IE. We identify a number of critical issues that hamper comparison of the results obtained by different researchers. Some of these issues are common to other NLP-related tasks: e.g., the difficulty of exactly identifying the effects on performance of the data (sample selection and sample size), of the domain theory (features selected), and of algorithm parameter settings. Some issues are specific to IE: how leniently to assess inexact identification of filler boundaries, the possibility of multiple fillers for a slot, and how the counting is performed. We argue that, when specifying an IE task, these issues should be explicitly addressed, and a number of methodological characteristics should be clearly defined. To empirically verify the practical impact of the issues mentioned above, we perform a survey of the results of different algorithms when applied to a few standard datasets. The survey shows a serious lack of consensus on these issues, which makes it difficult to draw firm conclusions on a comparative evaluation of the algorithms. Our aim is to elaborate a clear and detailed experimental methodology and propose it to the IE community. Widespread agreement on this proposal should lead to future IE comparative evaluations that are fair and reliable. To demonstrate the way the methodology is to be applied, we have organized and run a comparative evaluation of ML-based IE systems (the Pascal Challenge on ML-based IE) where the principles described in this article are put into practice. In this article we describe the proposed methodology and its motivations. The Pascal evaluation is then described and its results presented.
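The effect of boundary leniency noted in this abstract can be made concrete with a small sketch. It is an illustration only and not the scoring procedure of the Pascal Challenge or of any MUC scorer: the span representation (token offsets with an exclusive end), the greedy one-to-one matching, and the names `overlaps` and `score` are assumptions.

```python
def overlaps(a, b):
    """True if two (start, end) spans with exclusive ends share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def score(predicted, gold, lenient=False):
    """Precision, recall and F1 for one slot under exact or overlap-based matching.

    Each gold filler can be matched at most once (greedy, in input order).
    """
    match = overlaps if lenient else (lambda a, b: a == b)
    used = set()
    true_positives = 0
    for p in predicted:
        for i, g in enumerate(gold):
            if i not in used and match(p, g):
                used.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the second prediction is one token too long, the third is spurious.
gold_spans = [(3, 6), (10, 12)]
predicted_spans = [(3, 6), (10, 13), (20, 21)]

exact = score(predicted_spans, gold_spans, lenient=False)
relaxed = score(predicted_spans, gold_spans, lenient=True)
print(exact, relaxed)
```

The same system output receives noticeably different scores under the two settings, which is why results reported without stating the matching criterion are hard to compare.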
We propose an approach for extracting relations between entities from biomedical literature based solely on shallow linguistic information. We use a combination of kernel functions to integrate two different information sources: (i) the whole sentence where the relation appears, and (ii) the local contexts around the interacting entities. We performed experiments on extracting gene and protein interactions from two different data sets. The results show that our approach outperforms most of the previous methods based on syntactic and semantic information.
The paper describes an approach to cross-media knowledge acquisition which combines text and raw data. The approach has been applied in a real-world use case concerning wind tunnel reports within the EU-funded project X-Media. The goal is to ...
This report describes SIE (Simple Information Extraction), an information extraction system designed and developed in the context of the
This document reports on the annotation of Named Entities for the Italian Content Annotation Bank (I-CAB) being developed at ITC-irst in conjunction with CELCT. I-CAB is a corpus of Italian news annotated with semantic information at different levels. The first level is represented by Temporal Expressions, the second level by different types of Entities (both Named and not-Named), and the third level by Relations between Entities (e.g., the affiliation relation connecting a person to an organization).
Conference of the European Chapter of the Association for Computational Linguistics, 2006
Unsupervised paraphrase acquisition has been an active research field in recent years, but its effective coverage and performance have rarely been evaluated. We propose a generic paraphrase-based approach for Relation Extraction (RE), aiming at a dual goal: obtaining an applicative evaluation scheme for paraphrase acquisition and obtaining a generic and largely unsupervised configuration for RE. We analyze the potential of our approach and evaluate an implemented prototype of it using an RE dataset. Our findings reveal a high potential for unsupervised paraphrase acquisition. We also identify the need for novel robust models for matching paraphrases in texts, which should address syntactic complexity and variability.
We present a brief overview of the main challenges in the extraction of semantic relations from English text, and discuss the shortcomings of previous data sets and shared tasks. This leads us to introduce a new task, which will be part of SemEval-2010: multi-way classification of mutually exclusive semantic relations between pairs of common nominals. The task is designed to compare different approaches to the problem and to provide a standard testbed for future research, which can benefit many applications in Natural Language Processing.