Irene Castellon | Universitat de Barcelona

Papers by Irene Castellon

On the concept of diathesis alternations as semantic oppositions

Proceedings of the ACL-SIGLEX Workshop …

OUP accepted manuscript

Digital Scholarship in the Humanities, 2019

Semantic Textual Similarity (STS), which measures the equivalence of meaning between two text segments, is an important and useful task in Natural Language Processing. In this article, we analyze the datasets provided by the Semantic Evaluation (SemEval) 2012–2014 campaigns for this task in order to identify appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g. syntactic constituents and lexical semantics) might have on sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpora, because sentence structure and vocabulary differ greatly between datasets. We therefore conclude that selecting linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. This analysis could serve as a useful reference for building similarity measurement systems and tuning linguistic features.

Spanish EuroWordNet and LCS-Based Interlingual MT

Proceedings of the MT Summit Workshop on Interlinguas in MT, San Diego, CA, Oct 1, 1997

We present a machine translation framework in which the interlingua, Lexical Conceptual Structure (LCS), is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information (shallower, transfer-like knowledge as well as deeper, compositional ...

Towards Spanish Verbs' Selectional Preferences Automatic Acquisition. Semantic Annotation of the SenSem Corpus

Proceedings of The 6th …, 2008

Knowledge intensive e-mail summarization in CARPANTA

Meeting of the Association for Computational Linguistics, Jul 1, 2004

Inter-Annotator Agreement for the Factual Status of Predicates in the TAGFACT Corpus

Revista Signos, Mar 1, 2023

Spanish WordNet 3.0

Los verbos de transferencia [Transfer verbs]

The TAGFACT annotator and editor: A versatile tool

Research in Corpus Linguistics, 2020

The multifunctional tool presented in this paper was developed within the TAGFACT project, which aims to automate the annotation of factuality (understood as the degree of commitment with which the writer presents situations) in Spanish journalistic texts. The paper describes the tool, which supports compiling texts and manually annotating predicates. The corpus built with it consists of sets of three news articles covering the same event, taken from newspapers of different ideologies (left-wing, right-wing and centrist). It comprises 176 news articles, containing 1,359 sentences and 46,947 words. So far, the tool has been used to manually annotate part of the ‘Gold Standard’ (approximately 10,000 words). It has proved versatile: it supports both creating and managing corpora and annotating them with whatever tag set the user chooses for the purpose of each corpus.

Guiding automatic MT evaluation by means of linguistic features

Digital Scholarship in the Humanities, 2016

Machine translation (MT) has become increasingly important and popular over the past decade, leading to the development of MT evaluation metrics that aim to assess MT output automatically. Most of these metrics compare system output against reference translations; they should therefore not only detect MT errors but also recognize correct equivalent expressions, so as not to penalize them when they do not appear in the reference translations. With the aim of improving MT evaluation metrics, a study was carried out of a wide range of linguistic features and their implications. To this end, a Spanish and an English corpus containing hypothesis and reference translations were analysed from a linguistic point of view, so that common errors could be detected and valid equivalences highlighted. This article focuses on this qualitative analysis, describing the linguistic phenomena that should be considered when developing an automatic MT evaluation metric. The results of this analysis were used to develop an automatic MT evaluation metric that takes into account different dimensions of language. A brief review of the metric and its evaluation is also provided.

Diseño de una gramática para sistemas de diálogo hombre máquina [Design of a grammar for human-machine dialogue systems]

Aplicación de reglas léxicas y reglas gramaticales en el proceso de análisis [Applying lexical rules and grammar rules in the parsing process]

Procesamiento de Lenguaje Natural, 1994

La interpretación semántica en un sistema dialogado [Semantic interpretation in a dialogue system]

Procesamiento de Lenguaje Natural, 1989

An interlingua representation based on the lexico-semantic information

An interlingua representation based on the lexico-semantic information, 2005

Proyecto TAGFACT: Del texto al conocimiento. Factualidad y grados de certeza en español [The TAGFACT project: From text to knowledge. Factuality and degrees of certainty in Spanish]

Procesamiento del Lenguaje Natural, 2018

The general goal of this project is to create a tool for annotating the factuality expressed in Spanish texts by means of automatic processing. We intend this representation to be very rich, so it will be built along three distinct axes: multilevel, multidimensional and multitextual. The multilevel analysis accounts for the different linguistic markers that express the degree of certainty of an event at the morphological and syntactic levels, and also at the discourse level; the multidimensional analysis accounts for a varied number of the voices that evaluate that event; and the multitextual analysis accounts for different texts about the same event, this last axis being one of the most innovative aspects of the proposal.

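Hacia una clasificación verbal automática para el español: estudio sobre la relevancia de los diferentes tipos y configuraciones de información sintáctico-semántica [Towards an automatic verb classification for Spanish: a study of the relevance of different types and configurations of syntactic-semantic information]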

Linguamática, 2015

In this work we focus on the acquisition of automatic verb classifications for Spanish. To this end, we carry out a series of experiments with 20 verb senses from the SenSem corpus. We use different types of features covering diverse linguistic information, together with an agglomerative hierarchical clustering method, to generate several classifications. We compare each of these automatic classifications with a gold standard created semi-automatically on the basis of linguistic constructions proposed in theoretical linguistics. This comparison shows which features are most suitable for automatically creating a classification consistent with construction theory, and what the similarities and differences are between the automatic verb classification and the one based on the theory of linguistic constructions.
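The abstract above describes grouping verb senses by agglomerative hierarchical clustering over feature vectors and comparing the result with a gold standard. The following is a minimal Python sketch of that general technique, not the authors' implementation: it assumes average linkage, Euclidean distance and the Rand index as the comparison measure (the abstract specifies none of these), and the verb senses, feature values and gold classes are hypothetical toy data rather than SenSem data.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: each verb sense is a vector of made-up relative
# frequencies for three illustrative frames (ditransitive, transitive, intransitive).
VERB_SENSES = {
    "dar_1": (0.8, 0.1, 0.1),
    "entregar_1": (0.7, 0.2, 0.1),
    "romper_1": (0.1, 0.8, 0.1),
    "abrir_1": (0.2, 0.7, 0.1),
}

# Hypothetical gold-standard classes (a stand-in for a construction-based classification).
GOLD = [frozenset({"dar_1", "entregar_1"}), frozenset({"romper_1", "abrir_1"})]


def average_linkage(a, b, vectors):
    """Mean pairwise Euclidean distance between the members of clusters a and b."""
    total = sum(dist(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def agglomerative(vectors, n_clusters):
    """Bottom-up clustering: start from singletons and repeatedly merge the closest pair."""
    clusters = [frozenset([name]) for name in vectors]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], vectors),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def rand_index(predicted, gold):
    """Share of item pairs on which two clusterings agree (same vs. different cluster)."""
    def labels(clusters):
        return {item: k for k, cluster in enumerate(clusters) for item in cluster}

    p, g = labels(predicted), labels(gold)
    pairs = list(combinations(sorted(p), 2))
    agreements = sum((p[x] == p[y]) == (g[x] == g[y]) for x, y in pairs)
    return agreements / len(pairs)


if __name__ == "__main__":
    clusters = agglomerative(VERB_SENSES, n_clusters=2)
    print(sorted(sorted(c) for c in clusters))
    print(rand_index(clusters, GOLD))
```

Average linkage is used here only as a common default; single or complete linkage could be swapped in by changing `average_linkage`, and that choice can change which classes emerge from the same features.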

Similitud verbal: Análisis comparativo entre lingüística teórica y datos extraídos de corpus [Verb similarity: A comparative analysis of theoretical linguistics and corpus-extracted data]

Revista Signos, Dec 1, 2018

Anotación semántica de los sustantivos del corpus SenSem [Semantic annotation of the nouns in the SenSem corpus]

Abstract: The main goal of the project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet senses. The ultimate goal of the research is the acquisition of semantic preferences.

Semantic Parsing with Verbal Subcategorization

Semantic annotation of Nouns in Sensem Corpus

The main goal of this project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet synsets. The final objective of the research is the acquisition of semantic preferences.
