Irene Castellon | Universitat de Barcelona
Papers by Irene Castellon
Proceedings of the ACL-SIGLEX Workshop …
Digital Scholarship in the Humanities, 2019
Semantic Textual Similarity (STS), which measures the equivalence of meaning between two textual segments, is an important and useful task in Natural Language Processing. In this article, we analyze the datasets provided by the Semantic Evaluation (SemEval) 2012–2014 campaigns for this task in order to find appropriate linguistic features for each dataset, taking into account the influence that linguistic features at different levels (e.g. syntactic constituents and lexical semantics) might have on sentence similarity. Results indicate that a linguistic feature may have a different effect on different corpora because sentence structure and vocabulary differ greatly between datasets. We therefore conclude that selecting linguistic features according to the genre of the text might be a good strategy for obtaining better results in the STS task. This analysis could be a useful reference for building measurement systems and tuning linguistic features.
Proceedings of the MT Summit Workshop on Interlinguas in MT, San Diego, CA, Oct 1, 1997
We present a machine translation framework in which the interlingua, Lexical Conceptual Structure (LCS), is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information: shallower, transfer-like knowledge as well as deeper, compositional ...
Proceedings of The 6th …, 2008
Meeting of the Association for Computational Linguistics, Jul 1, 2004
Revista Signos, Mar 1, 2023
Research in Corpus Linguistics, 2020
The multifunctional tool presented in this paper has been developed within the TAGFACT project, which aims to automate the annotation of factuality (understood as the degree of commitment with which the writer presents situations) in Spanish journalistic texts. The paper describes the tool, which supports both the compilation of texts and the manual annotation of predicates. The corpus created with it was extracted in groups of three news items covering the same event, taken from newspapers with different ideologies (left wing, right wing and centrist). It comprises 176 news items, containing 1,359 sentences and 46,947 words. So far the tool has been used to manually annotate a section of the ‘Gold Standard’ (approximately 10,000 words). It has proved versatile in that it supports both the creation and management of corpora and corpus annotation, using whatever tags the user chooses for the purpose of each corpus.
Digital Scholarship in the Humanities, 2016
Machine translation (MT) has become increasingly important and popular in the past decade, leading to the development of MT evaluation metrics that aim to assess MT output automatically. Most of these metrics compare system output against reference translations; they should therefore not only detect MT errors but also identify correct equivalent expressions, so as not to penalize them when they do not appear in the reference translations. With the aim of improving MT evaluation metrics, a study of a wide range of linguistic features and their implications has been carried out. For that purpose, a Spanish and an English corpus containing hypothesis and reference translations have been analysed from a linguistic point of view, so that common errors can be detected and positive equivalences highlighted. This article focuses on this qualitative analysis, describing the linguistic phenomena that should be considered when developing an automatic MT evaluation metric. The results of the analysis have been used to develop an automatic MT evaluation metric that takes into account different dimensions of language. A brief review of the metric and its evaluation is also provided.
Procesamiento de Lenguaje Natural, 1994
Procesamiento de Lenguaje Natural, 1989
An interlingua representation based on the lexico-semantic information, 2005
Proces. del Leng. Natural, 2018
The overall goal of this project is to create a tool for annotating the factuality expressed in Spanish texts through automatic processing. We intend this representation to be very rich, so it will be carried out along three distinct axes: multilevel, multidimensional and multitextual. The multilevel analysis accounts for the different linguistic markers that express the degree of certainty of an event at the morphological and syntactic levels, but also at the discourse level; the multidimensional analysis, for a varied number of the voices that assess that event; and the multitextual analysis, for different texts about the same event, the latter being one of the most innovative aspects of the proposal.
Linguamática, 2015
In this work we focus on the acquisition of automatic verb classifications for Spanish. To this end we carry out a series of experiments with 20 verb senses from the Sensem corpus. We use different types of features covering diverse linguistic information, together with an agglomerative hierarchical clustering method, to generate several classifications. We compare each of these automatic classifications with a gold standard created semi-automatically, taking into account linguistic constructions proposed in theoretical linguistics. This comparison lets us determine which features are most suitable for automatically creating a classification consistent with the theory of constructions, and what the similarities and differences are between the automatic verb classification and the one based on the theory of linguistic constructions.
Revista signos, Dec 1, 2018
Abstract: The main goal of the project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet senses. The ultimate objective of the research is the acquisition of semantic preferences.
The main goal of this project is the semantic annotation of the argument nouns of the SenSem corpus with WordNet synsets. The final objective of the research is the acquisition of semantic preferences.