Carlos Gómez-Rodríguez | Universidade da Coruña

Books by Carlos Gómez-Rodríguez

Parsing Schemata for Practical Text Analysis

The book presents a wide range of recent research results about parsing schemata, introducing formal frameworks and theoretical results while keeping a constant focus on applicability to practical parsing problems. The first part includes a general introduction to the parsing schemata formalism that contains the basic notions needed to understand the remaining parts. Thus, this compendium can be used as an introduction to natural language parsing, allowing postgraduate students to gain both a solid grasp of the fundamental concepts underlying parsing algorithms and an understanding of the latest developments and challenges in the field.

Researchers in computational linguistics will find novel results where parsing schemata are applied to problems that are being actively researched in the community (such as dependency parsing, robust parsing, or the treatment of non-projective linguistic phenomena). The book explains these results in a more detailed, comprehensive and self-contained way, highlights the relations between them, and includes new contributions that have not been presented before.

Papers by Carlos Gómez-Rodríguez

EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk and Stelios Piperidis (eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016

Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource is already done, providing a set of baselines for the research community.

One model, two languages: training bilingual parsers with harmonized treebanks

ACL 2016. The 54th Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference, Vol. 2 (Short Papers), 2016

We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.

LyS at TASS 2015: Deep Learning Experiments for Sentiment Analysis on Spanish Tweets

This paper describes the participation of the LyS group at TASS 2015. In this year's edition, we used a long short-term memory neural network to address the two proposed challenges: (1) sentiment analysis at a global level and (2) aspect-based sentiment analysis on football and political tweets. The performance of this deep learning approach is compared to that of our system from the previous year, a logistic regression with quadratic (L2) regularization. Experimental results show that strategies such as unsupervised pre-training, sentiment-specific word embeddings or modifications to the current architecture might be needed to achieve state-of-the-art results. Keywords: deep learning, long short-term memory, sentiment analysis, Twitter.

Seguimiento y análisis automático de contenidos en redes sociales

Opinion Mining is the discipline that deals with the automatic processing of the opinions contained in a text. It makes it possible, for example, to determine whether or not a text expresses an opinion, and whether the polarity or sentiment it expresses is positive, negative or mixed. It also enables the automatic extraction of features, which reveals how authors perceive specific aspects of a given topic. After an introduction to this field, this work presents our own approach to it, which stands out for using syntactic information and for being specially adapted to one of the most difficult working contexts, Twitter. This technology is readily applicable to intelligence tasks.

Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora

6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis. WASSA 2015. Workshop Proceedings, 2015

We address the problem of performing polarity classification on Twitter over different languages, focusing on English and Spanish, comparing three techniques: (1) a monolingual model which knows the language in which the opinion is written, (2) a monolingual model that acts based on the decision provided by a language identification tool and (3) a multilingual model trained on a multilingual dataset that does not need any language recognition step. Results show that multilingual models are even able to outperform the monolingual models on some monolingual sets. We introduce the first code-switching corpus with sentiment labels, showing the robustness of a multilingual approach.

LyS at TASS 2014: A Prototype for Extracting and Analysing Aspects from Spanish tweets

This paper describes our participation at the third edition of the workshop on Sentiment Analysis focused on Spanish tweets, TASS 2014. This year's evaluation campaign includes four challenges: (1) global sentiment analysis, (2) topic classification, (3) aspect extraction and (4) aspect-based sentiment analysis. Tasks 1 and 2 are addressed with a machine learning approach, using several linguistic resources and other information extracted from the training corpus to train a supervised classifier. For task 3, we develop a naive approach, collecting a set of representations to identify the predefined aspects requested by the organisers. Finally, task 4 uses heuristics to identify the scope of each aspect and then classifies its sentiment with a supervised classifier. The experimental results are promising and will serve as the starting point for developing more complex techniques. Keywords: sentiment analysis, topic classification, aspect extraction, aspect-based sentiment analysis.

A linguistic approach for determining the topics of Spanish Twitter messages

The vast number of opinions and reviews posted on Twitter is helpful for making interesting findings about a given industry, but given the huge number of messages published every day it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets which has been specifically annotated to perform text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest impact on this task and how they should be combined in order to obtain the best-performing system. The results lead us to conclude that relating features by means of contextual information adds complementary knowledge over pure lexi

Divisible Transition Systems and Multiplanar Dependency Parsing

Transition-based parsing is a widely used approach for dependency parsing that combines high efficiency with expressive feature models. Many different transition systems have been proposed, often formalized in slightly different frameworks. In this article, we show that a large number of the known systems for projective dependency parsing can be viewed as variants of the same stack-based system with a small set of elementary transitions that can be composed into complex transitions and restricted in different ways.
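
As an illustration of what a stack-based transition system looks like, here is a minimal Python sketch of the well-known arc-standard system. It is not the formalization used in the article: the `Config` class, the transition names `SH`, `LA` and `RA`, and the `parse` helper are assumptions made for this example, and the transition sequence is supplied by hand rather than predicted by a trained model.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser configuration: a stack, a buffer and the arcs built so far."""
    stack: list
    buffer: list
    arcs: set = field(default_factory=set)  # (head, dependent) pairs


def shift(c):
    # Move the first buffer node onto the stack.
    c.stack.append(c.buffer.pop(0))


def left_arc(c):
    # The stack top becomes the head of the node just below it,
    # and that dependent is removed from the stack.
    dep = c.stack.pop(-2)
    c.arcs.add((c.stack[-1], dep))


def right_arc(c):
    # The node below the stack top becomes the head of the top,
    # and the top (the dependent) is removed from the stack.
    dep = c.stack.pop()
    c.arcs.add((c.stack[-1], dep))


TRANSITIONS = {"SH": shift, "LA": left_arc, "RA": right_arc}


def parse(n_words, sequence):
    """Apply a transition sequence to a sentence of n_words tokens.

    Tokens are numbered 1..n_words; node 0 is an artificial root.
    """
    c = Config(stack=[0], buffer=list(range(1, n_words + 1)))
    for name in sequence:
        TRANSITIONS[name](c)
    return c.arcs


# "She eats fish": 1 = She, 2 = eats, 3 = fish
arcs = parse(3, ["SH", "SH", "LA", "SH", "RA", "RA"])
print(sorted(arcs))
```

Each transition here is a small function over the same configuration type, which is what makes it natural to think of richer systems as compositions or restrictions of a few basic operations.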

Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing

We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. Our technique chooses a canonical sequence of transition operations (computation) for a given dependency tree. Our technique can be applied to a large class of bottom-up transition systems, including for instance Nivre (2004) and Attardi (2006).

La enseñanza del Procesamiento del Lenguaje Natural en facultades de Informática y Filología

Improving Transition-Based Dependency Parsing with Buffer Transitions

A deductive approach to dependency parsing

We define a new formalism, based on Sikkel's parsing schemata for constituency parsers, that can be used to describe, analyze and compare dependency parsing algorithms. This abstraction allows us to establish clear relations between several existing projective dependency parsers and prove their correctness.

On theoretical and practical complexity of TAG parsers

We present a system allowing the automatic transformation of parsing schemata to efficient executable implementations of their corresponding algorithms. This system can be used to easily prototype, test and compare different parsing algorithms. In this work, it has been used to generate several different parsers for Context Free Grammars and Tree Adjoining Grammars. By comparing their performance on different sized, artificially generated grammars, we can measure their empirical computational complexity.

Dependencias no dirigidas para el análisis basado en transiciones

This article presents a new approach to transition-based dependency parsing. We propose that the parser build an undirected graph during the parsing process, instead of the classic directed dependency structure. Afterwards, the undirected structure is transformed into a dependency tree. This reduces the error propagation that is typical of these systems.
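
One simple way to picture the final conversion step is to orient the edges of an undirected tree away from a designated root. The sketch below does this with a breadth-first traversal. The function name `orient_from_root`, the use of node 0 as an artificial root, and the assumption that the undirected graph is already a connected tree are all illustrative choices; the article's actual reconstruction procedure may differ.

```python
from collections import defaultdict, deque


def orient_from_root(edges, root=0):
    """Turn undirected edges into a head map {dependent: head}.

    Assumes the edges form a connected tree that contains `root`.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    head = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        # Every unseen neighbour is reached through u, so u is its head.
        for v in sorted(adjacency[u]):
            if v not in seen:
                seen.add(v)
                head[v] = u
                queue.append(v)
    return head


# Undirected structure over "She eats fish" (1, 2, 3) plus root node 0.
print(orient_from_root([(1, 2), (2, 3), (0, 2)]))
```

Because the direction of each link is decided only at this stage, an attachment between the right pair of words is never discarded during parsing merely because its direction was guessed wrongly.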

Un análisis comparativo de estrategias para la categorización semántica de textos cortos

Short-text categorization is currently an important research area, because a large part of the information we receive and routinely work with has this characteristic (e-mails, text messages, news summaries, among others). Several works have reported interesting results in text categorization by incorporating semantic information into the representation of the documents.

Técnicas deductivas para el análisis sintáctico con corrección de errores

We present error-correcting parsing schemata, which make it possible to define error-correcting parsing algorithms in an abstract and declarative way. This formalism can be used to describe such algorithms simply and uniformly, and it provides a formal basis for proving their correctness and other properties.

A compiler for parsing schemata

We present a compiler that can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, and to ensure that the generated implementations are efficient.
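
To give a rough idea of what executing a schema's inference rules involves, the following Python sketch runs an agenda-driven deduction loop for the CYK algorithm, with items of the form (A, i, j) meaning that nonterminal A spans words i to j. This is not the compiler's output, which is Java: the function name `cyk_deduce` and the dictionary encoding of the grammar are assumptions for this example, and the chart is scanned linearly instead of using the kind of generated indexes the abstract describes.

```python
from collections import deque


def cyk_deduce(words, unary, binary):
    """Derive all CYK items (A, i, j) for a grammar in Chomsky normal form.

    unary:  {word: {A, ...}}        for rules A -> word
    binary: {(B, C): {A, ...}}      for rules A -> B C
    """
    chart = set()
    agenda = deque()

    # Initial items: one per word and matching lexical rule.
    for i, word in enumerate(words):
        for a in unary.get(word, ()):
            agenda.append((a, i, i + 1))

    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue
        chart.add(item)
        a, i, j = item
        # Combine the new item with every adjacent item already in the chart.
        for b, k, l in list(chart):
            if j == k:  # new item on the left, chart item on the right
                for parent in binary.get((a, b), ()):
                    agenda.append((parent, i, l))
            if l == i:  # chart item on the left, new item on the right
                for parent in binary.get((b, a), ()):
                    agenda.append((parent, k, j))
    return chart


unary = {"she": {"NP"}, "fish": {"NP"}, "eats": {"V"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
items = cyk_deduce(["she", "eats", "fish"], unary, binary)
print(("S", 0, 3) in items)
```

The linear scan over the chart is the part an index would replace: looking items up by their start or end position avoids visiting items that cannot combine with the new one.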

Prototyping Efficient Natural Language Parsers

We present a technique for the construction of efficient prototypes for natural language parsing based on the compilation of parsing schemata to executable implementations of their corresponding algorithms. Taking a simple description of a schema as input, Java code for the corresponding parsing algorithm is generated, including schema-specific indexing code in order to attain efficiency. Keywords: parsing schemata, context-free grammars, tree-adjoining grammars

Research paper thumbnail of Parsing Schemata for Practical Text Analysis

The book presents a wide range of recent research results about parsing schemata, introducing for... more The book presents a wide range of recent research results about parsing schemata, introducing formal frameworks and theoretical results while keeping a constant focus on applicability to practical parsing problems. The first part includes a general introduction to the parsing schemata formalism that contains the basic notions needed to understand the rest of the parts. Thus, this compendium can be used as an introduction to natural language parsing, allowing postgraduate students not only to get a solid grasp of the fundamental concepts underlying parsing algorithms, but also an understanding of the latest developments and challenges in the field.

Researchers in computational linguistics will find novel results where parsing schemata are applied to current problems that are being actively researched in the computational linguistics community (like dependency parsing, robust parsing, or the treatment of non-projective linguistics phenomena). This book not only explains these results in a more detailed, comprehensive and self-contained way, and highlights the relations between them, but also includes new contributions that have not been presented.

Research paper thumbnail of EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk and Stelios Piperidis (eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016

Code-switching texts are those that contain terms in two or more different languages, and they ap... more Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community to evaluate the performance of sentiment classification techniques on this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource is already done, providing a set of baselines for the research community.

Research paper thumbnail of One model, two languages: training bilingual parsers with harmonized treebanks

ACL 2016. The 54th Annual Meeting of the Association for Computational Linguistics. Proceeedings of the Conference, Vol. 2 (Short Papers), 2016

We introduce an approach to train lexical-ized parsers using bilingual corpora obtained by mergin... more We introduce an approach to train lexical-ized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOpti-mizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy , but some even achieve significant improvements over the corresponding mono-lingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.

Research paper thumbnail of LyS at TASS 2015: Deep Learning Experiments for Sentiment Analysis on Spanish Tweets

Resumen: Este artículo describe la participación del grupo LyS en el tass 2015. En la edición de ... more Resumen: Este artículo describe la participación del grupo LyS en el tass 2015. En la edición de este año, hemos utilizado una red neuronal denominada long short-term memory para abordar los dos retos propuestos: (1) análisis del sentimiento a nivel global y (2) análisis del sentimiento a nivel de aspectos sobre tuits futbolísticos y de política. El rendimiento obtenido por esta red de aprendizaje profundo es comparado con el de nuestro sistema del año pasado, una regresión logística con una regularización cuadrática. Los resultados experimentales muestran que es necesario incluir estrategias como pre-entrenamiento no supervisado, técnicas específicas para representar palabras como vectores o modificar la arquitectura actual para alcanzar resultados acordes con el estado del arte. Palabras clave: deep learning, long short-term memory, análisis del sentimiento, Twitter Abstract: This paper describes the participation of the LyS group at tass 2015. In this year's edition, we used a long short-term memory neural network to address the two proposed challenges: (1) sentiment analysis at a global level and (2) aspect-based sentiment analysis on football and political tweets. The performance of this deep learning approach is compared to our last-year model, based on a square-regularized logistic regression. Experimental results show that strategies such as unsupervised pre-training, sentiment-specific word embedding or modifying the current architecture might be needed to achieve state-of-the-art results.

Research paper thumbnail of Seguimiento y análisis automático de contenidos en redes sociales

La Minería de Opiniones es la disciplina que aborda el tratamiento automático de las opiniones co... more La Minería de Opiniones es la disciplina que aborda el tratamiento automático de las opiniones contenidas en un texto. Permite, por ejemplo, determinar si en un texto se está opinando o no, o si la polaridad o sentimiento que se expresa en el mismo es positiva, negativa o mixta. También permite la extracción automática de características, lo que posibilita conocer la percepción que los autores tienen sobre aspectos concretos de un tema determinado. Este trabajo, tras realizar una introducción a dichó ambito, presenta una aproximación propia al mismo, la cual destaca por emplear información sintáctica así como por estar especialmente adaptada a uno de los contextos de trabajo más complicados, Twitter. Dicha tecnología es fácilmente aplicable a tareas de inteligencia.

Research paper thumbnail of Sentiment Analysis on Monolingual, Multilingual and Code-Switching Twitter Corpora

6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis. WASSA 2015. Workshop Proceedings, 2015

We address the problem of performing polarity classification on Twitter over different languages,... more We address the problem of performing polarity classification on Twitter over different languages, focusing on English and Spanish, comparing three techniques: (1) a monolingual model which knows the language in which the opinion is written, (2) a monolingual model that acts based on the decision provided by a language identification tool and (3) a multilingual model trained on a multilingual dataset that does not need any language recognition step. Results show that multilingual models are even able to outperform the monolingual models on some monolingual sets. We introduce the first code-switching corpus with sentiment labels, showing the robust-ness of a multilingual approach.

Research paper thumbnail of LyS at TASS 2015: Deep Learning Experiments for Sentiment Analysis on Spanish Tweets

Resumen: Este artículo describe la participación del grupo LyS en el tass 2015. En la edición de ... more Resumen: Este artículo describe la participación del grupo LyS en el tass 2015. En la edición de este año, hemos utilizado una red neuronal denominada long short-term memory para abordar los dos retos propuestos: (1) análisis del sentimiento a nivel global y (2) análisis del sentimiento a nivel de aspectos sobre tuits futbolísticos y de política. El rendimiento obtenido por esta red de aprendizaje profundo es comparado con el de nuestro sistema del año pasado, una regresión logística con una regularización cuadrática. Los resultados experimentales muestran que es necesario incluir estrategias como pre-entrenamiento no supervisado, técnicas específicas para representar palabras como vectores o modificar la arquitectura actual para alcanzar resultados acordes con el estado del arte. Palabras clave: deep learning, long short-term memory, análisis del sentimiento, Twitter Abstract: This paper describes the participation of the LyS group at tass 2015. In this year's edition, we used a long short-term memory neural network to address the two proposed challenges: (1) sentiment analysis at a global level and (2) aspect-based sentiment analysis on football and political tweets. The performance of this deep learning approach is compared to our last-year model, based on a square-regularized logistic regression. Experimental results show that strategies such as unsupervised pre-training, sentiment-specific word embedding or modifying the current architecture might be needed to achieve state-of-the-art results.

Research paper thumbnail of LyS at TASS 2014: A Prototype for Extracting and Analysing Aspects from Spanish tweets

Resumen: Este artículo describe nuestra participación en la tercera edición del taller de análisi... more Resumen: Este artículo describe nuestra participación en la tercera edición del taller de análisis del sentimiento de tuits escritos en castellano, el tass 2014. En la evaluación competitiva de este año, se han propuesto cuatro retos: (1) análisis del sentimiento a nivel global, (2) clasificación de tópicos, (3) extracción de as-pectos y (4) análisis del sentimiento a nivel aspectual. Para las tareas 1 y 2 em-pleamos una aproximación basada en aprendizaje automático, donde distintos re-cursos lingüísticos e información extraída del conjunto de entrenamiento son uti-lizados para entrenar un clasificador supervisado. Para abordar la tarea 3, nuestra aproximación recolecta una lista de representaciones que es empleada para identi-ficar los aspectos requeridos por los organizadores. Porúltimo, la tarea 4 delega en heurísticas para identificar el alcance de cada aspecto, para después determinar su sentimiento a través de un clasificador supervisado. Los resultados experimentales son prometedores y nos servirán para desarrollar técnicas más complejas en el fu-turo. Palabras clave: Análisis del sentimiento, Clasificación de tópicos, Extracción de aspectos, Análisis del sentimiento a nivel aspectual. Abstract: This paper describes our participation at the third edition of the workshop on Sentiment Analysis focused on Spanish tweets, tass 2014. This year's evaluation campaign includes four challenges: (1) global sentiment analysis, (2) topic classification, (3) aspect-extraction and (4) aspect-based sentiment analysis. Tasks 1 and 2 are addressed from a machine learning approach, using several linguistic resources and other information extracted from the training corpus to feed to a supervised classifier. With respect to task 3, we develop a naive approach, collecting a set of representations to identify the predefined aspects requested by the organisers. 
Finally, task 4 uses heuristics to identify the scope of each aspect, to then classify their sentiment via a supervised classifier. The experimental results are promising and will serve us as the starting point to develop more complex techniques.

Research paper thumbnail of A linguistic approach for determining the topics of Spanish Twitter messages

The vast amount of opinions and reviews provided in Twitter is helpful in order to make interesti... more The vast amount of opinions and reviews provided in Twitter is helpful in order to make interesting findings about a given industry, but given the huge number of messages published every day it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets which has been specifically annotated to perform text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest impact on this task and how they should be combined in order to obtain the best-performing system. The results lead us to conclude that relating features by means of contextual information adds complementary knowledge over pure lexi

Research paper thumbnail of Divisible Transition Systems and Multiplanar Dependency Parsing

Transition-based parsing is a widely used approach for dependency parsing that combines high effi... more Transition-based parsing is a widely used approach for dependency parsing that combines high efficiency with expressive feature models. Many different transition systems have been proposed, often formalized in slightly different frameworks. In this article, we show that a large number of the known systems for projective dependency parsing can be viewed as variants of the same stack-based system with a small set of elementary transitions that can be composed into complex transitions and restricted in different ways.

Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing

We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. The technique chooses a canonical sequence of transition operations (a computation) for a given dependency tree. It can be applied to a large class of bottom-up transition systems, including for instance those of Nivre (2004) and Attardi (2006).

La enseñanza del Procesamiento del Lenguaje Natural en facultades de Informática y Filología

Improving Transition-Based Dependency Parsing with Buffer Transitions

A deductive approach to dependency parsing

We define a new formalism, based on Sikkel's parsing schemata for constituency parsers, that can be used to describe, analyze and compare dependency parsing algorithms. This abstraction allows us to establish clear relations between several existing projective dependency parsers and prove their correctness.

On theoretical and practical complexity of TAG parsers

We present a system allowing the automatic transformation of parsing schemata to efficient executable implementations of their corresponding algorithms. This system can be used to easily prototype, test and compare different parsing algorithms. In this work, it has been used to generate several different parsers for Context Free Grammars and Tree Adjoining Grammars. By comparing their performance on artificially generated grammars of different sizes, we can measure their empirical computational complexity.

Dependencias no dirigidas para el análisis basado en transiciones

This article presents a new approach to transition-based dependency parsing. We propose that the parser build an undirected graph during the parsing process, instead of the classic directed dependency structure. The undirected structure is subsequently transformed into a dependency tree. This reduces the error propagation typical of these systems.

Un análisis comparativo de estrategias para la categorización semántica de textos cortos

Short-text categorization is currently an important research area, because much of the information we receive and routinely work with has this characteristic (e-mails, text messages, news summaries, among others). Several studies have reported interesting results in text categorization by incorporating semantic information into the representation of the documents.

Técnicas deductivas para el análisis sintáctico con corrección de errores

We present error-correcting parsing schemata, which allow error-correcting parsing algorithms to be defined in an abstract and declarative way. This formalism can be used to describe such algorithms simply and uniformly, and it provides a formal basis for proving their correctness and other properties.

A compiler for parsing schemata

We present a compiler that can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, and to ensure that the generated implementations are efficient.

Prototyping Efficient Natural Language Parsers

We present a technique for the construction of efficient prototypes for natural language parsing based on the compilation of parsing schemata to executable implementations of their corresponding algorithms. Taking a simple description of a schema as input, Java code for the corresponding parsing algorithm is generated, including schema-specific indexing code in order to attain efficiency.

Keywords: parsing schemata, context-free grammars, tree-adjoining grammars

Estudio comparativo del rendimiento de analizadores sintácticos para gramáticas de adjunción de árboles

This work studies the behavior of the most widely used parsing algorithms for Tree Adjoining Grammars (TAG). To this end, we apply a compilation technique that automatically transforms parsing schemata into efficient implementations of the algorithms they describe, which allows us to compare the performance of different parsers in a homogeneous environment.