Carlos Gómez-Rodríguez | Universidade da Coruña

Books by Carlos Gómez-Rodríguez

The book presents a wide range of recent research results about parsing schemata, introducing formal frameworks and theoretical results while keeping a constant focus on applicability to practical parsing problems. The first part includes a general introduction to the parsing schemata formalism that covers the basic notions needed to understand the remaining parts. Thus, this compendium can be used as an introduction to natural language parsing, giving postgraduate students not only a solid grasp of the fundamental concepts underlying parsing algorithms, but also an understanding of the latest developments and challenges in the field.

Researchers in computational linguistics will find novel results in which parsing schemata are applied to problems under active research in the community (such as dependency parsing, robust parsing, or the treatment of non-projective linguistic phenomena). This book not only explains these results in a more detailed, comprehensive and self-contained way and highlights the relations between them, but also includes new contributions that have not been presented before.

Papers by Carlos Gómez-Rodríguez

Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk and Stelios Piperidis (eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), 2016

Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide the research community with a resource for evaluating the performance of sentiment classification techniques in this complex multilingual environment, proposing an English-Spanish corpus of tweets with code-switching (EN-ES-CS CORPUS). The tweets are labeled according to two well-known criteria used for this purpose: SentiStrength and a trinary scale (positive, neutral and negative categories). Preliminary work on the resource has already been done, providing a set of baselines for the research community.

ACL 2016. The 54th Annual Meeting of the Association for Computational Linguistics. Proceedings of the Conference, Vol. 2 (Short Papers), 2016

We introduce an approach to train lexicalized parsers using bilingual corpora obtained by merging harmonized treebanks of different languages, producing parsers that can analyze sentences in either of the learned languages, or even sentences that mix both. We test the approach on the Universal Dependency Treebanks, training with MaltParser and MaltOptimizer. The results show that these bilingual parsers are more than competitive, as most combinations not only preserve accuracy, but some even achieve significant improvements over the corresponding monolingual parsers. Preliminary experiments also show the approach to be promising on texts with code-switching and when more languages are added.

Abstract: This paper describes the participation of the LyS group at TASS 2015. In this year's edition, we used a long short-term memory neural network to address the two proposed challenges: (1) sentiment analysis at a global level and (2) aspect-based sentiment analysis on football and political tweets. The performance of this deep learning approach is compared to that of our system from last year, a logistic regression with quadratic regularization. Experimental results show that strategies such as unsupervised pre-training, sentiment-specific word embeddings or modifications to the current architecture might be needed to achieve state-of-the-art results. Keywords: deep learning, long short-term memory, sentiment analysis, Twitter

Opinion Mining is the discipline that deals with the automatic processing of the opinions contained in a text. It makes it possible, for example, to determine whether or not a text expresses an opinion, or whether the polarity or sentiment it expresses is positive, negative or mixed. It also allows the automatic extraction of features, which makes it possible to learn how authors perceive specific aspects of a given topic. After an introduction to this field, this work presents our own approach to it, which stands out for using syntactic information and for being specially adapted to one of the most difficult working contexts, Twitter. This technology is readily applicable to intelligence tasks.

6th Workshop on Computational Approaches to Subjectivity, Sentiment & Social Media Analysis. WASSA 2015. Workshop Proceedings, 2015

We address the problem of performing polarity classification on Twitter over different languages, focusing on English and Spanish, comparing three techniques: (1) a monolingual model which knows the language in which the opinion is written, (2) a monolingual model that acts based on the decision provided by a language identification tool and (3) a multilingual model trained on a multilingual dataset that does not need any language recognition step. Results show that multilingual models are even able to outperform the monolingual models on some monolingual sets. We introduce the first code-switching corpus with sentiment labels, showing the robustness of a multilingual approach.

Abstract: This paper describes our participation in the third edition of the workshop on Sentiment Analysis focused on Spanish tweets, TASS 2014. This year's evaluation campaign includes four challenges: (1) global sentiment analysis, (2) topic classification, (3) aspect extraction and (4) aspect-based sentiment analysis. Tasks 1 and 2 are addressed with a machine learning approach, using several linguistic resources and other information extracted from the training corpus to train a supervised classifier. For task 3, we develop a naive approach, collecting a set of representations to identify the predefined aspects requested by the organisers. Finally, task 4 uses heuristics to identify the scope of each aspect and then classifies its sentiment with a supervised classifier. The experimental results are promising and will serve as a starting point for developing more complex techniques. Keywords: sentiment analysis, topic classification, aspect extraction, aspect-based sentiment analysis.

The vast amount of opinions and reviews provided in Twitter is helpful in order to make interesting findings about a given industry, but given the huge number of messages published every day it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets which has been specifically annotated to perform text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest impact on this task and how they should be combined in order to obtain the best-performing system. The results lead us to conclude that relating features by means of contextual information adds complementary knowledge over pure lexi

Transition-based parsing is a widely used approach for dependency parsing that combines high efficiency with expressive feature models. Many different transition systems have been proposed, often formalized in slightly different frameworks. In this article, we show that a large number of the known systems for projective dependency parsing can be viewed as variants of the same stack-based system with a small set of elementary transitions that can be composed into complex transitions and restricted in different ways.
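To make the idea of composing elementary transitions concrete, here is a minimal Python sketch. It is an illustration only and not the article's formal definition: the choice of elementary operations (`shift`, `reduce_top`, `left_arc`, `right_arc`), the `Config` class and the `compose` helper are all assumptions introduced here. The sketch expresses the familiar arc-eager transitions as compositions of these operations.

```python
from dataclasses import dataclass, field


@dataclass
class Config:
    """Parser configuration: a stack, a buffer and the arcs built so far."""
    stack: list
    buffer: list
    arcs: set = field(default_factory=set)  # (head, dependent) pairs


# Hypothetical elementary transitions; each does exactly one thing.
def shift(c):
    c.stack.append(c.buffer.pop(0))  # move the first buffer word onto the stack
    return c


def reduce_top(c):
    c.stack.pop()  # discard the top of the stack
    return c


def left_arc(c):
    c.arcs.add((c.buffer[0], c.stack[-1]))  # first buffer word heads stack top
    return c


def right_arc(c):
    c.arcs.add((c.stack[-1], c.buffer[0]))  # stack top heads first buffer word
    return c


def compose(*steps):
    """Build a complex transition that applies the given steps in order."""
    def run(c):
        for step in steps:
            c = step(c)
        return c
    return run


# Arc-eager transitions written as compositions of the elementary ones.
ARC_EAGER = {
    "SH": shift,
    "RE": reduce_top,
    "LA": compose(left_arc, reduce_top),
    "RA": compose(right_arc, shift),
}


def parse(n_words, transitions):
    """Apply a transition sequence to words 1..n_words; node 0 is the root."""
    c = Config(stack=[0], buffer=list(range(1, n_words + 1)))
    for name in transitions:
        c = ARC_EAGER[name](c)
    return c.arcs


# "She eats fish": eats(2) heads She(1) and fish(3); the root(0) heads eats(2).
print(parse(3, ["SH", "LA", "RA", "RA"]))
```

Restricting a system, in this picture, would amount to limiting which compositions appear in the transition table; preconditions on each transition are omitted here for brevity.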

Abstract: We present a novel technique to remove spurious ambiguity from transition systems for dependency parsing. The technique chooses a canonical sequence of transition operations (computation) for a given dependency tree, and it can be applied to a large class of bottom-up transition systems, including for instance Nivre (2004) and Attardi (2006).

Abstract: We define a new formalism, based on Sikkel's parsing schemata for constituency parsers, that can be used to describe, analyze and compare dependency parsing algorithms. This abstraction allows us to establish clear relations between several existing projective dependency parsers and prove their correctness.

Abstract: We present a system allowing the automatic transformation of parsing schemata to efficient executable implementations of their corresponding algorithms. This system can be used to easily prototype, test and compare different parsing algorithms. In this work, it has been used to generate several different parsers for Context Free Grammars and Tree Adjoining Grammars. By comparing their performance on artificially generated grammars of different sizes, we can measure their empirical computational complexity.
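As a rough illustration of what executing a parsing schema involves, the sketch below hand-codes an agenda-driven deduction loop for the CYK schema on a grammar in Chomsky normal form. It is not the system described in the abstract: the function name `cyk_deduce`, the grammar encoding and the toy grammar are assumptions made for this example. Items have the form `(A, i, j)`, meaning nonterminal `A` spans words `i` to `j`, and the single deduction step combines `(B, i, k)` and `(C, k, j)` into `(A, i, j)` whenever the grammar has a rule `A -> B C`.

```python
def cyk_deduce(words, lexical, binary):
    """Compute all derivable CYK items by exhaustive deduction.

    lexical: dict mapping a word to the set of nonterminals that rewrite to it.
    binary:  dict mapping a pair (B, C) to the set of A with a rule A -> B C.
    """
    chart = set()
    # Initial items: one per word and per nonterminal that yields that word.
    agenda = [(a, i, i + 1)
              for i, w in enumerate(words)
              for a in lexical.get(w, ())]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        x, i, j = item
        for (y, k, l) in list(chart):
            if k == j:  # new item on the left, chart item on the right
                for a in binary.get((x, y), ()):
                    agenda.append((a, i, l))
            if l == i:  # chart item on the left, new item on the right
                for a in binary.get((y, x), ()):
                    agenda.append((a, k, j))
    return chart


# Toy grammar (hypothetical): S -> NP VP, VP -> V NP, NP -> she | fish, V -> eats
lexical = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
chart = cyk_deduce(["she", "eats", "fish"], lexical, binary)
print(("S", 0, 3) in chart)
```

This loop scans the whole chart for every new item; an efficient implementation would instead index items by their start and end positions so that only compatible partners are visited.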

This article presents a new approach to transition-based dependency parsing. We propose that the parser build an undirected graph during the parsing process, instead of the classic directed dependency structure. The undirected structure is afterwards transformed into a dependency tree. This reduces the error propagation typical of these systems.
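The abstract does not say how the undirected structure is turned into a tree. One simple possibility, shown here purely as an assumption, is to pick a root node and orient every edge away from it with a breadth-first traversal. The function name `orient_from_root` and the convention that node 0 is an artificial root are hypothetical.

```python
from collections import deque


def orient_from_root(edges, root=0):
    """Orient the edges of an undirected tree away from `root`.

    Returns a dict mapping each reachable non-root node to its head.
    """
    adjacent = {}
    for u, v in edges:
        adjacent.setdefault(u, set()).add(v)
        adjacent.setdefault(v, set()).add(u)
    head = {}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adjacent.get(u, ())):
            if v not in seen:
                seen.add(v)
                head[v] = u  # u was reached first, so u becomes the head of v
                queue.append(v)
    return head


# Undirected edges for "She eats fish", with node 0 as the artificial root.
print(orient_from_root([(1, 2), (0, 2), (3, 2)]))
```

Under this scheme the parser never has to commit to an arc direction, so a wrong direction choice cannot be made during parsing; direction is decided once, globally, by the position of the root.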

Abstract: Short-text categorization is currently an important research area, because much of the information we receive and routinely work with has this characteristic (e-mails, text messages, news summaries, among others). Several works have reported interesting results in text categorization by incorporating semantic information into the document representation.

Abstract: We present error-correcting parsing schemata, which make it possible to define error-correcting parsing algorithms in an abstract, declarative way. This formalism can be used to describe such algorithms simply and uniformly, and it provides a formal basis for proving their correctness and other properties.

Abstract: We present a compiler that can be used to automatically obtain efficient Java implementations of parsing algorithms from formal specifications expressed as parsing schemata. The system performs an analysis of the inference rules in the input schemata in order to determine the best data structures and indexes to use, and to ensure that the generated implementations are efficient.

Abstract: We present a technique for the construction of efficient prototypes for natural language parsing based on the compilation of parsing schemata to executable implementations of their corresponding algorithms. Taking a simple description of a schema as input, Java code for the corresponding parsing algorithm is generated, including schema-specific indexing code in order to attain efficiency. Keywords: parsing schemata, context-free grammars, tree-adjoining grammars

Abstract: This work studies the behavior of the most widely used parsing algorithms for Tree Adjoining Grammars (TAG). To this end, we apply a compilation technique that automatically transforms parsing schemata into efficient implementations of the algorithms they describe, which lets us compare the performance of different parsers in a homogeneous environment.