Miguel Angel Alonso Pardo | Universidade da Coruña

Papers by Miguel Angel Alonso Pardo

On the Use of Parsing for Named Entity Recognition

Applied Sciences

Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can be helpful to determine whether or not a certain portion of it is an entity and if so, to establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen to consider shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic informa...

Exploring cross-lingual word embeddings for the inference of bilingual dictionaries

We describe four systems for automatically generating bilingual dictionaries from existing ones: three transitive systems differing only in the pivot language used, and a system based on a different approach that only needs monolingual corpora in both the source and target languages. All four methods make use of cross-lingual word embeddings trained on monolingual corpora and then mapped into a shared vector space. Experimental results confirm that our strategy has good coverage and recall, achieving performance comparable to that of the best systems submitted by the teams participating in the TIAD shared task, on the TIAD 2019 gold standard set.
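
A minimal sketch of the transitive (pivot) idea, assuming the embeddings of all three languages have already been mapped into one shared space and that each word is translated to its cosine nearest neighbour. The words, the three-dimensional vectors and the function names are hypothetical toy data, not taken from the paper, and the actual systems may rank and filter candidates differently.

```python
import math

Vector = tuple[float, ...]
Space = dict[str, Vector]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def translate(word: str, src: Space, tgt: Space, k: int = 1) -> list[str]:
    """Rank target words by cosine similarity to `word` and keep the top k."""
    query = src[word]
    return sorted(tgt, key=lambda w: cosine(query, tgt[w]), reverse=True)[:k]

def translate_via_pivot(word: str, src: Space, pivot: Space, tgt: Space) -> str:
    """Transitive translation: source -> pivot -> target, top-1 at each hop."""
    pivot_word = translate(word, src, pivot)[0]
    return translate(pivot_word, pivot, tgt)[0]

# Hypothetical toy vectors, assumed to be already mapped into one shared space.
gl = {"can": (0.9, 0.1, 0.0), "gato": (0.1, 0.9, 0.1)}        # Galician (source)
es = {"perro": (0.85, 0.15, 0.0), "gato": (0.12, 0.88, 0.1)}  # Spanish (pivot)
en = {"dog": (0.8, 0.2, 0.05), "cat": (0.1, 0.95, 0.05)}      # English (target)

print(translate_via_pivot("can", gl, es, en))  # dog
```

Swapping the `es` space for another language's vectors changes only the pivot, which mirrors how the three transitive systems are described as differing only in the pivot language.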

Creación de un treebank de dependencias universales mediante recursos existentes para lenguas próximas: el caso del gallego [Building a Universal Dependencies treebank from existing resources for closely related languages: the case of Galician]

Procesamiento del Lenguaje Natural, 2016

In this work we present a new strategy for creating treebanks for languages with few resources for syntactic analysis. The method consists of adapting and combining different treebanks of closely related linguistic varieties, annotated with Universal Dependencies, in order to train a syntactic parser for the chosen language, in our case Galician. While selecting and adapting the source treebanks, we analyze the impact of properties at three different levels: (i) the distance between the source and target languages, (ii) the adaptation of lexical and orthographic features, and (iii) the annotation guidelines of the treebanks. Using the proposed strategy, and without any prior Galician data, we trained a statistical parser that annotated a small corpus of this language with promising results. Manual correction of this corpus, used as a gold standard, allowed us to test the effectiveness of the proposed method...

Towards Syntactic Iberian Polarity Classification

Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2017

Normalización de términos multipalabra mediante pares de dependencia sintáctica [Normalization of multiword terms by means of syntactic dependency pairs]

Pdln, 2001

On Non-Termination in DCGs

Compilation methods of minimal acyclic automata for large dictionaries

Extracción de término índice mediante cascadas de expresiones regulares [Index term extraction by means of cascades of regular expressions]

The performance of Information Retrieval systems is limited by the linguistic variation phenomena present in texts. Word-level Natural Language Processing techniques have proven useful for reducing this variation. In this article we propose extending that approach to phrase-level variation; to this end, we index the syntactic dependencies present in the documents, which are obtained by means of a parser. To keep the computational cost of parsing as low as possible, we have opted for a shallow parser based on cascades of finite-state transducers. Although this article focuses on Spanish, our approach can be extended to other languages by suitably adapting the grammar used by the parser. Keywords: shallow parsing, finite-state transducers.
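
The cascade idea can be sketched with regular-expression rewrites standing in for finite-state transducers: each stage reads the previous stage's output, records a head-dependent pair, and collapses the matched span to its head noun, so later stages see a simpler string. The two rules, the `word/TAG` encoding and the `run_cascade` name are hypothetical, and far simpler than a real grammar for Spanish.

```python
import re

# Hypothetical rules over "word/TAG" tokens. Each stage records (head, dependent)
# and rewrites the matched span to the head noun alone.
STAGES = [
    # Stage 1: noun + adjective, e.g. "análisis/NOUN sintáctico/ADJ"
    re.compile(r"(?<!\S)(?P<head>\S+)/NOUN (?P<dep>\S+)/ADJ(?!\S)"),
    # Stage 2: noun + preposition + optional determiner + noun
    re.compile(r"(?<!\S)(?P<head>\S+)/NOUN \S+/PREP (?:\S+/DET )?(?P<dep>\S+)/NOUN(?!\S)"),
]

def run_cascade(tokens: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], str]:
    """Apply the stages in order; each stage sees the previous stage's output."""
    text = " ".join(f"{word}/{tag}" for word, tag in tokens)
    pairs = []
    for pattern in STAGES:
        # Apply the stage until it no longer matches (leftmost match first).
        while (m := pattern.search(text)):
            pairs.append((m["head"], m["dep"]))
            text = text[:m.start()] + f"{m['head']}/NOUN" + text[m.end():]
    return pairs, text

tagged = [("análisis", "NOUN"), ("sintáctico", "ADJ"), ("de", "PREP"),
          ("la", "DET"), ("frase", "NOUN")]
print(run_cascade(tagged))
# ([('análisis', 'sintáctico'), ('análisis', 'frase')], 'análisis/NOUN')
```

Stage 2 only fires because stage 1 has already removed the adjective between the two nouns, which is the point of cascading. The leftmost-match policy attaches chains of prepositional phrases to the first noun, an ambiguity this sketch does not try to resolve.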

LyS at TASS 2014: A Prototype for Extracting and Analysing Aspects from Spanish tweets

An Operational Model for Parsing Definite Clause Grammars with Infinite Terms

Lecture Notes in Computer Science, 1999

Generalized LR parsing for extensions of context-free grammars

Current Issues in Linguistic Theory, 2000

A linguistic approach for determining the topics of Spanish Twitter messages

Journal of Information Science, 2014

The vast number of opinions and reviews posted on Twitter can yield interesting findings about a given industry, but given the huge number of messages published every day, it is important to detect the relevant ones. In this respect, Twitter's search functionality is not a practical tool for retrieving messages that deal with a given set of general topics. This article presents an approach to classify Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, and show how language processing techniques should be adapted to deal with the informal language of Twitter messages. The TASS 2013 General corpus, a collection of tweets specifically annotated for text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest...

Mixed Parsing of Tree Insertion and Tree Adjoining Grammars

Lecture Notes in Computer Science, 2002

A Bidirectional Bottom-Up Parser for TAG

El sistema ERIAL: LEIRA, un entorno para RI basado en PLN [The ERIAL system: LEIRA, an NLP-based environment for IR]

RI con n-gramas: tolerancia a errores y multilingüismo [IR with n-grams: error tolerance and multilingualism]

ir.ii.uam.es, 2010

In this article we present the work that we in the LYS Group (Lengua y Sociedad de la Información) have recently been carrying out in the areas of error-tolerant information retrieval and multilingual information retrieval. The common link between both lines of research is the use of character n-grams as the processing unit, instead of more conventional solutions based on words or phrases. The use of n-grams allows us to ...
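
Character n-grams make matching tolerant to misspellings because a typo only disturbs the few n-grams that overlap it, while the rest still match. A minimal sketch, assuming space-padded 4-grams and Jaccard similarity as the matching score; both are illustrative choices, not necessarily those used in the LYS systems.

```python
def char_ngrams(text: str, n: int = 4) -> set[str]:
    """Collect character n-grams of every word, padded with spaces so that
    word beginnings and endings produce their own n-grams."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f" {word} "
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def overlap(query: str, doc: str, n: int = 4) -> float:
    """Jaccard similarity between the n-gram sets of a query and a document."""
    q, d = char_ngrams(query, n), char_ngrams(doc, n)
    union = q | d
    return len(q & d) / len(union) if union else 0.0

# A transposition typo still shares a third of its 4-grams with the correct form.
print(round(overlap("information", "infromation"), 2))  # 0.33
print(overlap("information", "retrieval"))              # 0.0
```

Because it needs no dictionary or stemmer, the same code runs unchanged on any language written with spaces between words, which hints at why n-grams also suit the multilingual side of the work.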

Prototyping Efficient Natural Language Parsers

Proceedings of Recent Advances in Natural Language Processing (International Conference RANLP 2007), ISBN, 2007

We present a technique for the construction of efficient prototypes for natural language parsing based on the compilation of parsing schemata to executable implementations of their corresponding algorithms. Taking a simple description of a schema as input, Java code for the corresponding parsing algorithm is generated, including schema-specific indexing code in order to attain efficiency. Key words: parsing schemata, context-free grammars, tree-adjoining grammars
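
To give a flavour of what a parsing schema describes, here is a hand-written Python sketch of the CYK schema run by a generic agenda-driven deduction engine: items [A, i, j] are derived by inference steps until no new item appears. Unlike the technique described above, nothing is compiled to Java; the two dictionaries keyed by item start and end position only illustrate the kind of schema-specific indexing that lets each new item find its combination partners without scanning the whole chart. The grammar and the `cyk_deduce` name are hypothetical.

```python
from collections import defaultdict

Item = tuple[str, int, int]  # (A, i, j): nonterminal A derives words[i:j]

def cyk_deduce(words, lexical, binary, start="S"):
    """Deduce all CYK items for a CNF grammar; return (recognized, chart)."""
    chart: set[Item] = set()
    by_start = defaultdict(set)  # index: i -> chart items starting at i
    by_end = defaultdict(set)    # index: j -> chart items ending at j
    # Scan step: a rule A -> w_i yields the item [A, i, i+1].
    agenda = [(a, i, i + 1) for i, w in enumerate(words) for a in lexical.get(w, ())]
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        x, i, j = item
        by_start[i].add(item)
        by_end[j].add(item)
        for a, b, c in binary:  # rule A -> B C
            # New item as left child: [B,i,j] [C,j,k] => [A,i,k]
            if b == x:
                for (y, _, k) in list(by_start[j]):
                    if y == c:
                        agenda.append((a, i, k))
            # New item as right child: [B,h,i] [C,i,j] => [A,h,j]
            if c == x:
                for (y, h, _) in list(by_end[i]):
                    if y == b:
                        agenda.append((a, h, j))
    return (start, 0, len(words)) in chart, chart

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL = {"she": {"NP"}, "eats": {"V"}, "fish": {"NP"}}
BINARY = [("S", "NP", "VP"), ("VP", "V", "NP")]

print(cyk_deduce("she eats fish".split(), LEXICAL, BINARY)[0])  # True
print(cyk_deduce("eats she fish".split(), LEXICAL, BINARY)[0])  # False
```

Each new item is combined with every compatible item already in the chart, so whichever of two partners arrives second finds the first through the index; that is what makes the agenda order irrelevant to the result.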

A New Approach to the Construction of Generalized LR Parsing Algorithms

Análisis eficiente de Gramáticas de Inserción de Árboles [Efficient parsing of Tree Insertion Grammars]

LyS at TASS 2013: Analysing Spanish tweets by means of dependency parsing, semantic-oriented lexicons and psychometric word-properties
