Miguel Angel Alonso Pardo | Universidade da Coruña
Papers by Miguel Angel Alonso Pardo
Applied Sciences
Parsing is a core natural language processing technique used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text; it is a challenging natural language processing task that is essential for extracting knowledge from texts in domains ranging from finance to medicine. It is intuitive that the structure of a text can help determine whether or not a certain portion of it is an entity and, if so, establish its exact boundaries. However, parsing has been a relatively little-used technique in NER systems, since most of them rely on shallow approaches to text. In this work, we study the characteristics of NER, a task that is far from solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; and we review the different approaches to NER that make use of syntactic information...
We describe four systems that automatically generate bilingual dictionaries from existing ones: three transitive systems differing only in the pivot language used, and a system based on a different approach that needs only monolingual corpora in both the source and target languages. All four methods make use of cross-lingual word embeddings trained on monolingual corpora and then mapped into a shared vector space. Experimental results confirm that our strategy has good coverage and recall, achieving performance comparable to the best systems submitted by the teams participating in the TIAD 2019 shared task, evaluated on its gold standard set.
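The final step of such an embedding-based approach can be sketched as a nearest-neighbour lookup: once source and target embeddings sit in a shared vector space, each source word is translated by the target word whose vector is closest. The sketch below assumes the vectors have already been mapped into a common space; the toy two-dimensional vectors and the function names are hypothetical, not the paper's actual system.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def induce_entry(src_vec, tgt_embeddings):
    """Return the target word whose (already mapped) embedding is
    nearest to the source word's vector, by cosine similarity."""
    return max(tgt_embeddings, key=lambda w: cosine(src_vec, tgt_embeddings[w]))

# Toy, hypothetical 2-d embeddings in a shared space.
target_space = {"perro": [1.0, 0.1], "gato": [0.1, 1.0]}
print(induce_entry([0.9, 0.2], target_space))  # nearest neighbour of a "dog"-like vector
```

Real systems refine this lookup (e.g. with retrieval criteria that mitigate the hubness problem), but the core dictionary-induction step is this nearest-neighbour search.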
Proces. del Leng. Natural, 2016
In this work we present a new strategy for creating treebanks for low-resource languages for syntactic parsing. The method consists of adapting and combining different treebanks annotated with Universal Dependencies from closely related linguistic varieties, with the goal of training a parser for the chosen language, in our case Galician. During the selection and adaptation of the source treebanks, we analyze the impact of properties at three different levels: (i) the distance between the source and target languages, (ii) the adaptation of lexical and orthographic features, and (iii) the annotation guidelines across treebanks. Using the proposed strategy, we trained a statistical parser to annotate, with promising results and without any prior Galician data, a small corpus of this language. Manual correction of this corpus, used as a gold standard, allowed us to test the effectiveness of the proposed method...
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, 2017
Pdln, 2001
The performance of Information Retrieval systems is limited by the phenomena of linguistic variation present in texts. Word-level Natural Language Processing techniques have proven useful for reducing such variation. In this article we propose extending this approach to phrase-level variation: the syntactic dependencies present in the documents are indexed, after being obtained by means of a parser. To keep the computational cost of the parsing process as low as possible, we employ a shallow parser based on cascades of finite-state transducers. Although this article focuses on the case of Spanish, our approach can be extended to other languages by suitably adapting the grammar used by the parser. Keywords: shallow parsing, finite-state transducers.
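The idea of a cascade can be illustrated with a toy two-stage chunker over POS-tagged text: a first pattern brackets noun phrases, and a second pattern, applied to the output of the first, brackets a verb together with a following NP. This sketch uses Python regular expressions instead of true finite-state transducers, and the tiny tag patterns are illustrative assumptions, not the grammar used in the article.

```python
import re

# Stage 1: a noun phrase is an optional determiner, any adjectives, and a noun.
NP = re.compile(r'(?:\S+/DT\s+)?(?:\S+/JJ\s+)*\S+/NNS?')
# Stage 2: a verb followed by an already-bracketed NP.
VP = re.compile(r'\S+/VB[DZPNG]?\s+\[NP[^\]]*\]')

def cascade(tagged):
    """Apply the two chunking stages in order, each consuming the
    previous stage's output, as in a transducer cascade."""
    out = NP.sub(lambda m: '[NP ' + m.group(0) + ']', tagged)
    out = VP.sub(lambda m: '[VP ' + m.group(0) + ']', out)
    return out

print(cascade("the/DT big/JJ dog/NN chased/VBD the/DT cat/NN"))
```

A real shallow parser compiles each stage into a finite-state transducer, which keeps the whole cascade linear-time; the composition-of-stages structure is the same.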
Lecture Notes in Computer Science, 1999
Current Issues in Linguistic Theory, 2000
Journal of Information Science, 2014
The vast number of opinions and reviews posted on Twitter is helpful for making interesting findings about a given industry but, given the huge number of messages published every day, it is important to detect the relevant ones. In this respect, the Twitter search functionality is not a practical tool when we want to poll messages dealing with a given set of general topics. This article presents an approach to classifying Twitter messages into various topics. We tackle the problem from a linguistic angle, taking into account part-of-speech, syntactic and semantic information, and showing how language processing techniques should be adapted to deal with the informal language present in Twitter messages. The TASS 2013 General corpus, a collection of tweets specifically annotated for text analytics tasks, is used as the dataset in our evaluation framework. We carry out a wide range of experiments to determine which kinds of linguistic information have the greatest...
Lecture Notes in Computer Science, 2002
ir.ii.uam.es, 2010
Abstract: In this article we present the work that the LYS Group (Language and the Information Society) has recently been carrying out in the areas of error-tolerant information retrieval and multilingual information retrieval. The common link between both lines of research is the use of character n-grams as the processing unit, instead of more conventional solutions based on words or phrases. The use of n-grams allows us to ...
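Character n-gram indexing can be shown in a few lines: each token is padded at its boundaries and split into overlapping fixed-length slices, so that a misspelled or cross-lingual variant still shares most of its n-grams with the canonical form. This is a minimal sketch of the general technique, not the group's actual indexing pipeline; the function name and the padding convention are assumptions.

```python
def char_ngrams(token, n=4, pad="_"):
    """Split a token into overlapping character n-grams,
    padding word boundaries so edge characters get full context."""
    t = pad + token + pad
    if len(t) < n:
        return [t]
    return [t[i:i + n] for i in range(len(t) - n + 1)]

print(char_ngrams("dog"))  # ['_dog', 'dog_']
```

Because a single-character error only perturbs the n-grams that overlap it, two variant spellings still share most of their grams, which is what makes the representation tolerant to errors and to morphological variation across languages.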
Proceedings of Recent Advances in Natural Language Processing (International Conference RANLP 2007), 2007
We present a technique for constructing efficient prototypes for natural language parsing, based on compiling parsing schemata into executable implementations of their corresponding algorithms. Taking a simple description of a schema as input, Java code for the corresponding parsing algorithm is generated, including schema-specific indexing code in order to attain efficiency. Keywords: parsing schemata, context-free grammars, tree-adjoining grammars.
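A parsing schema describes a parser as deduction: items over spans, axioms from the input words, and inference steps that combine items. The sketch below hand-codes the CYK schema for a grammar in Chomsky normal form as a naive fixpoint over items (symbol, i, j); it is written in Python for brevity rather than the generated Java, and the grammar representation is a hypothetical simplification without the schema-specific indexing that makes the compiled code efficient.

```python
def cyk_recognize(words, lexicon, rules, start="S"):
    """Recognize `words` under a CNF grammar by exhaustively applying
    the CYK deduction step over items (symbol, i, j)."""
    n = len(words)
    # Axioms: one lexical item per word occurrence.
    chart = {(sym, i, i + 1) for i, w in enumerate(words) for sym in lexicon.get(w, ())}
    changed = True
    while changed:
        changed = False
        # Deduction step: from (B, i, k) and (C, k, j) with A -> B C, infer (A, i, j).
        for (B, i, k) in list(chart):
            for (C, k2, j) in list(chart):
                if k2 != k:
                    continue
                for A, rhs in rules:
                    if rhs == (B, C) and (A, i, j) not in chart:
                        chart.add((A, i, j))
                        changed = True
    return (start, 0, n) in chart

lexicon = {"dogs": ["NP"], "bark": ["VP"]}
rules = [("S", ("NP", "VP"))]
print(cyk_recognize(["dogs", "bark"], lexicon, rules))
```

The point of compiling schemata rather than hand-coding them is precisely to replace the quadratic item-pairing loop above with indexing tailored to each inference rule's shared variables.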