Ana Garcia Serrano - Profile on Academia.edu (original) (raw)

Papers by Ana Garcia Serrano

Revista de Humanidades Digitales

Este artículo se centra en el análisis de dos investigaciones de diverso signo guiadas por la int... more Este artículo se centra en el análisis de dos investigaciones de diverso signo guiadas por la inteligencia artificial dentro del campo de las HD. El primero es una investigación muy conocida y exitosa de dos lingüistas que resuelven un caso de atribución de autoría a través de la construcción de un corpus digital de 150 obras de 40 novelistas italianos. El segundo es la investigación llevada a cabo en el corpus digital DIMH (El Dibujante Ingeniero al servicio de la Monarquía Hispánica. Siglos XVI-XVIII), una evolución de la Colección de mapas, planos y dibujos del Archivo General de Simancas (siglos XVI-XVIII), cuyo objetivo fue desarrollar herramientas de soporte a tareas de anotación semántica, búsqueda de información, extracción de relaciones ocultas en los textos y visualización de los resultados para facilitar la investigación de los historiadores. A través de estos dos ejemplos, este artículo busca mostrar los métodos, procesos y posibilidades de éxito en problemas complejos d...

arXiv (Cornell University), May 18, 2022

This registered report introduces the largest, and for the first time, reproducible experimental ... more This registered report introduces the largest, and for the first time, reproducible experimental survey on biomedical sentence similarity with the following aims: (1) to elucidate the state of the art of the problem; (2) to solve some reproducibility problems preventing the evaluation of most of current methods; (3) to evaluate several unexplored sentence similarity methods; (4) to evaluate for the first time an unexplored benchmark, called Corpus-Transcriptional-Regulation (CTR); (5) to carry out a study on the impact of the pre-processing stages and Named Entity Recognition (NER) tools on the performance of the sentence similarity methods; and finally, (6) to bridge the lack of software and data reproducibility resources for methods and experiments in this line of research. Our reproducible experimental survey is based on a single software platform, which is provided with a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results. In addition, we introduce a new aggregated string-based sentence similarity method, called LiBlock, together with eight variants of current ontology-based methods, and a new pre-trained word embedding model trained on the full-text articles in the PMC-BioC corpus. Our experiments show that our novel string-based measure sets the new state of the art on the sentence similarity task in the biomedical domain and significantly outperforms all the methods evaluated herein, with the only exception of one ontology-based method. Likewise, our experiments confirm that the pre-processing stages, and the choice of the NER tool for ontology-based methods, have a very significant impact on the performance of the sentence similarity methods. We also detail some drawbacks and limitations of current methods, and warn on the need of refining the current benchmarks. Finally, a noticeable finding is that our new string-based method significantly outperforms all state-of-the-art Machine Learning (ML) models evaluated herein.

Revista de Humanidades Digitales, 2017

Las Humanidades Digitales pretenden facilitar el acceso y entendimiento de documentos históricos ... more Las Humanidades Digitales pretenden facilitar el acceso y entendimiento de documentos históricos mediante aplicaciones informáticas. En este proceso es importante la etapa de representación formal y digital de los contenidos para facilitar el posterior proceso de estos en aplicaciones, por ejemplo, de acceso y visualización, búsqueda y organización automática de contenidos. En este sentido, este trabajo presenta diferentes aproximaciones de representación en el ámbito del proyecto DIMH (accesible desde https://dimh.hypotheses.org/). En particular se detalla el desarrollo de una ontología para la representación de los contenidos del corpus DIMH. Para reforzar la comprensión de la estructura ontológica y mostrar la potencia de los modelos ontológicos, se presentan una serie de ejemplos prácticos y de consultas.

Proces. del Leng. Natural, 2021

Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data to function... more Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data to function properly but rather on a source of knowledge, in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain knowledge source like this already exists; namely the Unified Medical Language System (UMLS). In this paper, three different unsupervised NER models using UMLS, namely MetaMap, cTakes and MetaMapLite are evaluated and compared from the results published by Demner-Fushman, Rogers and Aronson (2017) and Reategui and Ratte (2018). The Unsupervised Biomedical Named Entity Recognition framework (UB-NER) is developed, with which the results of the experiments of the three models, five datasets and two NER tasks are presented.

Lecture Notes in Computer Science, 2004

This paper describes the first set of experiments defined by the MIRACLE (Multilingual Informatio... more This paper describes the first set of experiments defined by the MIRACLE (Multilingual Information RetrievAl for the CLEf campaign) research group for some of the cross language tasks defined by CLEF. These experiments combine different basic techniques, linguistic-oriented and statistic-oriented, to be applied to the indexing and retrieval processes.

Proceedings of the XVII International Conference on Human Computer Interaction, 2016

There are some similarities in developing a traditional Higher Education (HE) eLearning course an... more There are some similarities in developing a traditional Higher Education (HE) eLearning course and MOOCs (Massive Open Online Courses), due to the use of the basis of eLearning instructional design. But in MOOCs, students should be continually influenced by information, social interactions and experiences forcing the faculty to come up with new approaches and ideas to develop a really engaging course. In this paper, the process of MOOCifying an online course on Universal Accessibility is detailed. The needed quality model is based upon the one used for all online degree programs at our university and on a variable metric specially designed for UNED MOOC courses making possible to control how each course was structured, what kind of resources were used and how activities, interaction and assessment were included. The learning activities were completely adapted, along with the content itself and the on-line assessment. For this purpose, the Gardner's Multiple Intelligences Product Grid has been selected.

The UNED-UV group at the ImageCLEF2013 Campaign have participated in the Scalable Concept Image A... more The UNED-UV group at the ImageCLEF2013 Campaign have participated in the Scalable Concept Image Annotation subtask. We present a multimedia IR-based system for the annotation task. In this collection, the images do not have any textual description associated, so we have downloaded and preprocessed the web pages which contain the images. Regarding the concepts, we expanded their textual description with additional information from external resources as Wikipedia or WordNet and we generate a KLD concept model using recovered textual information. The multimedia IR-based system uses a logistic relevance algorithm to get a model for each of the concepts to be trained using visual image features. Finally, the fusion subsystem merges textual and visual scores for a certain image to belong a concept, and decides the presence of the concept in the images.

Procesamiento Del Lenguaje Natural, Mar 4, 2015

Resumen: El tiempo es un elemento de importancia capital en todo espacio de información y Twitter... more Resumen: El tiempo es un elemento de importancia capital en todo espacio de información y Twitter no es una excepción. La explotación de la información temporal en tareas de recuperación y organización de información, tiene una larga tradición. Sin embargo, esta clase de enfoques, basados en contenido, no han sido muy explorados para el dominio de Twitter, y en consecuencia escasean los Corpus de tweets anotados con información temporal. En este artículo, se propone un modelo de anotación de la información temporal en el dominio de Twitter, basado en el Análisis de Conceptos Formales, en el que los atributos del contexto serán las expresiones temporales, eventos y tipos de eventos presentes en los tweets. Se define un Calendario especialmente adecuado a los fenómenos de conmemoración de aniversarios y fechas señaladas en Twitter, el Calendario Imaginario-Colectivo. El Corpus de estudio ha sido extraido de la colección de RepLab2013. Se incluye un completo análisis del mismo desde una perspectiva temporal. Palabras clave: Información temporal, Anotación temporal de tweets, Representación de información basada en contenido

Transportation Research Part C: Emerging Technologies, 2005

Modern decision support systems (DSS) not only store large amounts of decision-relevant data, but... more Modern decision support systems (DSS) not only store large amounts of decision-relevant data, but also aim at assisting decision-makers to explore the meaning of that data, and to take decisions based on understanding. In transportation domains, a multiagent approach to the construction of DSS is becoming increasingly popular, because it does not only reduce design complexity, but it also adequately supports a dialogue-based stance on decision support interactions. However, despite recent advances in the field of agent-oriented software engineering, a principled approach to the design of multiagent systems for decision support is still to come. In this paper, we outline a design method for the construction of agent-based DSS. Setting out from an organisational and communicative model of decision support environments, we present an abstract

1 Departamento de Inteligencia Artificial, Universidad Politecnica de Madrid, Campus de Montegancedo s/n, E-28660 Boadilla del Monte, Madrid 2 Departamento de Ingenierfa Electrica, Electr6nica y de Control, UNED, Escuela Tecnica Superior de Ingenieros Industriales, Apdo, E-60149, Madrid

Collaborative dialogue technologies in distance learning, 1994

Additional file 1 of HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

Additional file 1: We provide the Appendix A entitled "The reproducible benchmarks of biomed... more Additional file 1: We provide the Appendix A entitled "The reproducible benchmarks of biomedical semantic measures libraries" as supplementary material in one additional file. Appendix A introduces a detailed experimental setup, which is based on a publicly available reproducibility dataset [65] provided as supplementary material to allow the exact replication of all the experiments and results reported herein, as well as providing the source code of our benchmarks.

A reproducibility protocol and dataset on the biomedical sentence similarity v3

This protocol introduces a set of reproducibility resources with the aim of allowing the exact re... more This protocol introduces a set of reproducibility resources with the aim of allowing the exact replication of the experiments introduced by our main paper [1], which introduces the largest and for the first time reproducible experimental survey on biomedical sentence similarity. HESML V2R1 [2] is the sixth release of our Half-Edge Semantic Measures Library (HESML), which is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies like WordNet, SNOMED-CT, MeSH and GO. This protocol sets a self-contained reproducibility platform which contains the Java source code and binaries of our main benchmark program, as well as a Docker image which allows the exact replication of our experiments in any software platform supported by Docker, such as all Linux-based operating systems, Windows or MacOS. All the necessary resources for executing the experiments are published in the permanent repository ...

This dataset introduces a set of reproducibility resources with the aim of allowing the exact rep... more This dataset introduces a set of reproducibility resources with the aim of allowing the exact replication of the experiments introduced by our companion paper, which compare the performance of the three UMLS-based semantic similarity libraries reported in the literature as follows: (1) UMLS::Similarity [20], (2) Semantic Measures Library (SML) [3], and the latest version of our Half-Edge Semantic Measures Library (HESML) introduced in our aforementioned companion paper. HESML V1R5 is the fifth release of our Half-Edge Semantic Measures Library (HESML) detailed in [15] which is a linearly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models for ontologies like WordNet, SNOMED-CT, MeSH and GO. This dataset sets a self-contained reproducibility platform which contains the Java source code and binaries of our main benchmark program, as well as a Docker image which allows the exact replication of our experiments i...

Concept-based Organization for semi-automatic Knowledge Inference in Digital Humanities: Modelling and Visualization

Whitestein Series in Software Agent Technologies

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009

This paper describes the participation of the MIRACLE team 1 at the ImageCLEF Photographic Retrie... more This paper describes the participation of the MIRACLE team 1 at the ImageCLEF Photographic Retrieval task of CLEF 2008. We succeeded in submitting 41 runs. Obtained results from text-based retrieval are better than content-based as previous experiments in the MIRACLE team campaigns [5, 6] using different software. Our main aim was to experiment with several merging approaches to fuse text-based retrieval and content-based retrieval results, and it happened that we improve the text-based baseline when applying one of the three merging algorithms, although visual results are lower than textual ones.

INTELIGENCIA ARTIFICIAL, 2002

Una de las hipótesis para la mejora de la interacción persona-ordenador se basa en el uso del len... more Una de las hipótesis para la mejora de la interacción persona-ordenador se basa en el uso del lenguaje natural; en un primer nivel básico se emplea conocimiento lingüístico simple en aquellos procesos no muy complejos involucrados en la interacción. En este trabajo se muestran aspectos relacionados con la integración de la tecnología disponible de tratamiento de lenguaje natural en el desarrollo de un metabuscador que alcance un mayor grado de acierto en la recuperación de información realizada por un buscador tradicional así como en el tratamiento posterior de los documentos recuperados. En particular, se describe el proceso realizado para la extensión de las consultas de los usuarios con información lingüística empleando dos recursos léxicos para el castellano: ARIES para el tratamiento de la morfología y EuroWordnet para el tratamiento de la semántica. Este trabajo forma parte del sistema MESIA, Modelo computacional para extracción selectiva de información de textos cortos, que amplía la búsqueda habitual (consulta y presentación de resultados) con nuevas capacidades morfológicas y semánticas y analiza otros aspectos obtenidos a partir de la estructura de las páginas, del tratamiento lingüístico de algunas de las unidades de texto seleccionadas automáticamente y de la experiencia de uso. El sistema está diseñado para el sitio Web de la Comunidad Autónoma de Madrid (CAM), lo que representa una restricción en la cantidad de información disponible, pero se mantiene la problemática general de la búsqueda de información ya que la información contenida en estas páginas abarca prácticamente todas las categorías informativas que la Administración puede ofrecer al ciudadano.

IFIP International Federation for Information Processing

Virtual assistants are a promising business for the near future in the web era. This implies that... more Virtual assistants are a promising business for the near future in the web era. This implies that the supporting applications have to be endowed with advanced capabilities to service offerings and to communicate with the users in a more direct and natural way. This paper presents the agent-based architecture of the virtual assistant and focuses on the dialogue module. The content exchange between the agents is based on communicative acts to cope with the complexity of unrestricted language used by human users communicating with online assistants. The assistant is capable to interact with users and to provide the right output through the exploitation of different information sources. The approach was applied and tested on the insurance field in the frame of the European research project VIP-Advisor 1 .

Lecture Notes in Computer Science, 2004

ImageCLEF is a pilot experiment run at CLEF 2003 for cross language image retrieval using textual... more ImageCLEF is a pilot experiment run at CLEF 2003 for cross language image retrieval using textual captions related to image contents. In this paper, we describe the participation of the MIRACLE research team (Multilingual Information RetrievAl at CLEF), detailing the different experiments and discussing their preliminary results.

Interacting with Computers, 2006

This paper presents an interaction model pursuing flexible and coherent human-computer interactio... more This paper presents an interaction model pursuing flexible and coherent human-computer interaction. Starting from a cognitive architecture for Natural Interaction, an agent-based design is presented, focusing particularly on the role of the interaction agent. Regarding the intentional processing within this agent, the Threads Model is proposed. Finally, its implementation is described and evaluated to find out the integrity of the intentional approach.

Revista de Humanidades Digitales

arXiv (Cornell University), May 18, 2022

Revista de Humanidades Digitales, 2017

Proces. del Leng. Natural, 2021

Lecture Notes in Computer Science, 2004

Proceedings of the XVII International Conference on Human Computer Interaction, 2016

Procesamiento Del Lenguaje Natural, Mar 4, 2015

Transportation Research Part C: Emerging Technologies, 2005

Collaborative dialogue technologies in distance learning, 1994

Additional file 1 of HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey

A reproducibility protocol and dataset on the biomedical sentence similarity v3

Concept-based Organization for semi-automatic Knowledge Inference in Digital Humanities: Modelling and Visualization

Whitestein Series in Software Agent Technologies

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2009

INTELIGENCIA ARTIFICIAL, 2002

IFIP International Federation for Information Processing

Lecture Notes in Computer Science, 2004

Interacting with Computers, 2006