Sonia Sanchez-Cuadrado | Universidad Complutense de Madrid (original) (raw)

Papers by Sonia Sanchez-Cuadrado

Scire, Jun 20, 2023

Resumen En recuperación de información web, los motores de búsqueda como Google incluyen funciona... more Resumen En recuperación de información web, los motores de búsqueda como Google incluyen funcionalidades que devuelven respuestas directas a las consultas de los usuarios. Estas respuestas tratan de resolver una necesidad informativa y se conocen como rich answers. Para determinar cómo se presentan estos resultados y cómo afecta la optimización de los motores de búsqueda, se ha realizado un análisis de las respuestas directas destacadas que presenta Google. En este trabajo se han examinado preguntas de tipo informacional expresadas en lenguaje natural con los términos "what is". Se ha analizado el listado de los resultados para identificar las características de las respuestas directas. Además, se han explorado las estrategias SEO que puedan determinar la relevancia del fragmento respecto de la consulta. Con este trabajo se constata que la respuesta no se extrae necesariamente de forma literal de una página web y se comprueba que la solución a las preguntas puede proceder de varios recursos. Los fragmentos de respuesta directa y otros rich answers pueden llegar a ocupar cerca de la mitad de la página principal de resultados, cobrando un mayor protagonismo y desplazando al resto de los resultados orgánicos. Las respuestas directas proporcionan un cambio en los hábitos de búsqueda y un nuevo modo de navegar en la red basado en un sistema hiperenlazado de pregunta respuesta. Palabras clave: Respuestas enriquecidas. Fragmentos enriquecidos. GAB. SEO. Motores de búsqueda. Google. Optimización de respuestas.

Scire, Sep 5, 2018

Evaluación de la comprensión de los paneles interpretativos en parajes naturales Comprehension as... more Evaluación de la comprensión de los paneles interpretativos en parajes naturales Comprehension assessment of informative panels on natural landscape Yaiza SERNA SANS (1), Jorge MORATO LARA (1), Sonia SÁNCHEZ CUADRADO (2)

Revista Española de Documentación Científica, 2022

Se describe la elaboración de una ontología de dominio para la representación de la literatura gr... more Se describe la elaboración de una ontología de dominio para la representación de la literatura grecolatina en forma de datos enlazados. Se analizan los principios de la Web Semántica y la difusión semántica de contenido aplicados a la literatura clásica grecolatina. Se ha adaptado la metodología Methontology para la construcción de ontologías y se ha implementado un recurso en lenguaje formalizado. El resultado de esta investigación ha sido la elaboración de un proyecto piloto de datos enlazados basado en los principios y tecnologías Linked Open Data (LOD) en el campo de la literatura comparada, desarrollando la ontología Litcomp para la mejora del estudio acerca de la influencia y la pervivencia de la literatura grecolatina.

Lecture Notes in Computer Science, 2020

One important requirement for a web page to be accessible for all, according to the current inter... more One important requirement for a web page to be accessible for all, according to the current international recommendations from the W3C Accessibility Initiative is that the text should be readable and understandable to the broadest audience possible. Nowadays, unfortunately, the information included in the web pages are not easy to read and understand to everybody. This paper introduces the Comp4Text online readability evaluation tool, which is able to calculate the readability level of a web page based on classical linguistic measures (sentence to sentence) and detect unusual words and abbreviations. Moreover, it provides recommendations to solve the readability problems and show everything in a very visual way. Thanks to this tool, the web page designers and writers could improve their sites, being easier to be read and understand for all. Currently, Comp4Text is based on the Spanish language, but it can be easily extended to other languages if the readability metrics and easy-to-read rules are well-known.

Ibersid: revista de sistemas de información y documentación, 2020

Se presenta el desarrollo de una aplicación móvil de bajo costo para una red de bibliotecas. Exis... more Se presenta el desarrollo de una aplicación móvil de bajo costo para una red de bibliotecas. Existe una necesidad de aplicaciones que proporcionen servicio a los usuarios de las bibliotecas de acuerdo con los usos actuales, fidelizando a los usuarios y simplificando el acceso recurrente a múltiples sitios web. Se utiliza la Red Valenciana de Lectura Pública como estudio de caso para ilustrar la propuesta. El punto de partida es un proceso analítico relativo a las características de la entidad y los requisitos de la aplicación móvil. Para el desarrollo de la aplicación se comparan diferentes plataformas para la construcción de aplicaciones móviles. A continuación, se evalúa el producto final en relación con la eficiencia y la facilidad de uso. Los resultados indican que la utilización de una aplicación, que integra la información en un único punto, mejora el rendimiento en términos de tiempo de búsqueda y tasa de error. La principal contribución de este trabajo destaca las apps como ...

It is well known that linguistic features of a written text affect its readability, understanding... more It is well known that linguistic features of a written text affect its readability, understanding readability as the ease with which a reader can understand the text. This paper is focused on the analysing of the influence of some linguistic features on the readability of current Spanish e-Government websites. Specifically, the “familiarity” of the terms on web pages, as well as the “frequency” of these terms are studied, among others. Firstly, this research has analysed a corpus extracted from the current information websites of the Spanish eGovernment and its simplified counterparts. Then, using machine learning methods, a supervised model is built on the influence of different term familiarity lists on text readability in the corpus. Different term lists have been tested and it has been concluded that the differences between them have a great impact on their performance. An accuracy of 81% has been achieved with a combination of frequency lists. As a conclusion, term lists and th...

Education for Information, 2019

Building and checking concept maps is an active research topic in visual learning. Concept maps a... more Building and checking concept maps is an active research topic in visual learning. Concept maps are intended to show visual representations of interrelated concepts in educational and professional settings. For the last decades, numerous formulas have been proposed to compute the semantic proximity between any pair of concepts in the map. A review of the employment of semantic distances in concept map construction shows the lack of a clear criterion to select a suitable formula. Traditional metrics can be basically grouped depending on the representation of their knowledge source: statistic approaches based on co-occurrence of words in big corpora; path-based methods using lexical structures, like taxonomies; and multi-source methods which combine statistic approaches and path-based methods. On the one hand, path-based measures give better results than corpora-based metrics, but they cannot be used to process specific concepts or proper nouns due to the limited vocabulary of the taxonomies used. On the other side, information obtained from big corpora-including the World Wide Web-is not organized in a specific way and natural language processing techniques are usually needed in order to obtain acceptable results. In this research Wikipedia is proposed since it does not have such limitations. This article defines an approach to adapt path-based semantic similarity measures to Wikipedia for building concept maps. Experimental evaluation with a well-known set of human similarity judgments shows that the Wikipedia adapted metrics obtains equal or even better results when compared with the non-adapted approaches.

Este proyecto desarrolla un juego de preguntas y respuestas para que los alumnos puedan aprender ... more Este proyecto desarrolla un juego de preguntas y respuestas para que los alumnos puedan aprender y memorizar de un modo ludico los contenidos de diferentes asignaturas del Grado de Informacion y Documentacion mediante una APP movil.

Proceedings of the 9th International Conference on Software Development and Technologies for Enhancing Accessibility and Fighting Info-exclusion, 2020

People with cognitive and learning disabilities make up a significant percentage of the general p... more People with cognitive and learning disabilities make up a significant percentage of the general population. Among the difficulties of this group, it is frequent that they present a poor comprehension of texts. The easy-to-read guidelines facilitate the comprehension of texts but adapting texts according to these rules is a time-consuming manual labour. There are automatic approaches to simplify texts and assess difficulty after the text is published, but there is not a tool to help writers and editors in the editorial process. In this paper a platform is presented to assist in this process by visually identifying problematic elements indicated in the guidelines. This approach allows publishers to do their job more efficiently.

... Contributors, José González Moreiro, Juan Morillo Llorens, Miguel Ángel García-Quismond Marza... more ... Contributors, José González Moreiro, Juan Morillo Llorens, Miguel Ángel García-Quismond Marzal, Jorge Morato Lara, Pilar Beltrán Orenes, Sonia Sánchez Cuadrado. Repository, Redalyc (Red de Revistas Cientificas de America Latina, El Caribe, Espana y Portugal) (Mexico). ...

El profesional de la información, May 1, 2012

Resumen Se define el concepto organización del conocimiento como el dominio donde interacciona la... more Resumen Se define el concepto organización del conocimiento como el dominio donde interacciona la investigación científica con su aplicación al desarrollo de sistemas. Se citan las disciplinas que abarca, como la ciencia de la información; y sus productos, como los sistemas de clasificación. Se presentan tendencias recientes y trabajos actuales. Se cita la sociedad ISKO y se presentan brevemente sus actividades.

El Profesional de la Información, 2018

Desde el año 2000 es profesor de los estudios de Biblioteconomía y Documentación e Informática. H... more Desde el año 2000 es profesor de los estudios de Biblioteconomía y Documentación e Informática. Ha dirigido proyectos docentes y de investigación, tanto públicos como privados, sobre gestión de la información. Lidera el grupo de investigación GigaBD y dirige el

2020 15th Iberian Conference on Information Systems and Technologies (CISTI)

This paper proposes the automated evaluation of readability, through the analysis of different li... more This paper proposes the automated evaluation of readability, through the analysis of different linguistic characteristics associated with a better understanding of the websites of the Spanish Government's administrative procedures. To fulfil this task, a corpus made by web documents with distinct difficulty levels has been gathered. Then, these documents' difficulty is assessed through different classic readability metrics. By the use of machine learning methods, different algorithms are analyzed to measure their capability to predict text difficulty. The results obtained show that the official Spanish Government websites have a high difficulty level. The main contribution of this work is the combined application of a wide number of linguistic attributes and the construction of a new corpus addressed to official government texts.

Journal of Information Systems Engineering and Management, 2021

This paper automatically evaluates the readability of Spanish e-government websites. Specifically... more This paper automatically evaluates the readability of Spanish e-government websites. Specifically, the websites collected explain e-government administrative procedures. The evaluation is carried out through the analysis of different linguistic characteristics that are presumably associated with a better understanding of these resources. To this end, texts from websites outside the government websites have been collected. These texts clarify the procedures published on the Spanish Government's websites. These websites constitute the part of the corpus considered as the set of easy documents. The rest of the corpus has been completed with counterpart documents from government websites. The text of the documents has been processed, and the difficulty is evaluated through different classic readability metrics. At a later stage, automatic learning methods are used to apply algorithms to predict the difficulty of the text. The results of the study show that government web pages show high values for comprehension difficulty. This work proposes a new Spanish-language corpus of official e-government websites. In addition, a large number of combined linguistic attributes are applied, which improve the identification of the level of comprehensibility of a text with respect to classic metrics.

Sustainability, 2021

The exponential evolution of technology and the growth of the elderly population are two phenomen... more The exponential evolution of technology and the growth of the elderly population are two phenomena that will inevitably interact with increasing frequency in the future. This paper analyses scientific literature as a means of furthering progress in sustainable technology for senior living. We carried out a bibliometric analysis of papers published in this area and compiled by the Web of Science (WOS) and Scopus, examining the main participants and advances in the field from 2000 to the first quarter of 2021. The study describes some interesting research projects addressing three different aspects of older adults’ daily lives—health, daily activities and wellbeing—and policies to promote healthy aging and improve the sustainability of the healthcare system. It also looks at lines of research into transversal characteristics of technology. Our analysis showed that publications mentioning sustainability technologies for older adults have been growing progressively since the 2000s, but ...

En este artículo se presenta un método para acordar la semántica de los conceptos de los vocabula... more En este artículo se presenta un método para acordar la semántica de los conceptos de los vocabularios de metadatos. El análisis conceptual de mapeados y alineamientos de ontologías ha permitido diseñar un sistema para consensuar semántica de vocabularios de metadatos mediante una ontología de alto nivel. Para la definición del método se han seleccionado 6 vocabularios de metadatos y se han realizado pruebas de alineamiento manual y correspondencia conceptual con la ontología de referencia. PROTON ha sido la ontología de referencia seleccionada por adaptarse al dominio, extensibilidad, adecuación al sistema de recuperación y disponibilidad. El resultado del proceso es la formalización de los vocabularios y el enriquecimiento de la semántica. Estas características favorecen el uso y el funcionamiento de este tipo de recursos. La solución está basada en un sistema de formalización web basado en niveles semánticos, para evitar modificar el recurso original y facilitar la extensibilidad al reducir las dependencias entre los elementos. La principal aportación de este estudio es un método para la formalización y desambiguación de vocabularios de metadatos, donde pueden convivir e interoperar todos los elementos de los vocabularios sin tener que reetiquetar o desestimar por posibles usos minoritarios u obsolescencia de los vocabularios. Además, esta propuesta permite y facilita la reutilización de los vocabularios de metadatos, la interoperabilidad y recuperación conceptual.

Hindi is the most widely spoken language in India, with more than 300 million speakers. As there ... more Hindi is the most widely spoken language in India, with more than 300 million
speakers. As there is no separation between the characters of texts written in Hindi as
there is in English, the Optical Character Recognition (OCR) systems developed for the
Hindi language carry a very poor recognition rate. In this paper we propose an OCR for
printed Hindi text in Devanagari script, using Artificial Neural Network (ANN), which
improves its efficiency. One of the major reasons for the poor recognition rate is error in
character segmentation. The presence of touching characters in the scanned documents
further complicates the segmentation process, creating a major problem when designing
an effective character segmentation technique. Preprocessing, character segmentation,
feature extraction, and finally, classification and recognition are the major steps which are
followed by a general OCR.
The preprocessing tasks considered in the paper are conversion of gray scaled images to
binary images, image rectification, and segmentation of the document´s textual contents
into paragraphs, lines, words, and then at the level of basic symbols. The basic symbols,
obtained as the fundamental unit from the segmentation process, are recognized by the
neural classifier.
In this work, three feature extraction techniques-: histogram of projection based on mean
distance, histogram of projection based on pixel value, and vertical zero crossing, have
been used to improve the rate of recognition. These feature extraction techniques are
powerful enough to extract features of even distorted characters/symbols. For development
of the neural classifier, a back-propagation neural network with two hidden layers is used.
The classifier is trained and tested for printed Hindi texts. A performance of approximately
90% correct recognition rate is achieved.

Encontros Bibli Revista Eletronica De Biblioteconomia E Ciencia Da Informacao, 2012