Carmem Hara - Profile on Academia.edu

Papers by Carmem Hara

Communications in computer and information science, 2020

An Autonomic Spatial Query Processing Model for Urban Sensor Networks

Wireless Sensor Networks (WSN) in urban environments manage a large amount of sensing data. The deployment of spatial query processing in a decentralized and autonomous large-scale WSN is a major challenge due to network resource constraints. This paper proposes ASQPM, a scalable and autonomous model for data storage and spatial query processing. Scalability is provided by grouping sensors into clusters based on the spatial similarity of their readings. The query processing efficiency relies on the concept of repositories, which are regions in the monitored area that concentrate information, storing the readings of a set of clusters. The experimental results show that it is more effective for query processing than classical approaches.

Lecture Notes in Computer Science, 2015

In this paper, we present an RDF data distribution approach which overcomes the shortcomings of the current solutions in order to scale RDF storage with both the volume of data and the number of query requests. We apply a workload-aware method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. To avoid exhaustive analysis of large datasets, we reason over a summarized view of the datasets, which yields partitioning templates for data items in an RDF structure. An experimental study shows that our method scales well and is effective in improving overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.

SBBD (Short Papers), 2015

The communication costs involved in retrieving distributed data in SPARQL queries have a big impact on the system performance. In this paper, we define a parallel graph processing model that explores the existence of allocation patterns, which consist of information on how data has been distributed among servers. Based on this model, we define two types of communication schedules: get-frag and send-result. These strategies are of great interest to query optimizers for efficient query processing on distributed RDF stores.

The sensing of urban environments usually takes into account the deployment of a large number of devices to measure their environmental attributes, such as temperature, pressure, humidity, luminosity and pollution. In such applications, nearby sensors usually produce similar readings due to their spatial and temporal correlation. In the era of big data, management of collected data requires autonomous and scalable Wireless Sensor Network (WSN) structures. In this paper, we propose an in-network data storage model, called AQPM, that provides efficient processing of both spatial and value-based queries. AQPM is autonomous and scalable. That is, it does not rely on any central entity either for managing data storage on sensor devices or for processing queries. Scalability is achieved by grouping sensors with similar readings into clusters, while efficient query processing relies on the concept of repositories. Repositories are sensors that store readings of a set of clusters, and are the only ones that have to be contacted for answering queries. AQPM has been implemented on the NS2 simulator and experimental results show that it is more effective than existing approaches.
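As a rough illustration of the clustering and repository ideas in the abstract above, the following minimal sketch groups sensors that are within radio range of each other and report similar readings, then answers a value-based query from per-cluster summaries alone. It is not the AQPM protocol, which is distributed and whose details are not given here. The function names, the `radio_range` and `threshold` parameters, and the centralized computation are all assumptions made for illustration.

```python
from math import dist

def build_clusters(sensors, radio_range, threshold):
    """Group sensors that are mutually reachable and have similar readings.

    sensors: dict mapping sensor id -> (x, y, reading). Hypothetical layout.
    Two sensors join the same cluster when they are within radio_range and
    their readings differ by at most threshold (chained transitively).
    """
    unvisited = set(sensors)
    clusters = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        members, frontier = [seed], [seed]
        while frontier:
            cur = frontier.pop()
            cx, cy, cr = sensors[cur]
            for other in sorted(unvisited):  # sorted() copies, so removal is safe
                ox, oy, oreading = sensors[other]
                near = dist((cx, cy), (ox, oy)) <= radio_range
                if near and abs(cr - oreading) <= threshold:
                    unvisited.remove(other)
                    members.append(other)
                    frontier.append(other)
        clusters.append(sorted(members))
    return clusters

def build_repository(sensors, clusters):
    """Summarize each cluster by its member ids and reading range."""
    return [
        {
            "members": c,
            "min": min(sensors[s][2] for s in c),
            "max": max(sensors[s][2] for s in c),
        }
        for c in clusters
    ]

def value_query(repository, lo, hi):
    """Return clusters whose reading range overlaps [lo, hi]."""
    return [e["members"] for e in repository if e["min"] <= hi and e["max"] >= lo]

# Five sensors on a line: three cool ones, two warm ones.
sensors = {
    1: (0, 0, 20.0), 2: (1, 0, 20.5), 3: (2, 0, 21.0),
    4: (3, 0, 30.0), 5: (4, 0, 30.2),
}
clusters = build_clusters(sensors, radio_range=1.5, threshold=1.0)
repository = build_repository(sensors, clusters)
warm = value_query(repository, 29.0, 31.0)
```

Only the repository summaries are inspected by `value_query`, which mirrors the abstract's point that repositories are the only nodes contacted when answering queries. Because similarity is chained between neighbors, readings at opposite ends of a cluster may differ by more than `threshold`.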

Distributed and Parallel Databases, May 16, 2020

The ever-increasing amount of RDF data made available requires data to be partitioned across multiple servers. We have witnessed some research progress made towards scaling RDF query processing based on suitable data distribution methods. In general, they work well for queries matching simple triple patterns, but they are not efficient for queries involving more complex patterns. In this paper, we present an RDF data distribution method which overcomes the shortcomings of the current approaches in order to scale RDF storage with respect to both the volume of data and query processing. We apply a method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. We deploy our reasoning on a summarized view of data in order to avoid exhaustive analysis of large datasets. As a result, partitioning templates are obtained from data items in an RDF structure. In addition, we provide an approach for dynamic data insertions even if new data do not conform to the original RDF structure. Apart from the repartitioning approaches, we use an overflow repository to store data which may not follow the original schema. Our study shows that our method scales well and is effective in improving overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.

Assisting people in emergency health situations before medical care arrives can contribute to their survival, for example in a respiratory arrest, where response time is essential. In order to reduce the impact of the waiting time before care arrives, this paper presents the MobAngelo system, which connects mobile devices of people nearby so that, according to their health knowledge, they can assist each other in first aid until specialized care arrives. MobAngelo works in urban and sparse environments through the creation of temporary networks so that the user is aware of other nearby users who need help.

Communications in computer and information science, 2019

The dynamicity requirements of urban sensor networks raise new challenges for the development of data management and storage models. Software component techniques allow developers to build a software system from reusable, existing components sharing a common interface. Moreover, the development of urban sensor network applications would greatly benefit from the existence of a dedicated programming environment. This paper proposes SLEDS, a Domain-Specific Language for Data-Centric Storage on Wireless Sensor Networks. The language includes high-level composition primitives to promote a flexible coordination execution flow and interaction between components. We present the language specification as well as a case study of data storage coordination on sensor networks. The current specification of the language generates code for the NS2 simulation environment. The case study shows that the language implements a flexible model, which is general enough to be used on a wide variety of sensor network applications.

Abstract. The need for knowledge about biodiversity is constant, while resources for research, whether financial, time or human, are scarce. On the other hand, the Internet offers an enormous volume of data that can be exploited in favor of conservation science. Portuguese man-of-war (Physalia physalis) pose a risk to the population, and data on their occurrence are not always available for studying the species. This work proposes training machine learning models as a tool for classifying data extracted from a social media platform, thus enabling the creation of a database of Portuguese man-of-war occurrences along the Brazilian coast.

The huge volume of existing RDF datasets requires SPARQL queries to be efficiently processed. One approach to achieve this goal is to store RDF on a group-by-entity relational database, which explores structural similarity to group sets of triples in a single row of a relation. In this paper, we propose a method for translating SPARQL queries to SQL to be processed on such a database. Our experiments showed that the execution time of the translated queries is on average 250% lower, compared to queries on a triples relation.
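To make the contrast between the two storage layouts concrete, here is a minimal sketch using SQLite from Python. It is not the translation method proposed in the paper. The table names (`triples`, `person`), the `ex:` predicates, and the hand-written SQL are assumptions chosen only to show why a star-shaped SPARQL pattern needs a self-join on a triples relation but a single-row lookup on a group-by-entity relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout 1: one row per RDF triple.
    CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
    INSERT INTO triples VALUES
        ('ex:alice', 'ex:name', 'Alice'), ('ex:alice', 'ex:age', '30'),
        ('ex:bob',   'ex:name', 'Bob'),   ('ex:bob',   'ex:age', '25');

    -- Layout 2 (group-by-entity): one row per subject, one column per predicate.
    CREATE TABLE person (s TEXT PRIMARY KEY, name TEXT, age TEXT);
    INSERT INTO person VALUES
        ('ex:alice', 'Alice', '30'), ('ex:bob', 'Bob', '25');
""")

# SPARQL being translated (illustrative):
#   SELECT ?n ?a WHERE { ?x ex:name ?n . ?x ex:age ?a }

# On the triples relation, each triple pattern becomes one scan plus a join.
sql_on_triples = """
    SELECT t1.o, t2.o
    FROM triples t1 JOIN triples t2 ON t1.s = t2.s
    WHERE t1.p = 'ex:name' AND t2.p = 'ex:age'
    ORDER BY t1.o
"""

# On the entity relation, both patterns are answered from the same row.
sql_on_entities = """
    SELECT name, age
    FROM person
    WHERE name IS NOT NULL AND age IS NOT NULL
    ORDER BY name
"""

rows_triples = conn.execute(sql_on_triples).fetchall()
rows_entities = conn.execute(sql_on_entities).fetchall()
```

Both queries return the same rows. The entity layout avoids one join per additional triple pattern on the same subject, which is a plausible source of the speedup the abstract reports, though the sketch itself measures nothing.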

Information Systems, Dec 1, 2003

We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, we show that these keys are always (finitely) satisfiable, and their (finite) implication problem is finitely axiomatizable. Furthermore, we provide a polynomial time algorithm for determining (finite) implication in the size of keys. Our results also demonstrate, among other things, that the analysis of XML keys is far more intricate than its relational counterpart.

Sensor networks are a fast-evolving technology used for a variety of applications, ranging from environmental monitoring to cyber-physical systems (CPS) and IoT, including applications designed to support smart cities. The widespread use of sensor networks raises new challenges for data management and storage. The development of data storage systems is a hard task due to the specific nature of wireless sensor networks (WSNs) and the lack of a common general-purpose development framework. Software component models provide an appropriate level of system abstraction, reducing the development complexity and improving productivity. In this paper we propose RCBM, a Reusable Component-based Model for wireless sensor network storage simulation. RCBM promotes software reuse from existing components to improve the efficiency of system development and evaluation. RCBM has been implemented on the NS2 simulator and experimental results show that RCBM is more flexible than previous component-based models for WSNs. Due to its general-purpose approach, RCBM can be applied to develop simulation code for a wide range of WSN storage models, reducing the development effort.

Information Systems, May 1, 2021

Wireless Sensor Networks (WSNs) have become an integral part of urban scenarios. They are usually composed of a large number of devices. Developing systems for such networks is a hard task and often involves validation in simulation environments before deployment in real settings. Component-based development allows systems to be built from reusable, existing components that share a common interface. This paper proposes a domain-specific language (DSL) for coordination of WSN software components. The language provides high-level composition primitives to promote a flexible coordination execution flow and interaction between components. We present the language specification as well as a case study of in-network WSN data storage coordination. The current specification of the language generates code for the NS2 simulation environment. The case study shows that the language implements a flexible development model. Moreover, we analyze the code reusability promoted by the language and show that it reduces the programming effort in a component-based development framework.

Anais do XXXIII Simpósio Brasileiro de Banco de Dados (SBBD 2018)

Several proposals use Relational Database Management Systems (RDBMSs) to store RDF data. Mapping RDF directly to a triples table results in inefficient query processing. This paper proposes AORR (Armazenamento Otimizado de dados RDF em SGBDR, that is, Optimized Storage of RDF Data in an RDBMS), a method that identifies data entities in order to generate tables. In addition, AORR differs from related work by supporting SPARQL-to-SQL query translation as well as incremental updates of the database. An experimental study showed that AORR achieves better query performance than an alternative proposal that also adopts the concept of entity tables.

Anais do XXXVI Simpósio Brasileiro de Banco de Dados (SBBD 2021), 2021

Traffic event announcements such as jams and road closures are continuously reported by mobile and Web applications. This collection of spatio-temporal data is an important source of information for urban planning, and can be used to orchestrate a number of actions to improve mobility, such as traffic control, traffic light synchronization and preventive maintenance. Such analysis usually involves computation of spatial relationships among data, and may involve the location of landmarks, roads and different types of events. In this paper, we propose a Method for Indexing Traffic Events (MIDET) for querying spatio-temporal data whose location can be represented as a point or collection of points. MIDET is based on a fixed-grid space-oriented partitioning. In order to tackle the data skew, each grid cell is associated with a set of blocks containing event records. Moreover, a bitmap index is used for filtering out blocks without retrieving the actual data. MIDET provides the followi...
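The combination of a fixed grid, per-cell blocks, and a bitmap filter described in the abstract above can be sketched as follows. This is not the MIDET implementation, whose details are cut off in the abstract. The class name `GridIndex`, the constants `CELL` and `BLOCK_CAPACITY`, and the choice to build the bitmap over event types are all assumptions made for illustration.

```python
from collections import defaultdict

CELL = 1.0           # side of a grid cell (hypothetical units)
BLOCK_CAPACITY = 2   # records per block (hypothetical, kept tiny for the demo)
TYPES = {"jam": 0, "closure": 1, "accident": 2}  # bit position per event type

class GridIndex:
    def __init__(self):
        # cell coordinates -> list of blocks; each block is [bitmap, records]
        self.cells = defaultdict(list)

    def insert(self, x, y, etype, payload):
        blocks = self.cells[(int(x // CELL), int(y // CELL))]
        # A dense cell simply grows more blocks, which absorbs data skew.
        if not blocks or len(blocks[-1][1]) >= BLOCK_CAPACITY:
            blocks.append([0, []])
        blocks[-1][0] |= 1 << TYPES[etype]   # mark this type as present
        blocks[-1][1].append((x, y, etype, payload))

    def query(self, xmin, ymin, xmax, ymax, etype):
        """Return matching records and the number of blocks actually read."""
        mask = 1 << TYPES[etype]
        hits, blocks_read = [], 0
        for cx in range(int(xmin // CELL), int(xmax // CELL) + 1):
            for cy in range(int(ymin // CELL), int(ymax // CELL) + 1):
                for bitmap, records in self.cells.get((cx, cy), []):
                    if not bitmap & mask:
                        continue             # bitmap says: type absent, skip block
                    blocks_read += 1
                    hits += [
                        r for r in records
                        if r[2] == etype
                        and xmin <= r[0] <= xmax and ymin <= r[1] <= ymax
                    ]
        return hits, blocks_read

idx = GridIndex()
idx.insert(0.2, 0.3, "jam", "a")
idx.insert(0.4, 0.1, "jam", "b")
idx.insert(0.5, 0.5, "closure", "c")
idx.insert(0.6, 0.6, "closure", "d")
idx.insert(1.5, 0.5, "jam", "e")
hits, blocks_read = idx.query(0.0, 0.0, 0.99, 0.99, "closure")
```

In this example the cell at the origin holds two blocks, and the query for closures reads only one of them because the other block's bitmap has no closure bit set.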

Social media in service of marine ecology: new observations of the ghost crab Ocypode quadrata (Fabricius, 1787) scavenging on Portuguese man-of-war Physalia physalis (Linnaeus, 1758)

Aquatic Ecology, 2022

Anais Estendidos do XXXVI Simpósio Brasileiro de Banco de Dados (SBBD Estendido 2021), 2021

The COVID-19 pandemic created new demands for services in the judicial system, requiring the use of a data warehouse (DW). Although there exist approaches that use DW in the judicial domain, few target the pandemic or publicly provide the information extracted from the texts. Following the needs of a legal expert, we have developed the COVID-19 Portal. It extracts documents from the Supreme Federal Court in Brazil to obtain quantitative information on words used in the texts. In this paper, we present the design of a DW, and show the query performance improvement achieved with its implementation. The DW has been developed on Postgres, and its performance is compared with the original implementation on MongoDB Cloud and a local MongoDB database.

Development of Wireless Sensor Networks Applications with State-based Orchestration