Carmem Hara | Universidade Federal do Paraná
Papers by Carmem Hara
Communications in computer and information science, 2020
Wireless Sensor Networks (WSN) in urban environments manage a large amount of sensor data. The deployment of spatial query processing in a decentralized and autonomous large-scale WSN is a major challenge due to network resource constraints. This paper proposes ASQPM, a scalable and autonomous model for data storage and spatial query processing. Scalability is provided by grouping sensors into clusters based on the spatial similarity of their readings. Query processing efficiency relies on the concept of repositories, which are regions in the monitored area that concentrate information by storing the readings of a set of clusters. The experimental results show that ASQPM is more effective for query processing than classical approaches.
Lecture Notes in Computer Science, 2015
In this paper, we present an RDF data distribution approach which overcomes the shortcomings of the current solutions in order to scale RDF storage with both the volume of data and the number of query requests. We apply a workload-aware method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. To avoid exhaustive analysis of large datasets, we reason over a summarized view of the data and derive partitioning templates for data items in an RDF structure. An experimental study shows that our method scales well and improves overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.
Computer Networks, Aug 1, 2002
SBBD (Short Papers), 2015
The communication costs involved in retrieving distributed data in SPARQL queries have a big impact on system performance. In this paper, we define a parallel graph processing model that exploits the existence of allocation patterns, which consist of information on how data has been distributed among servers. Based on this model, we define two types of communication schedules: get-frag and send-result. These strategies are of great interest to query optimizers for efficient query processing on distributed RDF stores.
The sensing of urban environments usually involves the deployment of a large number of devices to measure environmental attributes, such as temperature, pressure, humidity, luminosity and pollution. In such applications, nearby sensors usually produce similar readings due to their spatial and temporal correlation. In the era of big data, management of collected data requires autonomous and scalable Wireless Sensor Network (WSN) structures. In this paper, we propose an in-network data storage model, called AQPM, that provides efficient processing of both spatial and value-based queries. AQPM is autonomous and scalable. That is, it does not rely on any central entity either for managing data storage on sensor devices or for processing queries. Scalability is achieved by grouping sensors with similar readings into clusters, while efficient query processing relies on the concept of repositories. Repositories are sensors that store the readings of a set of clusters, and are the only ones that have to be contacted for answering queries. AQPM has been implemented on the NS2 simulator and experimental results show that it is more effective than existing approaches.
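The clustering and repository idea in the abstract above can be illustrated with a small Python sketch. It is not the AQPM protocol: it ignores sensor positions and network messages, groups sensors by reading value only, and every name in it (`Sensor`, `cluster_by_reading`, `build_repository`, `value_query`, the `tolerance` parameter) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sensor:
    sensor_id: int
    reading: float


def cluster_by_reading(sensors, tolerance):
    """Greedy grouping (an assumption, not AQPM's algorithm): a sensor joins
    the current cluster if its reading is within `tolerance` of the first
    reading in that cluster; otherwise it starts a new cluster."""
    clusters = []
    for s in sorted(sensors, key=lambda s: s.reading):
        if clusters and s.reading - clusters[-1][0].reading <= tolerance:
            clusters[-1].append(s)
        else:
            clusters.append([s])
    return clusters


def build_repository(clusters):
    """The repository keeps, per cluster, the value range and the readings."""
    return [
        {
            "low": min(s.reading for s in c),
            "high": max(s.reading for s in c),
            "readings": [(s.sensor_id, s.reading) for s in c],
        }
        for c in clusters
    ]


def value_query(repository, low, high):
    """Answer a value-based range query from the repository alone,
    skipping clusters whose range cannot overlap [low, high]."""
    hits = []
    for entry in repository:
        if entry["high"] < low or entry["low"] > high:
            continue  # whole cluster is outside the queried range
        hits.extend(i for i, r in entry["readings"] if low <= r <= high)
    return sorted(hits)


sensors = [Sensor(1, 20.0), Sensor(2, 20.5), Sensor(3, 21.0),
           Sensor(4, 30.0), Sensor(5, 30.2)]
clusters = cluster_by_reading(sensors, tolerance=1.0)
repository = build_repository(clusters)
result = value_query(repository, 20.4, 30.1)
```

In this toy setting the query touches only the repository entries, never the individual sensors, which is the property the abstract attributes to repositories.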
Distributed and Parallel Databases, May 16, 2020
The ever-increasing amount of RDF data made available requires data to be partitioned across multiple servers. We have witnessed some research progress made towards scaling RDF query processing based on suitable data distribution methods. In general, they work well for queries matching simple triple patterns, but they are not efficient for queries involving more complex patterns. In this paper, we present an RDF data distribution method which overcomes the shortcomings of the current approaches in order to scale RDF storage with both the volume of data and query processing. We apply a method that identifies frequent patterns accessed by queries in order to keep related data in the same partition. We deploy our reasoning on a summarized view of data in order to avoid exhaustive analysis of large datasets. As a result, partitioning templates are obtained from data items in an RDF structure. In addition, we provide an approach for dynamic data insertions even if new data do not conform to the original RDF structure. Apart from the repartitioning approaches, we use an overflow repository to store data which may not follow the original schema. Our study shows that our method scales well and improves overall performance by decreasing the amount of message passing among servers, compared to alternative data distribution approaches for RDF.
Assisting people in emergency health situations before medical care arrives can contribute to their survival, such as in a respiratory arrest, where response time is essential. In order to reduce the impact of the waiting time before help arrives, this paper presents the MobAngelo system, which connects the mobile devices of people nearby so that, according to their health knowledge, they can assist each other with first aid until specialized care arrives. MobAngelo works in urban and sparse environments through the creation of temporary networks so that the user is aware of other nearby users who need help.
Communications in computer and information science, 2019
The dynamicity requirements of urban sensor networks raise new challenges for the development of data management and storage models. Software component techniques allow developers to build a software system from reusable, existing components sharing a common interface. Moreover, the development of urban sensor network applications would greatly benefit from the existence of a dedicated programming environment. This paper proposes SLEDS, a Domain-Specific Language for Data-Centric Storage on Wireless Sensor Networks. The language includes high-level composition primitives to promote a flexible coordination execution flow and interaction between components. We present the language specification as well as a case study of data storage coordination on sensor networks. The current specification of the language generates code for the NS2 simulation environment. The case study shows that the language implements a flexible model, which is general enough to be used in a wide variety of sensor network applications.
The huge volume of existing RDF datasets requires SPARQL queries to be efficiently processed. One approach to achieve this goal is to store RDF on a group-by-entity relational database, which exploits structural similarity to group sets of triples into a single row of a relation. In this paper, we propose a method for translating SPARQL queries to SQL to be processed on such a database. Our experiments showed that the execution time of the translated queries is on average 250% lower, compared to queries on a triples relation.
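The difference between a triples relation and a group-by-entity relation can be sketched with Python's standard `sqlite3` module. This is only an illustration under assumed names: the `triples` and `person` tables, the `ex:` predicates, and the hand-written SQL are hypothetical and do not reproduce the translation method proposed in the paper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Triples relation: one row per RDF triple (subject, predicate, object).
cur.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
cur.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:alice", "ex:name", "Alice"),
        ("ex:alice", "ex:age", "30"),
        ("ex:bob", "ex:name", "Bob"),
        ("ex:bob", "ex:age", "25"),
    ],
)

# Group-by-entity relation: one row per subject, one column per predicate.
cur.execute("CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, age TEXT)")
cur.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [("ex:alice", "Alice", "30"), ("ex:bob", "Bob", "25")],
)

# SPARQL being translated (by hand, for illustration):
#   SELECT ?name ?age WHERE { ?p ex:name ?name . ?p ex:age ?age }

# On the triples relation, each triple pattern becomes one self-join operand.
on_triples = cur.execute(
    "SELECT t1.o, t2.o FROM triples t1 JOIN triples t2 ON t1.s = t2.s "
    "WHERE t1.p = 'ex:name' AND t2.p = 'ex:age' ORDER BY t1.o"
).fetchall()

# On the entity relation, both patterns are answered by a single-table scan.
on_entities = cur.execute(
    "SELECT name, age FROM person "
    "WHERE name IS NOT NULL AND age IS NOT NULL ORDER BY name"
).fetchall()

conn.close()
```

Both queries return the same rows; the entity-table version avoids the self-join that the triples version needs for every additional triple pattern sharing a subject.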
Information Systems, Dec 1, 2003
We study absolute and relative keys for XML, and investigate their associated decision problems. We argue that these keys are important to many forms of hierarchically structured data including XML documents. In contrast to other proposals of keys for XML, we show that these keys are always (finitely) satisfiable, and their (finite) implication problem is finitely axiomatizable. Furthermore, we provide a polynomial time algorithm for determining (finite) implication in the size of keys. Our results also demonstrate, among other things, that the analysis of XML keys is far more intricate than its relational counterpart.
Information Systems, May 1, 2021
Wireless Sensor Networks (WSNs) have become an integral part of urban scenarios. They are usually composed of a large number of devices. Developing systems for such networks is a hard task and often involves validation in simulation environments before deployment in real settings. Component-based development allows systems to be built from reusable, existing components that share a common interface. This paper proposes a domain-specific language (DSL) for the coordination of WSN software components. The language provides high-level composition primitives to promote a flexible coordination execution flow and interaction between components. We present the language specification as well as a case study of in-network WSN data storage coordination. The current specification of the language generates code for the NS2 simulation environment. The case study shows that the language implements a flexible development model. Moreover, we analyze the code reusability promoted by the language and show that it reduces the programming effort in a component-based development framework.
Livro de Memórias do IV SUSTENTARE e VII WIPIS: Workshop Internacional de Sustentabilidade, Indicadores e Gestão de Recursos Hídricos
Anais do XXXIII Simpósio Brasileiro de Banco de Dados (SBBD 2018)
Several proposals use Relational Database Management Systems (RDBMSs) to store RDF data. Directly mapping RDF to a triples table results in inefficient query processing. This paper proposes AORR (Armazenamento Otimizado de dados RDF em SGBDR, that is, Optimized Storage of RDF Data in an RDBMS), a method that identifies data entities in order to generate tables. In addition, AORR differs from related work by supporting SPARQL-to-SQL query translation as well as incremental updates of the database. An experimental study showed that AORR achieves better query performance than an alternative proposal that also adopts the concept of entity tables.
Anais do XXXVI Simpósio Brasileiro de Banco de Dados (SBBD 2021), 2021
Traffic event announcements such as jams and road closures are continuously reported by mobile and Web applications. This collection of spatio-temporal data is an important source of information for urban planning, and can be used to orchestrate a number of actions to improve mobility, such as traffic control, traffic light synchronization and preventive maintenance. Such analysis usually involves computation of spatial relationships among data, and may involve the location of landmarks, roads and different types of events. In this paper, we propose a Method for Indexing Traffic Events (MIDET) for querying spatio-temporal data, whose location can be represented as a point or collection of points. MIDET is based on a fixed-grid space-oriented partitioning. In order to tackle data skew, each grid cell is associated with a set of blocks containing event records. Moreover, a bitmap index is used for filtering out blocks without retrieving the actual data. MIDET provides the followi...