Grid-Based Clustering of Waze Data on a Relational Database

An extensible index for spatial databases

2001

Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST's tuning parameters.
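The recursive space partitioning that these indexes share can be illustrated with a small point quadtree. This is only a minimal in-memory sketch, not SP-GiST itself: the class name `QuadNode`, the `capacity` and `max_depth` parameters, and the half-open bounding-box convention are all assumptions made for this example, and the page clustering and concurrency control discussed in the abstract are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class QuadNode:
    """Hypothetical point quadtree node covering [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int = 2      # assumed leaf capacity before a split
    depth: int = 0
    max_depth: int = 16    # assumed cap so duplicate points cannot split forever
    points: List[Point] = field(default_factory=list)
    children: Optional[List["QuadNode"]] = None

    def insert(self, p: Point) -> bool:
        x, y = p
        if not (self.x0 <= x < self.x1 and self.y0 <= y < self.y1):
            return False  # point lies outside this node's region
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity and self.depth < self.max_depth:
                self._split()
            return True
        # Exactly one child region contains the point; any() stops at it.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        # Divide the region into four equal quadrants and push points down.
        mx = (self.x0 + self.x1) / 2
        my = (self.y0 + self.y1) / 2
        d, c, m = self.depth + 1, self.capacity, self.max_depth
        self.children = [
            QuadNode(self.x0, self.y0, mx, my, c, d, m),
            QuadNode(mx, self.y0, self.x1, my, c, d, m),
            QuadNode(self.x0, my, mx, self.y1, c, d, m),
            QuadNode(mx, my, self.x1, self.y1, c, d, m),
        ]
        old_points, self.points = self.points, []
        for p in old_points:
            any(child.insert(p) for child in self.children)

    def query(self, qx0: float, qy0: float, qx1: float, qy1: float) -> List[Point]:
        """Return points inside the closed rectangle [qx0, qx1] x [qy0, qy1]."""
        # Prune subtrees whose region cannot intersect the query rectangle.
        if qx1 < self.x0 or qx0 >= self.x1 or qy1 < self.y0 or qy0 >= self.y1:
            return []
        if self.children is None:
            return [(x, y) for x, y in self.points
                    if qx0 <= x <= qx1 and qy0 <= y <= qy1]
        hits: List[Point] = []
        for child in self.children:
            hits.extend(child.query(qx0, qy0, qx1, qy1))
        return hits
```

Because the tree splits only where points are dense, it is unbalanced in general, which is the property that makes clustering nodes into disk pages a separate concern in the abstract above.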

Secondo-grid: an Infrastructure to Study Spatial Databases in Computational Grids

This article describes a framework designed as a platform for experiments on distributed spatial databases in a computational grid. The environment consists of an open database modified to send jobs to and receive jobs from other databases using a grid middleware and its services. With this framework, resources can be discovered and monitored, for example to help construct a query plan in a distributed environment. A case study based on a geographic database is being used to validate the framework.

IGSIM: An Integrated Architecture for High Performance Spatial Data Analysis

Effective data processing technologies exist for distributed query and analysis operations on distributed spatial data. The emergence of grid computing technologies has contributed additional computing power, and MapReduce has proven efficient for data-intensive analytic applications. This paper presents a novel architecture, Integrated Grid and Spatially Indexed MapReduce (IGSIM), that integrates Grid and MapReduce technologies for fast spatial data analysis. The architecture provides improved fault tolerance and shorter execution times for spatial queries. To further improve the organization of the spatial data, and consequently the efficiency of spatial queries, spatial indexes are used in the architecture. The IGSIM architecture, with a parallel implementation of the R-Tree and R+-Tree in the SpatialHadoop framework, provides high efficiency for line and range search queries. The experimental results demonstrate the high efficiency of the proposed architecture.

Towards parallel spatial query processing for big spatial data

2012

In recent years, spatial applications have become more and more important in both scientific research and industry. Spatial query processing is the fundamental functioning component to support spatial applications. However, the state-of-the-art techniques of spatial query processing are facing significant challenges as the data expand and user accesses increase. In this paper we propose and implement a novel scheme (named VegaGiStore) to provide efficient spatial query processing over big spatial data and numerous concurrent user queries. Firstly, a geography-aware approach is proposed to organize spatial data in terms of geographic proximity, and this approach can achieve high aggregate I/O throughput. Secondly, in order to improve data retrieval efficiency, we design a two-tier distributed spatial index for efficient pruning of the search space. Thirdly, we propose an “indexing + MapReduce” data processing architecture to improve the computation capability of spatial qu...

Spatial data warehouses and spatial OLAP come towards the cloud: design and performance

Distributed and Parallel Databases, 2015

Cloud computing systems handle large volumes of data by using almost unlimited computational resources, while spatial data warehouses (SDWs) are multidimensional databases that store huge volumes of both spatial data and conventional data. Cloud computing environments have been considered adequate to host voluminous databases, process analytical workloads and deliver database as a service, while spatial online analytical processing (spatial OLAP) queries issued over SDWs are intrinsically analytical. However, hosting an SDW in the cloud and processing spatial ...

Spatial Aggregation: Data Model and Implementation

Computing Research Repository, 2007

Data aggregation in Geographic Information Systems (GIS) is a desirable feature, only marginally present in commercial systems nowadays, mostly through ad-hoc solutions. Moreover, little attention has been given to the problem of integrating GIS and OLAP (On Line Analytical Processing) applications. In this paper, we first present a formal model for representing spatial data. This model integrates in a natural way geographic data and information contained in data warehouses external to the GIS. This novel approach allows both aggregation of geometric components and aggregation of measures associated to those components, defined in GIS fact tables. We define the notion of geometric aggregation, a general framework for aggregate queries in a GIS setting. Although general enough for expressing a wide range of queries, some of these queries can be hard to compute in a real-world GIS environment. Thus, we identify the class of summable queries, which can be efficiently evaluated by precomputing the overlay of two or more of the thematic layers involved in the query. We also sketch a language, denoted GISOLAP-QL, for expressing queries that involve GIS and OLAP features. In addition, we introduce Piet, an implementation of our proposal, that makes use of overlay precomputation for answering spatial queries (aggregate or not). Piet supports four kinds of queries: standard GIS queries, standard OLAP queries, geometric aggregation queries (like "total population in states with more than three airports"), and integrated GIS-OLAP queries ("total sales by product in cities crossed by a river", with the possibility of further navigating the results). Our experimental evaluation, discussed in the paper, showed that for a certain class of geometric queries with or without aggregation, overlay precomputation outperforms R-tree-based techniques. This suggests that overlay precomputation can be an alternative to be considered in GIS query processing engines. 
Finally, as a particular application of our proposal, we study topological queries.
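The idea of answering an aggregate query from a precomputed overlay can be sketched with the abstract's own example, "total population in states with more than three airports". This is a toy illustration under strong assumptions, not Piet or GISOLAP-QL: states are simplified to axis-aligned rectangles, airports to points, and the function names `precompute_overlay` and `total_population` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), half-open
Point = Tuple[float, float]


def precompute_overlay(states: Dict[str, Rect], airports: List[Point]) -> Counter:
    """Overlay the airport layer on the state layer once, ahead of query time.

    Returns the number of airports falling inside each state.
    """
    counts: Counter = Counter()
    for x, y in airports:
        for name, (x0, y0, x1, y1) in states.items():
            if x0 <= x < x1 and y0 <= y < y1:
                counts[name] += 1
                break  # assumed: state rectangles do not overlap
    return counts


def total_population(population: Dict[str, int],
                     airport_counts: Counter,
                     more_than: int) -> int:
    """Sum a measure over states selected via the precomputed overlay.

    No geometry is touched at query time; only the stored counts are read.
    """
    return sum(pop for name, pop in population.items()
               if airport_counts[name] > more_than)


# Usage with made-up data (all values are illustrative only).
states = {"A": (0, 0, 10, 10), "B": (10, 0, 20, 10)}
airports = [(1, 1), (2, 2), (3, 3), (4, 4), (11, 1), (12, 2)]
population = {"A": 5, "B": 7}

overlay = precompute_overlay(states, airports)
result = total_population(population, overlay, more_than=3)
```

The point of the sketch is the split of work: the geometric test runs once when the overlay is built, and the aggregate query afterwards reduces to a filter and a sum over stored values.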

A Survey of Traditional and MapReduce-Based Spatial Query Processing Approaches

Various indexing methods for spatial data have emerged from the rigorous efforts of many researchers to process spatial queries quickly. Parallelizing spatial index building and query processing has become very popular for improving efficiency. The MapReduce framework provides a modern way of parallel processing. MapReduce-based works for spatial queries build on existing traditional spatial indexing to construct spatial indexes in parallel. The majority of the spatial indexes implemented in MapReduce use the R-Tree and its variants. Therefore, traditional spatial indexes based on the R-Tree and its variants are thoroughly surveyed in the paper. The objective is to search for still less explored spatial indexing approaches that have the potential for parallelism in MapReduce. The review also provides a detailed survey of two kinds of MapReduce-based spatial query processing approaches: those based on hierarchical indexes and those based on packed key-value storage of the spatial dataset. Both approaches use different data partitioning strategies for distributing data among cluster nodes and manage the partitioned dataset through different indexing. Finally, a number of parameters are selected for comparison and analysis of all the existing approaches in the literature.
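One simple partitioning strategy of the kind surveyed here is a uniform grid: a map step keys each point by its grid cell, a reduce step groups points per cell, and a range query then reads only the cells it overlaps. The sketch below emulates both phases in plain Python on one machine. It is an assumption-laden illustration, not the method of any surveyed system, and the names `cell_of`, `map_phase`, `reduce_phase`, and `range_query` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell_size: float) -> Cell:
    """Map a point to the integer coordinates of its grid cell."""
    return (int(p[0] // cell_size), int(p[1] // cell_size))


def map_phase(points: Iterable[Point], cell_size: float) -> Iterator[Tuple[Cell, Point]]:
    """Emit (cell, point) pairs; the cell acts as the partition key."""
    for p in points:
        yield cell_of(p, cell_size), p


def reduce_phase(pairs: Iterable[Tuple[Cell, Point]]) -> Dict[Cell, List[Point]]:
    """Group points by cell, standing in for one partition per key."""
    partitions: Dict[Cell, List[Point]] = defaultdict(list)
    for cell, p in pairs:
        partitions[cell].append(p)
    return dict(partitions)


def range_query(partitions: Dict[Cell, List[Point]], cell_size: float,
                x0: float, y0: float, x1: float, y1: float) -> List[Point]:
    """Return points in the closed rectangle, scanning only overlapping cells."""
    cx0, cy0 = cell_of((x0, y0), cell_size)
    cx1, cy1 = cell_of((x1, y1), cell_size)
    hits: List[Point] = []
    for cx in range(cx0, cx1 + 1):
        for cy in range(cy0, cy1 + 1):
            for x, y in partitions.get((cx, cy), []):
                if x0 <= x <= x1 and y0 <= y <= y1:
                    hits.append((x, y))
    return hits


# Usage with made-up points and an assumed cell size of 10 units.
points = [(1, 1), (2, 9), (11, 3), (12, 4), (25, 25)]
partitions = reduce_phase(map_phase(points, cell_size=10))
hits = range_query(partitions, 10, 0, 0, 12, 5)
```

A uniform grid is easy to parallelize but sensitive to skew: dense regions produce overloaded cells, which is one reason the surveyed systems turn to R-Tree variants and other adaptive partitioning schemes.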

High performance spatial queries for spatial big data

SIGSPATIAL Special, 2015

Support of high performance queries on large volumes of spatial data has become increasingly important in many application domains, including geospatial problems in numerous disciplines, location based services, and emerging medical imaging applications. There are two major challenges for managing massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. Our goal is to develop a general framework to support high performance spatial queries and analytics for spatial big data on MapReduce and CPU-GPU hybrid platforms. In this paper, we introduce Hadoop-GIS -- a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through skew-aware spatial partitioning, on-demand indexing, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effectiv...

High performance computing with geographical data

Parallel Computing, 2003

The editors seek to accomplish two goals with this special edition. First, we wish to highlight research in high performance computation with geographical data on parallel and distributed architectures, and to this end we collate a range of papers exemplifying current activity. Second, we seek to raise awareness of the scope for further activity, including little-explored opportunities for multidisciplinary research and development. Motivation for this edition came from a mini-symposium at the Parallel Computing 2001 Conference in Naples. Before considering high performance computation with geographical data, it is necessary to consider aspects of geography. Geography comprises an astonishingly diverse spectrum: it is about the processes that form the Earth's surface and atmosphere and it is about human life and society. It can claim within its domain anything and everything that happens as a consequence of location, or requires location relative to the Earth's surface for description and understanding. Geography must embrace and often fuse physics, politics, geology, sociology, chemistry, economics, environmental science, biology, meteorology, ecology... It takes advantage of both non-quantitative thinking and mathematical modelling. In the light of the extraordinary challenges of the 21st century its importance is unparalleled, as is much of its supporting computer technology. The application of computers in geography dates back more than 40 years [1]. Software fundamental to geographical applications was packaged in Geographical Information Systems (GIS); these performed basic tasks to allow the creation, storage, processing and display of digital geographical data.
GIS now permeate commerce, government and defence, as well as research, with application areas including public utilities and services, health, homeland security, the optimisation of marketing and placement of facilities, management of distribution networks, communications, environmental monitoring, management and policy-making. As digital datasets and commercial GIS products developed so the market for geospatial data and software grew to be measured in billions of dollars annually. Meanwhile the research focus was diverging, first to further develop the underlying theories of spatial data, second to integrate GIS with adjacent technologies including remote sensing and environmental modelling, and third to study the impact of the technology on