Efficient processing of spatial joins with DOT-based indexing

The design and implementation of seeded trees: an efficient method for spatial joins

IEEE Transactions on Knowledge and Data Engineering, 1998

Existing methods for spatial joins require pre-existing spatial indices or other precomputation, but such approaches are inefficient and limited in generality. Operand data sets of spatial joins may not all have precomputed indices, particularly when they are dynamically generated by other selection or join operations. Also, existing spatial indices are mostly designed for spatial selections, and are not always efficient for joins. This paper explores the design and implementation of seeded trees [1], which are effective for spatial joins and efficient to construct at join time. Seeded trees are R-tree-like structures, but divided into seed levels and grown levels. This structure facilitates using information regarding the join to accelerate the join process, and allows efficient buffer management. In addition to the basic structure and behavior of seeded trees, we present techniques for efficient seeded tree construction, a new buffer management strategy to lower I/O costs, and theoretical analysis for choosing algorithmic parameters. We also present methods for reducing space requirements and improving the stability of seeded tree performance with no additional I/O costs. Our performance studies show that the seeded tree method outperforms other tree-based methods by far, both in terms of the number of disk pages accessed and weighted I/O costs. Further, its performance gain is stable across different input data, and its incurred CPU penalties are also lower.

An Efficient Cost Model for Spatial Joins Using R-trees

IEEE Transactions on Knowledge and Data Engineering, 1997

Spatial join is one of the fundamental operations in a Spatial Data Base Management System. Recently, the family of R-tree-based data structures has been adopted to support the execution of spatial joins. This paper introduces an analytical model that efficiently estimates the cost (in terms of disk accesses) of a spatial join query between two spatial datasets. The proposed model is based on an analytical formula that estimates the cost of the range query using R-trees. In addition, comparison results are presented which show the accuracy of the analytical estimations when compared to actual tests on both synthetic and real datasets. It turns out that the relative error rarely exceeds 15% for all combinations.

Cost models for join queries in spatial databases

Proceedings 14th International Conference on Data Engineering, 1998

The join query is one of the fundamental operations in Data Base Management Systems (DBMSs). Modern DBMSs should be able to support non-traditional data, including spatial objects, in an efficient manner. Towards this goal, spatial data structures can be adopted in order to support the execution of join queries on sets of multidimensional data. This paper introduces analytical models that estimate the cost (in terms of node or disk accesses) of join queries involving two multidimensional indexed data sets using R-tree-based structures. In addition, experimental results are presented, which show the accuracy of the analytical estimations when compared to actual runs on both synthetic and real data sets. It turns out that the relative error rarely exceeds 15% for all combinations, a fact that makes the proposed cost models useful tools for efficient spatial query optimization.

Processing and optimization of multiway spatial joins using R-trees

1999

One of the most important types of query processing in spatial databases and geographic information systems is the spatial join, an operation that selects, from two relations, all object pairs satisfying some spatial predicate. A multiway join combines data originated from more than two relations. Although several techniques have been proposed for pairwise spatial joins, only limited work has focused on multiway spatial join processing. This paper solves multiway spatial joins by applying systematic search algorithms that exploit R-trees to efficiently guide search, without building temporary indexes or materializing intermediate results. In addition to general methodologies, we propose cost models and an optimization algorithm, and evaluate them through extensive experimentation.
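
As a point of reference for the definition above, a pairwise spatial join can be sketched as a naive nested loop over minimum bounding rectangles. This is only an illustration of the operation itself, not the R-tree-guided search the paper proposes; the `Rect` type, the `intersects` predicate, and the sample data are assumptions introduced here.

```python
from typing import Callable, List, NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: Rect, b: Rect) -> bool:
    """Spatial predicate: True when the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def spatial_join(r: List[Rect], s: List[Rect],
                 predicate: Callable[[Rect, Rect], bool] = intersects
                 ) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) such that predicate(r[i], s[j]) holds.

    Naive O(|r| * |s|) nested loop; index-based methods exist to avoid
    exactly this exhaustive comparison.
    """
    return [(i, j)
            for i, a in enumerate(r)
            for j, b in enumerate(s)
            if predicate(a, b)]


# Made-up sample relations.
roads = [Rect(0, 0, 2, 2), Rect(5, 5, 6, 6)]
parks = [Rect(1, 1, 3, 3), Rect(10, 10, 11, 11)]
pairs = spatial_join(roads, parks)
```

A multiway join would extend the result tuples to one index per relation, with a predicate for each joined pair of relations.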

Spatial joins using seeded trees

ACM SIGMOD Record, 1994

Existing methods for spatial joins assume the existence of indices for the participating data sets. This assumption is not realistic for applications involving multiple map layer overlays or for queries involving non-spatial selections. In this paper, we explore a spatial join method that dynamically constructs index trees called seeded trees at join time. This method uses knowledge of the data sets involved in the join to speed up the join process. Seeded trees are R-tree-like structures, and are divided into the seed levels and the grown levels. The nodes in the seed levels are used to guide tree growth during tree construction. The seed levels can also be used to filter out some input data during construction, thereby reducing tree size. We develop a technique that uses intermediate linked lists during tree construction and significantly speeds up the tree construction process. The technique allows a large number of random disk accesses during tree construction to be replaced by smaller numbers of sequential accesses. Our performance studies show that spatial joins using seeded trees outperform those using other methods significantly in terms of disk I/O. The CPU penalties incurred are also lower except when seed-level filtering is used.

Pipelined spatial join processing for quadtree-based indexes

Proceedings of the 15th annual ACM international symposium on Advances in geographic information systems - GIS '07, 2007

Spatial join is an important yet costly operation in spatial databases. In order to speed up the execution of a spatial join, the input tables are often indexed based on their spatial attributes. The quadtree index structure is a well-known index for organizing spatial database objects. It has been implemented in several database management systems, e.g., in Oracle Spatial and in PostgreSQL (via SP-GiST). Queries typically involve multiple pipelined spatial join operators that fit together in a query evaluation plan. In order to extend the applicability of these spatial joins, they are optimized so that upon receiving sorted input, they produce sorted output for the spatial join operators in the upper levels of the query evaluation pipeline. This paper investigates the use of quadtree-based spatial join algorithms and how they can be adapted to answer queries that involve multiple pipelined spatial joins in a query evaluation plan. The paper investigates several adaptations to pipelined spatial join algorithms and their performance for the cases when both input tables are indexed, when only one of the tables is indexed while the second table is sorted, and when both tables are sorted but are not indexed.

Efficient join-index-based spatial-join processing: A clustering approach

2002

A join-index is a data structure used for processing join queries in databases. Join-indices use precomputation techniques to speed up online query processing and are useful for data sets which are updated infrequently. The I/O cost of join computation using a join-index with limited buffer space depends primarily on the page-access sequence used to fetch the pages of the base relations. Given a join-index, we introduce a suite of methods based on clustering to compute the joins. We derive upper bounds on the length of the page-access sequences. Experimental results with Sequoia 2000 data sets show that the clustering method outperforms existing methods based on sorting and online-clustering heuristics.
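
The dependence of I/O cost on the page-access sequence can be illustrated with a small simulation. The sketch below counts page reads for a join index processed in a given order; the LRU replacement policy, the toy join index, and the use of plain sorting as the reordering step are all assumptions made for illustration, and none of this reproduces the clustering methods of the paper.

```python
from collections import OrderedDict
from typing import Iterable, Tuple


def count_page_fetches(join_index: Iterable[Tuple[int, int]],
                       buffer_pages: int) -> int:
    """Count page reads needed to process join-index entries in order.

    Each entry (r_page, s_page) needs one page of relation R and one page
    of relation S in the buffer. The buffer is assumed to use LRU
    replacement and to hold `buffer_pages` pages in total.
    """
    buffer: "OrderedDict[Tuple[str, int], bool]" = OrderedDict()
    fetches = 0
    for r_page, s_page in join_index:
        for page in (("R", r_page), ("S", s_page)):
            if page in buffer:
                buffer.move_to_end(page)        # mark as most recently used
            else:
                fetches += 1                    # page miss: read from disk
                buffer[page] = True
                if len(buffer) > buffer_pages:
                    buffer.popitem(last=False)  # evict least recently used
    return fetches


# Hypothetical join index over two pages of R and two pages of S.
entries = [(0, 0), (1, 1), (0, 1), (1, 0), (0, 0), (1, 1)]

as_given = count_page_fetches(entries, buffer_pages=2)
reordered = count_page_fetches(sorted(entries), buffer_pages=2)
ample_buffer = count_page_fetches(entries, buffer_pages=4)
```

With a two-page buffer, the reordered sequence needs fewer reads than the original one, while a buffer large enough for all four pages reads each page exactly once regardless of order.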

A Rule-Based Optimizer for Spatial Join Algorithms

2008

The spatial join operation is one of the most important and most expensive operations in Geographic Database Management Systems (GDBMS). This paper presents a set of rules to optimize the performance of the filtering step of spatial join operations. First, a set of expressions to predict the number of I/O operations and CPU performance is presented. The rules are based on expressions to predict the performance of algorithms and tests performed with synthetic and real data sets.

mqr-tree: A 2-dimensional Spatial Access Method

In this paper, we propose the mqr-tree, a two-dimensional spatial access method that organizes spatial objects in a two-dimensional node based on their spatial relationships. Previously proposed spatial access methods that attempt to maintain spatial relationships between objects in their structures are limited in their incorporation of existing one-dimensional spatial access methods, or have lower space utilization in their nodes, and higher tree height, overcoverage and overlap than is necessary. The mqr-tree utilizes a node organization, a set of spatial relationship rules and an insertion strategy in order to gain significant improvements in overlap and overcoverage. In addition, other desirable properties are identified as a result of the chosen node organization and insertion strategies. In particular, zero overlap is achieved when the mqr-tree is used to index point data. A comparison of the mqr-tree insertion strategy versus the R-tree shows significant improvements in overlap and overcoverage, with comparable space utilization. In addition, a comparison of region searching shows that the mqr-tree achieves a lower number of disk accesses in many cases.

A Survey on Spatial Indexing

2018

Spatial information processing has been a centre of research attention in the previous decade. In spatial databases, data associated with spatial coordinates and extents are retrieved based on spatial proximity. A large number of spatial indexes have been proposed to facilitate efficient indexing of spatial objects in large databases and spatial data retrieval. The goal of this paper is to review the advanced techniques of the access methods. This paper tries to classify the existing multidimensional access methods according to the types of indexing and their performance over spatial queries. K-d trees outperform quadtrees without requiring additional memory usage.