Overlap interval partition join
Related papers
Object-Relational Indexing for General Interval Relationships
Lecture Notes in Computer Science, 2001
Intervals represent a fundamental data type for temporal, scientific, and spatial databases where time stamps and point data are extended to time spans and range data, respectively. For OLTP and OLAP applications on large amounts of data, not only intersection queries have to be processed efficiently but also general interval relationships including before, meets, overlaps, starts, finishes, contains, equals, during, startedBy, finishedBy, overlappedBy, metBy, and after. Our new algorithms use the Relational Interval Tree, a purely SQL-based and object-relationally wrapped index structure. The technique therefore preserves the industrial strength of the underlying RDBMS including stability, transactions, and performance. The efficiency of our approach is demonstrated by an experimental evaluation on a real weblog data set containing one million sessions.
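The thirteen relationships listed in this abstract can be told apart by comparing endpoints. The following is a minimal sketch, not the paper's SQL-based method: it assumes intervals are (start, end) tuples with start < end, and the function name allen_relation is made up for illustration.

```python
def allen_relation(a, b):
    """Return the name of the relationship of interval a to interval b.

    Assumption: both intervals are (start, end) tuples with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "metBy"
    # From here on the two intervals share more than a single point.
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "startedBy"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finishedBy"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s2 < s1 and e1 < e2:
        return "during"
    return "overlaps" if s1 < s2 else "overlappedBy"
```

Swapping the arguments yields the inverse relationship, for example "contains" becomes "during".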
The VLDB Journal
The interval join is a popular operation in temporal, spatial, and uncertain databases. The majority of interval join algorithms assume that input data reside on disk, and so their focus is to minimize the I/O accesses. Recently, an in-memory approach based on plane sweep (PS) for modern hardware was proposed which greatly outperforms previous work. However, this approach relies on a complex data structure and its parallelization has not been adequately studied. In this article, we investigate in-memory interval joins in two directions. First, we explore the applicability of a largely ignored forward scan (FS)-based plane sweep algorithm for single-threaded join evaluation. We propose four optimizations for FS that greatly reduce its cost, making it competitive with or even faster than the state-of-the-art. Second, we study in depth the parallel computation of interval joins. We design a non-partitioning-based approach that determines independent tasks of the join algorithm to run in pa...
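As a rough illustration of the forward scan idea named in this abstract, here is a minimal sketch of a basic FS-style plane sweep join. It assumes closed intervals given as (start, end) tuples, does not include any of the four optimizations the abstract mentions, and the function name forward_scan_join is illustrative only.

```python
def forward_scan_join(R, S):
    """Return all pairs (r, s) with r in R and s in S that overlap.

    Assumption: closed intervals as (start, end) tuples, start <= end.
    """
    R, S = sorted(R), sorted(S)  # sort both inputs by start point
    out = []
    i = j = 0
    while i < len(R) and j < len(S):
        if R[i][0] <= S[j][0]:
            # R[i] starts first: scan forward in S while S starts inside R[i].
            r, k = R[i], j
            while k < len(S) and S[k][0] <= r[1]:
                out.append((r, S[k]))
                k += 1
            i += 1
        else:
            # S[j] starts first: scan forward in R while R starts inside S[j].
            s, k = S[j], i
            while k < len(R) and R[k][0] <= s[1]:
                out.append((R[k], s))
                k += 1
            j += 1
    return out
```

Each overlapping pair is reported exactly once, by whichever of the two intervals starts first (ties go to the interval from R).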
Indexing Temporal Relations for Range-Duration Queries
arXiv (Cornell University), 2022
Temporal information plays a crucial role in many database applications; however, support for queries on such data is limited. We present an index structure, termed RD-INDEX, to support range-duration queries over interval timestamped relations, which constrain both the range of the tuples' positions on the timeline and their duration. RD-INDEX is a grid structure in the two-dimensional space, representing the position on the timeline and the duration of timestamps, respectively. Instead of using a regular grid, we consider the data distribution for the construction of the grid in order to ensure that each grid cell contains approximately the same number of intervals. RD-INDEX features provable bounds on the running time of all the operations, allows for a simple implementation, and supports very predictable query performance. We benchmark our solution on a variety of datasets and query workloads, investigating both the query rate and the behavior of the individual queries. The results show that RD-INDEX performs better than the baselines on range-duration queries, for which it is explicitly designed. Furthermore, it outperforms specialized indexes also on workloads containing queries constraining either only the duration or the range.
Managing intervals efficiently in object-relational databases
Proc. 26th Int. Conf. on Very Large …, 2000
Modern database applications show a growing demand for efficient and dynamic management of intervals, particularly for temporal and spatial data or for constraint handling. Common approaches require the augmentation of index structures which, however, is not supported by existing relational database systems. By design, the new Relational Interval Tree (RI-tree) employs built-in indexes on an as-they-are basis and is easy to implement. Whereas the functionality and efficiency of the RI-tree are supported by any off-the-shelf relational DBMS, it is perfectly encapsulated by the object-relational data model. The RI-tree requires O(n/b) disk blocks of size b to store n intervals, O(log_b n) I/O operations for insertion or deletion, and O(h · log_b n + r/b) I/Os for an intersection query producing r results. The height h of the virtual backbone tree corresponds to the current expansion and granularity of the data space but does not depend on n. As demonstrated by our experimental evaluation on an Oracle8i server, competing dynamic interval access methods are outperformed by factors of up to 42 for disk accesses and 4.9 for query response time.
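The remark that the height h does not depend on n can be made concrete with a small sketch of a virtual backbone. It rests on assumptions not stated in the abstract: the backbone is taken to be an implicit binary tree over the integer positions 1 to 2^h - 1 with root 2^(h-1), and an interval is registered at its "fork node", the first backbone node on the descent from the root that lies inside the interval. The name fork_node and its parameters are illustrative.

```python
def fork_node(lower, upper, root):
    """Descend a virtual binary backbone to the first node inside [lower, upper].

    Assumptions: integer bounds with 1 <= lower <= upper <= 2 * root - 1,
    and root is a power of two (the root value of the backbone).
    """
    node = root
    step = root // 2
    while step >= 1:
        if upper < node:
            node -= step      # interval lies entirely to the left
        elif node < lower:
            node += step      # interval lies entirely to the right
        else:
            break             # node falls inside the interval
        step //= 2
    return node
```

The loop runs at most log2(root) times, so the cost of the descent is bounded by the extent of the data space and not by the number of stored intervals. No tree nodes are materialized; only arithmetic on node values is needed.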
Efficiently processing queries on interval-and-value tuples in relational databases
2005
With the increasing occurrence of temporal and spatial data in present-day database applications, the interval data type is adopted by more and more database systems. To efficiently support queries that contain selections on interval attributes as well as simple-valued attributes (e.g., numbers, strings) at the same time, special index structures are required that support both types of predicates in combination. Based on the Relational Interval Tree, we present various indexing schemes that support such combined queries and can be integrated in relational database systems with minimum effort. Experiments on different query types show superior performance for the new techniques in comparison to competing access methods.
Leveraging range joins for the computation of overlap joins
The VLDB Journal, 2021
Joins are essential and potentially expensive operations in database management systems. When data is associated with time periods, joins commonly include predicates that require pairs of argument tuples to overlap in order to qualify for the result. Our goal is to enable built-in systems support for such joins. In particular, we present an approach where overlap joins are formulated as unions of range joins, which are more general purpose joins compared to overlap joins, i.e., are useful in their own right, and are supported well by B+-trees. The approach is sufficiently flexible that it also supports joins with additional equality predicates, as well as open, closed, and half-open time periods over discrete and continuous domains, thus offering both generality and simplicity, which is important in a system setting. We provide both a stand-alone solution that performs on par with the state-of-the-art and a DBMS embedded solution that is able to exploit standard indexing and clearly...
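One way to read "overlap joins as unions of range joins" is sketched below. It is an assumption for illustration, not necessarily the paper's exact formulation: for closed intervals, r and s overlap exactly when either r.start <= s.start <= r.end or s.start < r.start <= s.end, and the two cases are disjoint. Sorted lists with binary search stand in for B+-tree range scans; the function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def overlap_join_via_range_joins(R, S):
    """Overlap join of closed (start, end) intervals as two range joins."""
    R_sorted, S_sorted = sorted(R), sorted(S)
    r_starts = [r[0] for r in R_sorted]
    s_starts = [s[0] for s in S_sorted]
    out = []
    # Range join 1: s.start falls in [r.start, r.end].
    for r in R_sorted:
        lo = bisect_left(s_starts, r[0])
        hi = bisect_right(s_starts, r[1])
        out.extend((r, s) for s in S_sorted[lo:hi])
    # Range join 2: r.start falls in (s.start, s.end].
    for s in S_sorted:
        lo = bisect_right(r_starts, s[0])
        hi = bisect_right(r_starts, s[1])
        out.extend((r, s) for r in R_sorted[lo:hi])
    return out
```

Because the first range is closed on the left and the second is open on the left, a pair with equal start points is produced only by the first join, so the union needs no duplicate elimination.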
Interval processing with the UB-Tree
Proceedings International Database Engineering and Applications Symposium
Advanced data warehouses and web databases have set the demand for processing large sets of time ranges, quality classes, fuzzy data, personalized data and extended objects. Since all of these data types can be mapped to intervals, interval indexing can dramatically speed up or even be an enabling technology for these new applications. We introduce a method for managing intervals by indexing the dual space with the UB-Tree. We show that our method is an effective and efficient solution, benefiting from all good characteristics of the UB-Tree, i.e., concurrency control, worst case guarantees for insertion, deletion and update as well as efficient query processing. Our technique can easily be integrated into an RDBMS engine providing the UB-Tree as access method. We also show that our technique is superior to and more flexible than previously suggested techniques.
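The dual-space idea can be illustrated with a short sketch. It assumes the common mapping of a closed interval [l, u] to the two-dimensional point (l, u), under which an intersection query [ql, qu] becomes the rectangular range condition l <= qu and u >= ql. A linear scan stands in for the UB-Tree range query, which is not modeled here, and the function name is illustrative.

```python
def intersecting_in_dual_space(points, query):
    """Return the dual-space points (l, u) whose intervals intersect query.

    Assumption: each interval [l, u] is stored as the point (l, u), and the
    query is a closed interval (ql, qu). A real index would answer the
    two-dimensional range condition without scanning every point.
    """
    ql, qu = query
    return [(l, u) for (l, u) in points if l <= qu and u >= ql]
```

For example, with the points (1, 3), (4, 6) and (7, 9), the query (3, 5) selects (1, 3) and (4, 6).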
Similarity Search of Bounded TIDASETS within Large Time Interval Databases
Searching for similar entities within a database is a common task, performed billions of times a day. Generally, similarities are calculated using common distance measures like Manhattan, Euclidean, Levenshtein, Mahalanobis or Dynamic Time Warping (DTW). In this paper, we present a similarity measure for time interval data, which allows searching for similar sets of time interval records bounded by a time window (e.g., a day, a week, or a month). We introduce three different groups of distance measures, i.e., temporal order, temporal measure, and temporal relation distances. In addition, we present bitmap-based implementations for algorithms of each of the three types. We designed our solutions to perform well on large datasets and support distributed calculations. Evaluations show outstanding performance compared with other interval-related similarity measures, i.e., ARTEMIS and IBSM.
Fast time intervals mining using the transitivity of temporal relations
Knowledge and Information Systems, 2013
We introduce an algorithm, called KarmaLego, for the discovery of frequent symbolic time intervals related patterns (TIRPs). The mined symbolic time intervals can be part of the input, or can be generated by a temporal-abstraction process from raw time-stamped data. The algorithm includes a data structure for TIRP-candidate generation and a novel method for efficient candidate-TIRP generation, by exploiting the transitivity property of Allen's temporal relations. Additionally, since the non-ambiguous definition of TIRPs does not specify the duration of the time intervals, we propose to pre-cluster the time intervals based on their duration to decrease the variance of the supporting instances. Our experimental comparison of the KarmaLego algorithm's runtime performance to several existing state-of-the-art time intervals pattern mining methods demonstrated a significant speed up, especially with large datasets and low levels of minimal vertical support. Furthermore, pre-clustering by time interval duration led to an increase in the homogeneity of the duration of the discovered TIRP's supporting instances' time intervals components, accompanied, however, by a corresponding decrease in the number of discovered TIRPs.
Advanced indexing technique for temporal data
Computer Science and Information Systems, 2010
The need for efficient access and management of time dependent data in modern database applications is well recognised and researched. Existing access methods are mostly derived from the family of spatial R-tree indexing techniques. These techniques are particularly unsuitable for handling data involving open ended intervals, which are common in temporal databases. This is due to overlapping between nodes and huge dead space found in the database. In this study, we describe a detailed investigation of a new approach called "Triangular Decomposition Tree" (TD-Tree). The underlying idea for the TD-Tree is to manage temporal intervals by virtual index structures relying on geometric interpretations of intervals, and a space partition method that results in an unbalanced binary tree. We demonstrate that the unbalanced binary tree can be efficiently manipulated using a virtual index. We also show that the single query algorithm can be applied uniformly to different query types without the need of dedicated query transformations. In addition to the advantages related to the usage of a single query algorithm for different query types and better space complexity, the empirical performance of the TD-Tree has been found to be superior to its best known competitors.