Efficient and scalable trie-based algorithms for computing set containment relations

Set containment join revisited

Scalable and Robust Set Similarity Join

Implementing set-theoretic relational-query functions using highly parallel index-processing hardware

Fast set operations using treaps

General-purpose join algorithms for large graph triangle listing on heterogeneous systems

Implementing collection of sets with trie : a stepping stone for performances?

The main operations of the Set Collection Abstract Data Type are insertion, search, and deletion. A well-known option for implementing these operations is a hash table. Although hash tables do not offer good worst-case time complexities, they are efficient in practice. Another option is the data structure known as the trie, which is useful for two main reasons. First, with a trie, the operations mentioned above admit very good theoretical time complexities. Second, a trie can be seen as a compact representation of a collection of sets, since parts common to several sets are merged together. The aim of this article is to evaluate the performance of the trie data structure. The Java language provides an abstract class corresponding to the operations of the Set Collection A.D.T. In this article we propose three different implementations of this abstract class, all of which are variations in how the children of each node are managed. Their theoretical complexities are then evaluated. A...
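As a rough illustration of the structure this abstract describes, here is a minimal Python sketch (not the paper's Java code). It assumes sets of mutually comparable elements, stored along their sorted order, with each node's children kept in a dict; that is only one of several possible child-management strategies. The names `SetTrie`, `add`, `contains`, `remove`, and `subsets_of` are hypothetical; `subsets_of` is a set containment query of the kind named in the page title.

```python
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple


class _Node:
    """A trie node; children are kept in a dict keyed by set element."""

    __slots__ = ("children", "terminal")

    def __init__(self) -> None:
        self.children: Dict[Hashable, "_Node"] = {}
        self.terminal = False  # True if some stored set ends exactly here


class SetTrie:
    """Hypothetical collection of sets stored as a trie over sorted elements."""

    def __init__(self) -> None:
        self.root = _Node()
        self.size = 0  # number of distinct sets stored

    @staticmethod
    def _key(s: Iterable) -> Tuple:
        # Deduplicate and sort so that equal sets map to the same path.
        return tuple(sorted(set(s)))

    def add(self, s: Iterable) -> bool:
        """Insert s; return False if it was already present."""
        node = self.root
        for x in self._key(s):
            node = node.children.setdefault(x, _Node())
        if node.terminal:
            return False
        node.terminal = True
        self.size += 1
        return True

    def contains(self, s: Iterable) -> bool:
        """Exact-match search: walk the path and check the end marker."""
        node = self.root
        for x in self._key(s):
            nxt = node.children.get(x)
            if nxt is None:
                return False
            node = nxt
        return node.terminal

    def remove(self, s: Iterable) -> bool:
        """Delete s; return False if it was not present."""
        path: List[Tuple[_Node, Hashable]] = []
        node = self.root
        for x in self._key(s):
            nxt = node.children.get(x)
            if nxt is None:
                return False
            path.append((node, x))
            node = nxt
        if not node.terminal:
            return False
        node.terminal = False
        self.size -= 1
        # Prune, bottom-up, nodes that no longer lead to any stored set.
        for parent, x in reversed(path):
            child = parent.children[x]
            if child.terminal or child.children:
                break
            del parent.children[x]
        return True

    def subsets_of(self, q: Iterable) -> Iterator[Tuple]:
        """Yield every stored set that is a subset of q (containment query)."""
        qs = set(q)
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            if node.terminal:
                yield prefix
            # Only follow edges labelled by elements of q.
            for x, child in node.children.items():
                if x in qs:
                    stack.append((child, prefix + (x,)))
```

Sets such as {1, 2} and {1, 2, 3} share the path 1 → 2, which is the merging the abstract refers to. Replacing the per-node dict with, say, a sorted list or a fixed-size array is the kind of child-management choice the abstract says its three variants explore.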

Multipredicate Join Algorithms for Accelerating Relational Graph Processing on GPUs

Recent work has demonstrated that the use of programmable GPUs can be advantageous during relational query processing on analytical workloads. In this paper, we take a closer look at graph problems such as finding all triangles and all four-cliques of a graph. In particular, we present two different join algorithms for the GPU. The first is an implementation of Leapfrog-Triejoin (LFTJ), a recently presented worst-case optimal multi-predicate join algorithm. The second is a novel approach, inspired by the former but more suitable for GPU architectures. Our preliminary performance benchmarks show that for both approaches, using GPUs is cost-effective: the GPU implementations outperform their respective CPU variants. While the second algorithm is faster overall, it comes with increased implementation complexity and storage requirements for intermediary results. Furthermore, both our algorithms are competitive with the hand-written C++ implementation for finding triangles and four-...
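The triangle query that LFTJ evaluates can be illustrated on a CPU with a short Python sketch. It is a simplification, not the paper's GPU algorithm: it assumes an undirected edge list with integer vertex ids, orients each edge from the smaller to the larger id, and binds variables in the order a, b, c, finding candidates for c by intersecting sorted adjacency lists. The intersection uses binary-search "seek" jumps in the spirit of LFTJ's leapfrog intersection, simplified so that on a mismatch it restarts the pass from the first list instead of cycling round-robin. The names `leapfrog_intersect` and `list_triangles` are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def leapfrog_intersect(lists: Sequence[Sequence[int]]) -> List[int]:
    """Intersect strictly increasing integer lists using seek (binary search) jumps."""
    if not lists or any(len(xs) == 0 for xs in lists):
        return []
    pos = [0] * len(lists)
    out: List[int] = []
    target = max(xs[0] for xs in lists)  # no common value can be smaller
    while True:
        matched = True
        for i, xs in enumerate(lists):
            # Seek: jump list i forward to its first value >= target.
            pos[i] = bisect_left(xs, target, pos[i])
            if pos[i] == len(xs):
                return out  # one list is exhausted, so no more matches
            if xs[pos[i]] != target:
                target = xs[pos[i]]  # larger value found: leap the target forward
                matched = False
                break
        if matched:
            out.append(target)
            pos[0] += 1
            if pos[0] == len(lists[0]):
                return out
            target = lists[0][pos[0]]


def list_triangles(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Report each triangle a < b < c once, binding variables in the order a, b, c."""
    fwd_sets: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            fwd_sets[min(u, v)].add(max(u, v))  # orient edge small -> large
    fwd = {u: sorted(vs) for u, vs in fwd_sets.items()}
    result = []
    for a in sorted(fwd):
        for b in fwd[a]:
            # c must be a forward neighbour of both a and b.
            for c in leapfrog_intersect([fwd[a], fwd.get(b, [])]):
                result.append((a, b, c))
    return result
```

Orienting edges by vertex id makes each triangle appear exactly once, as (a, b, c) with a < b < c; for example, the edges (1, 2), (2, 3), (1, 3), (3, 4), (2, 4) yield the triangles (1, 2, 3) and (2, 3, 4).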

A Near-Optimal Parallel Algorithm for Joining Binary Relations

2020

We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$, where $m$ is the total size of the input relations, $p$ is the number of machines, $\rho$ is the join's fractional edge covering number, and $\tilde{O}(\cdot)$ hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name the "isolated cartesian product theorem") that provides fresh insight into the problem's mathematical structure. Our result implies that the subgraph enumeration problem, where the goal is to report all the occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.
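For a concrete reading of the load bound, take the triangle join $R_1(A,B) \bowtie R_2(B,C) \bowtie R_3(A,C)$, which corresponds to enumerating triangles. This is a worked example of ours, not taken from the abstract: the fractional edge covering number is computed directly from the cover constraints, and $p = 64$ is an arbitrary illustrative choice.

```latex
% Triangle join Q = R_1(A,B) \bowtie R_2(B,C) \bowtie R_3(A,C),
% with one weight w_i >= 0 per relation; every attribute must be covered.
\begin{aligned}
&\text{cover constraints:} && w_1 + w_3 \ge 1 \;(A), \quad w_1 + w_2 \ge 1 \;(B), \quad w_2 + w_3 \ge 1 \;(C)\\
&\text{summing them:} && 2(w_1 + w_2 + w_3) \ge 3 \;\Rightarrow\; \rho \ge \tfrac{3}{2}\\
&\text{attained by } w_1 = w_2 = w_3 = \tfrac{1}{2}: && \rho = \tfrac{3}{2}\\
&\text{load:} && \tilde{O}\bigl(m / p^{1/\rho}\bigr) = \tilde{O}\bigl(m / p^{2/3}\bigr)\\
&p = 64: && p^{2/3} = 16 \;\Rightarrow\; \text{load} = \tilde{O}(m / 16)
\end{aligned}
```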

On indexing error-tolerant set containment

Sort Vs. Hash Revisited: Fast Join Implementation on Modern Multi-Core CPUs