Degree Sequence Bound For Join Cardinality Estimation

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

ArXiv, 2021

Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based ones) have been proposed with outstanding estimation accuracy and inference latency. However, no study has systematically evaluated the quality of these methods and answered the fundamental question: to what extent can these methods improve the performance of a query optimizer in real-world settings, which is the ultimate goal of a CardEst method? In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset, STATS, and a diverse query workload, STATS-CEB. We integrate several of the most representative CardEst methods into the open-source DBMS PostgreSQL, and comprehensively evaluate their true effectiveness in improving query plan qu...

Simpli-Squared: A Very Simple Yet Unexpectedly Powerful Join Ordering Algorithm Without Cardinality Estimates

ArXiv, 2021

The Join Order Benchmark (JOB) has become the de facto standard to assess the performance of relational database query optimizers due to its complexity and completeness. In order to compute the optimal execution plan – join order – existing solutions employ extensive data synopses and correlations – functional dependencies – between table attributes. These structures incur significant overhead to design, build, and maintain. In this paper, we present Simplicity Simplified (Simpli-Squared), a very simple join ordering algorithm that achieves unexpectedly good results. Simpli-Squared computes the join order without using any statistics or cardinality estimates. It takes as input only the referential integrity constraints declared at schema definition and the number of tuples (size) in the base tables. The join order of a given query is computed by splitting the join graph along the many-to-many joins and sorting the tables based on their size. The tables involved in one-to-many joins ...

CS2: a new database synopsis for query estimation

Fast and accurate estimations for complex queries are profoundly beneficial for large databases with heavy workloads. In this research, we propose a statistical summary for a database, called CS2 (Correlated Sample Synopsis), to provide rapid and accurate result size estimations for all queries with joins and arbitrary selections. Unlike the state-of-the-art techniques, CS2 does not completely rely on simple random samples, but mainly consists of correlated sample tuples that retain join relationships with less storage. We introduce a statistical technique, called reverse sample, and design a powerful estimator, called reverse estimator, to fully utilize correlated sample tuples for query estimation. We prove both theoretically and empirically that the reverse estimator is unbiased and accurate using CS2. Extensive experiments on multiple datasets show that CS2 is fast to construct and derives more accurate estimations than existing methods with the same space budget.

Selectivity Estimation for Joins Using Systematic Sampling

Database and Expert Systems Applications, 1997

We propose a new approach to the estimation of join selectivity. The technique, which we have called "systematic sampling", is a novel variant of the sampling-based approach. Systematic sampling works as follows: Given a relation of tuples, with a join attribute that can be accessed in ascending/descending order via an index, if is the number of tuples to

Beyond Equi-joins: Ranking, Enumeration and Factorization

2021

We study theta-joins in general and join predicates with conjunctions and disjunctions of inequalities in particular, focusing on ranked enumeration where the answers are returned incrementally in an order dictated by a given ranking function. Our approach achieves strong time and space complexity properties: with n denoting the number of tuples in the database, we guarantee for acyclic full join queries with inequality conditions that for every value of k, the k top-ranked answers are returned in 𝒪(n polylog n + k log k) time. This is within a polylogarithmic factor of the best known complexity for equi-joins and even of 𝒪(n+k), the time it takes to look at the input and return k answers in any order. Our guarantees extend to join queries with selections and many types of projections, such as the so-called free-connex queries. Remarkably, they hold even when the entire output is of size n^ℓ for a join of ℓ relations. The key ingredient is a novel 𝒪(n polylog n)-size factorized repr...

Translation Grids for Multi-way Join Size Estimation

2022

We present a novel approach to estimate query result sizes for queries containing multiple joins. Our approach relies on (1) enhanced AKMV sketches and (2) a novel data structure called translation grid. In essence, we obtain estimates by connecting hashes from AKMV sketches via a translation grid.
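
For readers unfamiliar with the sketches mentioned above, the following is a minimal sketch of a plain k-minimum-values (KMV) distinct-count estimator, the building block that AKMV sketches extend. It is not the authors' method: the translation grid is not shown, and the per-hash counters that, as we understand it, the augmented (AKMV) variant keeps are omitted. The function names `hash01`, `kmv_sketch` and `estimate_distinct` are hypothetical, and SHA-256 is an arbitrary choice of hash function.

```python
import hashlib


def hash01(value):
    """Map a value to a pseudo-uniform float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(values, k):
    """Return the k smallest distinct hash values, sorted ascending.

    For clarity this hashes everything and then truncates; a streaming
    version would keep only the k smallest hashes seen so far.
    """
    return sorted({hash01(v) for v in values})[:k]


def estimate_distinct(sketch, k):
    """Estimate the number of distinct values from a KMV sketch."""
    if len(sketch) < k:
        # Fewer than k distinct hashes were seen, so the count is exact.
        return float(len(sketch))
    # Standard KMV estimator: (k - 1) divided by the k-th smallest hash.
    return (k - 1) / sketch[-1]


if __name__ == "__main__":
    k = 1024
    # 100,000 distinct values, each appearing twice.
    column = list(range(100_000)) * 2
    sketch = kmv_sketch(column, k)
    print(round(estimate_distinct(sketch, k)))
```

The relative error of this estimator shrinks roughly as 1/sqrt(k), so a larger `k` trades space for accuracy.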