Asoke Datta - Academia.edu
Papers by Asoke Datta
ArXiv, 2021
Cost-based query optimization remains a critical task in relational databases even after decades of research and industrial development. Query optimizers rely on a wide range of statistical synopses, including attribute-level histograms and table-level samples, for accurate cardinality estimation. As the complexity of selection predicates and the number of join predicates increase, two problems arise. First, statistics cannot be incrementally composed to effectively estimate the cost of the sub-plans generated in plan enumeration. Second, small errors are propagated exponentially through join operators, which can lead to severely sub-optimal plans. In this paper, we introduce COMPASS, a novel query optimization paradigm for in-memory databases based on a single type of statistics: Fast-AGMS sketches. In COMPASS, query optimization and execution are intertwined. Selection predicates and sketch updates are pushed down and evaluated online during query optimization. This allows Fast-...
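For readers unfamiliar with the statistic named in this abstract, the following is a minimal sketch of a generic Fast-AGMS sketch and the join-size estimate it supports. It is not the COMPASS implementation: the class name `FastAGMS`, the function `estimate_join_size`, the polynomial hash construction, and the default sizes are all assumptions chosen for illustration, and keys are assumed to be non-negative integers.

```python
import random
import statistics

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus of the hash polynomials


class FastAGMS:
    """Generic Fast-AGMS sketch: `rows` arrays of `buckets` signed counters.

    Illustrative only; names and hash construction are assumptions.
    """

    def __init__(self, rows=5, buckets=256, seed=0):
        rng = random.Random(seed)
        self.rows, self.buckets, self.seed = rows, buckets, seed
        self.counters = [[0] * buckets for _ in range(rows)]
        # Degree-1 polynomial per row picks the bucket (pairwise independent).
        self.bucket_coef = [[rng.randrange(1, PRIME) for _ in range(2)]
                            for _ in range(rows)]
        # Degree-3 polynomial per row picks the +1/-1 sign (4-wise independent).
        self.sign_coef = [[rng.randrange(1, PRIME) for _ in range(4)]
                          for _ in range(rows)]

    @staticmethod
    def _poly(coef, key):
        # Horner evaluation of the polynomial modulo PRIME.
        acc = 0
        for c in coef:
            acc = (acc * key + c) % PRIME
        return acc

    def update(self, key, count=1):
        """Add `count` occurrences of `key`: one counter per row is touched."""
        for r in range(self.rows):
            bucket = self._poly(self.bucket_coef[r], key) % self.buckets
            sign = 1 if self._poly(self.sign_coef[r], key) & 1 else -1
            self.counters[r][bucket] += sign * count


def estimate_join_size(a, b):
    """Estimate the equi-join size of the two sketched key streams.

    Each row yields one estimate (the dot product of its counters);
    the median over rows is returned.
    """
    if (a.rows, a.buckets, a.seed) != (b.rows, b.buckets, b.seed):
        raise ValueError("sketches must share dimensions and hash seed")
    per_row = [sum(x * y for x, y in zip(ra, rb))
               for ra, rb in zip(a.counters, b.counters)]
    return statistics.median(per_row)
```

Both sketches must be built with the same seed so that they use identical hash functions. Because an update touches only one counter per row, sketches like this can be maintained while tuples stream past a selection predicate, which is the property the abstract relies on.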
ArXiv, 2021
The Join Order Benchmark (JOB) has become the de facto standard to assess the performance of relational database query optimizers due to its complexity and completeness. To compute the optimal execution plan (join order), existing solutions employ extensive data synopses and correlations (functional dependencies) between table attributes. These structures incur significant overhead to design, build, and maintain. In this paper, we present Simplicity Simplified (Simpli-Squared), a very simple join ordering algorithm that achieves unexpectedly good results. Simpli-Squared computes the join order without using any statistics or cardinality estimates. It takes as input only the referential integrity constraints declared at schema definition and the number of tuples (size) in the base tables. The join order of a given query is computed by splitting the join graph along the many-to-many joins and sorting the tables based on their size. The tables involved in one-to-many joins ...
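The two steps that the abstract does state, splitting the join graph along many-to-many joins and sorting tables by size, can be sketched as follows. This is a rough illustration, not the Simpli-Squared algorithm itself: the abstract is truncated, so how the resulting groups are combined into a final join order is not shown. The function name `split_and_sort`, the rule that a join without a declared foreign key counts as many-to-many, and the table sizes in the usage example are all assumptions.

```python
from collections import defaultdict


def split_and_sort(table_sizes, joins, foreign_keys):
    """Split a join graph along many-to-many joins, then sort by table size.

    table_sizes:  dict mapping table name -> number of tuples
    joins:        list of (table_a, table_b) join edges in the query
    foreign_keys: set of frozenset({table_a, table_b}) pairs that have a
                  declared referential integrity constraint
    Assumption: a join edge without a declared constraint is many-to-many.
    """
    # Keep only one-to-many (foreign key) edges; dropping the rest splits the graph.
    adjacent = defaultdict(set)
    for a, b in joins:
        if frozenset((a, b)) in foreign_keys:
            adjacent[a].add(b)
            adjacent[b].add(a)

    def by_size(name):
        return (table_sizes[name], name)  # name breaks ties deterministically

    seen, groups = set(), []
    # Visit tables from smallest to largest so groups are ordered
    # by the size of their smallest table.
    for start in sorted(table_sizes, key=by_size):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first search over foreign-key edges only
            current = stack.pop()
            component.append(current)
            for neighbor in adjacent[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(component, key=by_size))
    return groups


# Hypothetical sizes and constraints, for illustration only.
sizes = {"title": 2500, "movie_companies": 2600, "company_name": 230,
         "movie_keyword": 4500, "keyword": 130}
query_joins = [("movie_companies", "title"),
               ("movie_companies", "company_name"),
               ("movie_keyword", "keyword"),
               ("movie_keyword", "movie_companies")]  # no declared constraint
declared = {frozenset(("movie_companies", "title")),
            frozenset(("movie_companies", "company_name")),
            frozenset(("movie_keyword", "keyword"))}
groups = split_and_sort(sizes, query_joins, declared)
```

In this example the join between `movie_keyword` and `movie_companies` has no declared constraint, so it is treated as many-to-many and the graph splits into two groups, each listed from smallest to largest table.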