Automated partitioning design in parallel database systems

Locality-aware Partitioning in Parallel Database Systems

Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015

Parallel database systems horizontally partition large amounts of structured data in order to provide parallel data processing capabilities for analytical workloads in shared-nothing clusters. One major challenge when horizontally partitioning large amounts of data is to reduce the network costs for a given workload and database schema. A common technique to reduce network costs in parallel database systems is to co-partition tables on their join key in order to avoid expensive remote join operations. However, existing partitioning schemes are limited in this respect, since only subsets of tables in complex schemata sharing the same join key can be co-partitioned unless tables are fully replicated. In this paper we present a novel partitioning scheme called predicate-based reference partitioning (or PREF for short) that allows sets of tables to be co-partitioned based on given join predicates. Moreover, based on PREF, we present two automatic partitioning design algorithms to maximize data locality. One algorithm needs only the schema and data, whereas the other additionally takes the workload as input. In our experiments we show that our automated design algorithms can partition database schemata of different complexity and thus help to effectively reduce the runtime of queries under a given workload when compared to existing partitioning approaches.
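
To make the co-partitioning idea concrete, here is a minimal Python sketch of predicate-based co-partitioning under an equi-join predicate: a seed table is hash-partitioned on its key, and a second table is placed so that every tuple lands in each partition that holds a join partner. This is only an illustration of the idea, not the paper's PREF algorithm; the tables, columns, and data are hypothetical.

```python
# Sketch: co-partition a second table against a hash-partitioned seed table so
# that an equi-join on the given predicate becomes local. Tuples that match
# partners in several partitions are duplicated (the price of locality).
from collections import defaultdict

NUM_PARTITIONS = 4

def hash_partition(rows, key):
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % NUM_PARTITIONS].append(row)
    return parts

def co_partition(rows, join_key, seed_parts, seed_join_key):
    """Place `rows` so joins on join_key = seed_join_key need no network."""
    # Map each join value in the seed table to the partitions holding it.
    value_to_parts = defaultdict(set)
    for pid, seed_rows in seed_parts.items():
        for row in seed_rows:
            value_to_parts[row[seed_join_key]].add(pid)
    parts = defaultdict(list)
    for row in rows:
        # Tuples without any join partner go to an arbitrary partition (0).
        for pid in value_to_parts.get(row[join_key], {0}):
            parts[pid].append(row)
    return parts

customers = [{"c_id": i, "region": i % 3} for i in range(10)]
orders = [{"o_id": 100 + i, "c_id": i % 10} for i in range(30)]

cust_parts = hash_partition(customers, "c_id")
order_parts = co_partition(orders, "c_id", cust_parts, "c_id")
for pid in sorted(order_parts):
    print(pid, len(cust_parts[pid]), len(order_parts[pid]))
```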

Experimental evidence on partitioning in parallel data warehouses

Proceedings of the 7th ACM international workshop on Data warehousing and OLAP - DOLAP '04, 2004

Parallelism can deliver major performance improvements in large data warehouses (DW) that face performance and scalability challenges. A simple, low-cost shared-nothing architecture with horizontally fully-partitioned fact tables can significantly speed up data warehouse response times. However, the extra overheads of processing large replicated relations and of repartitioning data between nodes can significantly degrade speedup for many query patterns if special care is not taken during placement to minimize them. In this paper we demonstrate these problems experimentally using the TPC-H performance evaluation benchmark and identify simple modifications that minimize such undesirable overheads. We experimentally analyze a simple and easy-to-apply partitioning and placement decision that achieves good performance improvements.
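
The kind of placement rule studied here can be illustrated with a short sketch: fully partition large fact tables on a frequently used join key and replicate small dimension tables on every node so that joins stay local. The size threshold and the table statistics below are hypothetical, not the paper's values.

```python
# Illustrative placement decision: partition large facts, replicate small dims.
REPLICATION_THRESHOLD_ROWS = 1_000_000

# (table, row count, candidate partitioning key) -- rough TPC-H-like figures.
tables = [
    ("lineitem", 6_000_000, "l_orderkey"),
    ("orders",   1_500_000, "o_orderkey"),
    ("customer",   150_000, "c_custkey"),
    ("nation",          25, None),
    ("region",           5, None),
]

def place(tables, nodes):
    plan = {}
    for name, rows, key in tables:
        if rows <= REPLICATION_THRESHOLD_ROWS or key is None:
            plan[name] = "replicate on all %d nodes" % nodes
        else:
            plan[name] = "hash-partition on %s across %d nodes" % (key, nodes)
    return plan

for table, decision in place(tables, nodes=8).items():
    print(table, "->", decision)
```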

Optimizing queries over partitioned tables in MPP systems

Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014

Partitioning tables based on value ranges provides a powerful mechanism for organizing tables in database systems. In the context of data warehousing and large-scale data analysis, partitioned tables are of particular interest because the nature of the queries favors scanning large swaths of data. In this scenario, eliminating partitions that contain data irrelevant to a given query from the query plan can yield substantial performance improvements. Dealing with partitioned tables in query optimization has attracted significant attention recently, yet a number of challenges unique to Massively Parallel Processing (MPP) databases and their distributed nature remain unresolved. In this paper, we present optimization techniques for queries over partitioned tables as implemented in Pivotal Greenplum Database. We present a concise and unified representation for partitioned tables and devise optimization techniques to generate query plans that can defer decisions on accessing certain partitions to query run-time. We demonstrate that the resulting query plans distinctly outperform conventional query plans in a variety of scenarios.
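
The contrast between plan-time and deferred partition elimination can be sketched as follows. This is a simplified illustration of the general technique, not Greenplum's implementation; the date ranges and values are hypothetical.

```python
# Static elimination prunes partitions from literal predicates at plan time;
# deferred elimination decides the surviving partition set only at run time,
# once the join side has produced its values.
import datetime as dt

# Range-partitioned table: partition id -> [lo, hi) date bounds, one per month.
partitions = {
    i: (dt.date(2014, m, 1),
        dt.date(2014, m + 1, 1) if m < 12 else dt.date(2015, 1, 1))
    for i, m in enumerate(range(1, 13))
}

def static_prune(lo, hi):
    """Plan-time elimination from a literal date-range predicate."""
    return [pid for pid, (p_lo, p_hi) in partitions.items()
            if p_lo < hi and lo < p_hi]

def deferred_prune(join_dates):
    """Run-time elimination once the join side has produced its dates."""
    keep = set()
    for d in join_dates:
        for pid, (p_lo, p_hi) in partitions.items():
            if p_lo <= d < p_hi:
                keep.add(pid)
    return sorted(keep)

print(static_prune(dt.date(2014, 3, 15), dt.date(2014, 5, 1)))       # [2, 3]
print(deferred_prune([dt.date(2014, 7, 4), dt.date(2014, 11, 30)]))  # [6, 10]
```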

Dynamic Workload-Based Partitioning Algorithms for Continuously Growing Databases

Lecture Notes in Computer Science, 2013

Applications with very large databases, where data items are continuously appended, are becoming more and more common. Thus, the development of efficient data partitioning is one of the main requirements for good performance. For applications with complex access patterns, e.g. scientific applications, workload-based partitioning can be exploited. However, existing workload-based approaches, which work in a static way, cannot be applied to very large databases. In this paper, we propose DynPart and DynPartGroup, two dynamic partitioning algorithms for continuously growing databases. These algorithms efficiently adapt the data partitioning to the arrival of new data elements by taking into account the affinity of new data with queries and fragments. In contrast to existing static approaches, our approach offers constant execution time, regardless of database size, while obtaining very good partitioning efficiency. We validated our solution through experimentation over real-world data; the results show its effectiveness. In the motivating scientific use case, data are appended to the catalog database as new observations are performed, and the resulting database size is estimated to reach 100 TB soon. Scientists around the globe can access the database with queries that may involve a considerable number of attributes. The volume of data that such applications hold poses important challenges for data management. In particular, efficient solutions are needed to partition and distribute the data across multiple servers, e.g., in a cluster. An efficient partitioning scheme tries to minimize the number of fragments accessed in the execution of a query, thus minimizing the overhead of distributed execution. Vertical partitioning solutions, such as column-oriented databases [18], may be useful for physical design on each node, but fail to provide efficient distributed partitioning, in particular for applications with high-dimensional queries, where joins would have to be executed by transferring data between nodes. Traditional horizontal partitioning approaches, such as hashing or range-based partitioning, are unable to capture the complex access patterns of scientific computing applications, especially because these applications usually involve complicated relations, including mathematical operations over large sets of columns, that are difficult to predefine a priori.
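
A minimal sketch of affinity-driven placement in the spirit of this line of work (not the paper's actual algorithm): each arriving item goes to the fragment whose current contents are accessed by the most queries that would also access the new item, subject to a size cap. The workload predicates, the cap, and the data below are hypothetical.

```python
# Route each new item to the fragment with the highest query affinity.
import random

random.seed(7)
FRAGMENT_CAP = 4

# Workload: query id -> predicate over an item (items are plain dicts here).
workload = {
    "q1": lambda item: item["mag"] < 20,
    "q2": lambda item: item["ra"] > 180,
    "q3": lambda item: item["mag"] >= 20 and item["ra"] <= 180,
}

fragments = [[] for _ in range(3)]
frag_queries = [set() for _ in range(3)]   # queries known to touch each fragment

def place(item):
    item_queries = {q for q, pred in workload.items() if pred(item)}
    candidates = [i for i, f in enumerate(fragments) if len(f) < FRAGMENT_CAP]
    # Affinity = number of queries shared between the item and the fragment.
    best = max(candidates, key=lambda i: len(item_queries & frag_queries[i]))
    fragments[best].append(item)
    frag_queries[best] |= item_queries
    return best

for _ in range(10):    # ten new observations arrive and are placed one by one
    place({"mag": random.uniform(15, 25), "ra": random.uniform(0, 360)})
print("fragment sizes:", [len(f) for f in fragments])
```

Because placement looks only at the new item and the current fragment summaries, the per-item cost stays constant as the database grows, which is the property the abstract emphasizes.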

Prediction of Horizontal Data Partitioning Through Query Execution Cost Estimation

ArXiv, 2019

The excessively increased volume of data in modern data management systems demands improved system performance, frequently provided by data distribution, system scalability, and performance optimization techniques. Optimized horizontal data partitioning has a significant influence on the performance of distributed data management systems. An optimally partitioned schema found in the early phase of logical database design, without loading real data into the system, and its adaptation to changes in the business environment are very important for successful implementation, system scalability, and performance improvement. In this paper we present a novel approach for finding an optimal horizontally partitioned schema that manifests a minimal total execution cost for a given database workload. Our approach is based on a formal model that enables abstraction of the predicates in the workload queries, which are subsequently used to define all relational fragments. This approach has predictive features acquired by...
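
The overall shape of such an approach can be sketched as: abstract the simple predicates found in the workload, enumerate candidate fragmentations built from them, and keep the one with the lowest estimated total execution cost. The cost model below (rows scanned plus a per-fragment overhead) and the statistics are hypothetical stand-ins, not the paper's estimator.

```python
# Enumerate predicate-based fragmentations and pick the cheapest under a toy
# cost model: fragments defined on predicates a query also uses can be pruned;
# predicates the query does not use double the fragments it must touch.
from itertools import combinations

TOTAL_ROWS = 1_000_000
FRAGMENT_OVERHEAD = 50_000   # fixed per-fragment access cost (hypothetical)

# Simple predicates abstracted from the workload, with assumed selectivities,
# and the set of predicates each workload query uses.
predicates = {"p_region": 0.25, "p_year": 0.10, "p_status": 0.50}
workload = [{"p_region"}, {"p_year"}, {"p_region", "p_year"}, {"p_status"}]

def query_cost(query_preds, frag_preds):
    scanned = TOTAL_ROWS
    for p in frag_preds & query_preds:
        scanned *= predicates[p]
    fragments_touched = 2 ** len(frag_preds - query_preds)
    return scanned + FRAGMENT_OVERHEAD * fragments_touched

def total_cost(frag_preds):
    return sum(query_cost(q, frag_preds) for q in workload)

candidates = [frozenset(c) for r in range(len(predicates) + 1)
              for c in combinations(predicates, r)]
best = min(candidates, key=total_cost)
print("best fragmentation:", sorted(best), "estimated cost:", total_cost(best))
```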

A parallel query processing system based on graph-based database partitioning

Information Sciences, 2018

As parallel database systems have large amounts of data to process, it is important to utilize a scalable and efficient horizontal database partitioning method. Existing partitioning methods have major drawbacks: they not only cause large amounts of data redundancy but also, despite that redundancy, still require expensive shuffle operations for join queries in many cases. We elucidate the drawbacks originating from tree-based partitioning schemes and propose a novel graph-based database partitioning method called GPT that both improves query performance and reduces data redundancy. We integrate the proposed GPT method into a parallel query processing system, Spark SQL, across all the relevant layers and modules, including the query plan generator and the scan operator. Through extensive experiments using three benchmarks, TPC-DS, IMDB, and BioWarehouse, we show that GPT significantly outperforms the state-of-the-art method in terms of both storage overhead and query performance.
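
The graph view itself can be illustrated with a small sketch: tables are vertices, join predicates are weighted edges (e.g., by how often the join appears), and heavy edges are kept greedily whenever both endpoints can still be partitioned on the edge's join keys. This only conveys the graph-based intuition and is not GPT's actual algorithm; the schema and weights are hypothetical.

```python
# Greedily select heavy join edges whose endpoints can be co-partitioned.
join_edges = [
    # (table_a, key_a, table_b, key_b, weight)
    ("store_sales", "ss_item_sk",     "item",             "i_item_sk",     9),
    ("store_sales", "ss_customer_sk", "customer",         "c_customer_sk", 7),
    ("web_sales",   "ws_item_sk",     "item",             "i_item_sk",     6),
    ("customer",    "c_current_addr_sk", "customer_address", "ca_address_sk", 3),
]

partition_key = {}   # table -> chosen partitioning key
colocated = []       # join edges that become shuffle-free

for a, ka, b, kb, w in sorted(join_edges, key=lambda e: -e[-1]):
    # The edge can be made local only if neither side is already fixed to a
    # different partitioning key.
    if partition_key.get(a, ka) == ka and partition_key.get(b, kb) == kb:
        partition_key[a], partition_key[b] = ka, kb
        colocated.append((a, b, w))

print("partitioning keys:", partition_key)
print("joins that avoid shuffles:", colocated)
```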

Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems

Proceedings of the 2012 international conference on Management of Data - SIGMOD '12, 2012

The advent of affordable, shared-nothing computing systems portends a new class of parallel database management systems (DBMS) for on-line transaction processing (OLTP) applications that scale without sacrificing ACID guarantees [7, 9]. The performance of these DBMSs is predicated on the existence of an optimal database design that is tailored to the unique characteristics of OLTP workloads [43]. Deriving such designs for modern DBMSs is difficult, especially for enterprise-class OLTP systems, since they impose extra challenges: the use of stored procedures, the need for load balancing in the presence of time-varying skew, complex schemas, and deployments with larger numbers of partitions. To this end, we present a novel approach to automatically partitioning databases for enterprise-class OLTP systems that significantly extends the state of the art by: (1) minimizing the number of distributed transactions while concurrently mitigating the effects of temporal skew in both the data distribution and accesses, (2) extending the design space to include replicated secondary indexes, (3) organically handling stored procedure routing, and (4) scaling with schema complexity, data size, and number of partitions. This effort builds on two key technical contributions: an analytical cost model that can be used to quickly estimate the relative coordination cost and skew for a given workload and a candidate database design, and an informed exploration of the huge solution space based on large neighborhood search. To evaluate our methods, we integrated our database design tool with a high-performance parallel, main-memory DBMS and compared our methods against both popular heuristics and a state-of-the-art research prototype [17]. Using a diverse set of benchmarks, we show that our approach improves throughput by up to a factor of 16× over these other approaches.
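
A toy illustration of the kind of objective such a cost model captures: a candidate design is scored by the fraction of transactions that become distributed plus a penalty for access skew across partitions, and a simple neighborhood search perturbs one table's partitioning column at a time. The schema, workload, and weights below are hypothetical; the real system uses a far richer analytical model and large-neighborhood search.

```python
# Score candidate designs by distributed-transaction fraction + skew penalty,
# then improve the design with a tiny local search.
import random
from statistics import mean, pstdev

random.seed(1)
NUM_PARTITIONS = 8
SKEW_WEIGHT = 0.5
COLUMNS = {"customer": ["c_id", "c_city"], "orders": ["o_id", "o_c_id"]}

def make_txn():
    # 20% of transactions hit the same hot customer (id 0) to create skew.
    c = random.randint(0, 99) if random.random() < 0.8 else 0
    return {"customer": {"c_id": c, "c_city": c % 10},
            "orders": {"o_id": random.randint(0, 9999), "o_c_id": c}}

workload = [make_txn() for _ in range(500)]

def cost(design):
    distributed, load = 0, [0] * NUM_PARTITIONS
    for txn in workload:
        parts = {hash(txn[t][design[t]]) % NUM_PARTITIONS for t in design}
        if len(parts) > 1:
            distributed += 1
        for p in parts:
            load[p] += 1
    skew = pstdev(load) / mean(load)
    return distributed / len(workload) + SKEW_WEIGHT * skew

design = {t: cols[0] for t, cols in COLUMNS.items()}
for _ in range(20):                        # tiny neighborhood search
    t = random.choice(list(COLUMNS))
    alt = dict(design, **{t: random.choice(COLUMNS[t])})
    if cost(alt) < cost(design):
        design = alt
print(design, round(cost(design), 3))
```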

Strategies for Effective Partitioning Data at Scale in Large-scale Analytics

Strategies for Effective Partitioning Data at Scale in Large-scale Analytics, 2019

The modern digital age brings vast amounts of big data, making it complex for organizations to process such data swiftly and efficiently. Given the growing data sizes, conventional data processing pipelines typically fail to satisfy the processing requirements. Apache Spark has matured into a leading toolbox for big data analysis, offering distributed computing with in-memory computation. Nevertheless, there is no free lunch: Spark performs best only when proper partitioning techniques are applied. This paper examines various data partitioning techniques in Apache Spark, including hash partitioning, range partitioning, and custom partitioning. We focus on how to partition Spark RDDs and DataFrames, and study how partitioning can optimize in-memory processing by tuning partition sizes, eliminating data shuffling, and leveraging broadcast joins for skewed data. Moreover, we explore partitioning approaches designed for computational efficiency, including partition pruning, predicate pushdown, and partition-wise joins. We also discuss the challenges and best practices of data partitioning in Spark. Through an efficient data partitioning process, companies can significantly improve the performance and scalability of their large-scale analytics workflows using Apache Spark.
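
A small PySpark example of the techniques surveyed here: hash repartitioning, range repartitioning, tuning the shuffle partition count, and a broadcast join to avoid shuffling the large side. The column names and sizes are illustrative, and running it requires a local Spark installation.

```python
# Hash vs. range repartitioning of a DataFrame, plus a broadcast join.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col

spark = (SparkSession.builder
         .master("local[*]")
         .appName("partitioning-demo")
         .getOrCreate())
spark.conf.set("spark.sql.shuffle.partitions", "8")   # tune shuffle width

facts = spark.range(1_000_000).withColumn("key", col("id") % 1000)
dims = (spark.range(1000)
             .withColumnRenamed("id", "key")
             .withColumn("label", col("key").cast("string")))

# Hash partitioning: co-locate rows with equal join keys.
facts_hashed = facts.repartition(8, "key")
# Range partitioning: each partition holds a contiguous, ordered key range.
facts_ranged = facts.repartitionByRange(8, "id")
print(facts_hashed.rdd.getNumPartitions(), facts_ranged.rdd.getNumPartitions())

# Broadcast join: ship the small dimension to every executor instead of
# shuffling the large fact table.
joined = facts_hashed.join(broadcast(dims), "key")
print(joined.count())

spark.stop()
```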

Efficient Partitioning of Large Databases without Query Statistics

Efficient Partitioning of Large Databases without Query Statistics, 2016

An efficient way of improving the performance of a database management system is distributed processing. Distribution of data involves fragmentation (partitioning), replication, and allocation. Previous research provided partitioning based on empirical data about the type and frequency of queries; such solutions are not suitable at the initial stage of a distributed database, when query statistics are not yet available. In this paper, I present a fragmentation technique, Matrix-based Fragmentation (MMF), which can be applied at the initial stage as well as at later stages of distributed databases. Instead of using empirical data, I develop a matrix, Modified Create, Read, Update and Delete (MCRUD), to partition a large database properly. Allocation of fragments is done simultaneously in the proposed technique, so using MMF adds no extra complexity for allocating the fragments to the sites of a distributed database, as fragmentation is synchronized with allocation. The performance of a DDBMS can be improved significantly by avoiding frequent remote access and high data transfer among the sites. Results show that the proposed technique can solve the initial partitioning problem of large distributed databases.
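
A rough sketch of the matrix-driven idea: record, for each candidate predicate-defined fragment and each site, which create/read/update/delete operations the site's applications would issue, weight them, and allocate every fragment to its highest-scoring site, so fragmentation and allocation happen together. The weights, predicates, and sites below are hypothetical illustrations, not the paper's MCRUD values.

```python
# Allocate predicate-defined fragments to sites using a weighted CRUD matrix.
CRUD_WEIGHTS = {"C": 3, "R": 1, "U": 2, "D": 2}

# Matrix: fragment predicate -> site -> operations its applications use.
matrix = {
    "student.dept = 'CSE'": {"site1": "CRU", "site2": "R",    "site3": ""},
    "student.dept = 'EEE'": {"site1": "R",   "site2": "CRUD", "site3": "R"},
    "student.dept = 'BBA'": {"site1": "",    "site2": "R",    "site3": "CRU"},
}

def allocate(matrix):
    placement = {}
    for fragment, per_site in matrix.items():
        scores = {site: sum(CRUD_WEIGHTS[op] for op in ops)
                  for site, ops in per_site.items()}
        placement[fragment] = max(scores, key=scores.get)
    return placement

for fragment, site in allocate(matrix).items():
    print(fragment, "->", site)
```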

Adaptive hybrid partitioning for OLAP query processing in a database cluster

International Journal of High Performance Computing and Networking, 2008

OLAP queries are typically heavy-weight and ad hoc, thus requiring high storage capacity and processing power. In this paper, we address this problem using a database cluster, which we see as a cost-effective alternative to a tightly-coupled multiprocessor. We propose a solution for efficient OLAP query processing using a simple data-parallel processing technique called adaptive virtual partitioning, which dynamically tunes partition sizes without requiring any knowledge about the database or the DBMS. To validate our solution, we implemented a Java prototype on a 32-node cluster and ran experiments with typical queries of the TPC-H benchmark. The results show that our solution yields linear, and sometimes superlinear, speedup. In many cases, it outperforms traditional virtual partitioning by factors greater than 10.
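
The adaptive loop can be sketched as follows: each node processes the query over successive key sub-ranges, starting with a small range and growing it while execution stays fast, shrinking it when a piece runs too long. The timing threshold and the simulated per-row cost are hypothetical; this is a sketch of the adaptation idea, not the prototype's implementation.

```python
# Adaptively size virtual partitions (key sub-ranges) from observed run times.
import time

KEY_LO, KEY_HI = 0, 1_000_000
TARGET_SECONDS = 0.02

def run_subquery(lo, hi):
    """Stand-in for executing the query restricted to lo <= key < hi."""
    t0 = time.perf_counter()
    sum(range(lo, hi))                 # simulated work proportional to range size
    return time.perf_counter() - t0

def adaptive_virtual_partitioning(lo, hi, initial_size=1_000):
    size, cursor, pieces = initial_size, lo, 0
    while cursor < hi:
        elapsed = run_subquery(cursor, min(cursor + size, hi))
        cursor += size
        pieces += 1
        if elapsed < 0.5 * TARGET_SECONDS:
            size *= 2                                 # fast: enlarge partitions
        elif elapsed > TARGET_SECONDS:
            size = max(initial_size, size // 2)       # too slow: shrink them
    return pieces

print("virtual partitions executed:",
      adaptive_virtual_partitioning(KEY_LO, KEY_HI))
```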