ParGRES: a middleware for executing OLAP queries in parallel. In: COPPE/UFRJ

ParGRES: a middleware for executing OLAP queries in parallel

2005

ParGRES is a middleware designed to efficiently process heavyweight queries, typical of OLAP, on top of a database cluster. ParGRES achieves query processing speed-up through intra- and inter-query parallelism in a PC cluster environment with database replication and virtual partitioning. It accelerates individual queries and increases system throughput. Our experimental results show that ParGRES yields super-linear or near-linear speed-up. The ParGRES middleware preserves application and database autonomy. As a result, it offers a non-intrusive migration path from a sequential to a parallel environment. Currently, ParGRES uses PostgreSQL, but it is not DBMS dependent, and it has a Web administration tool. The main features of ParGRES are: automatic parsing of SQL queries to allow for intra-query parallel execution; query processing with inter- and intra-query parallelism; virtual dynamic partition definition; result composition; update processing; and dynamic load balancing. The main contribution of ParGRES is to combine inter- and intra-query parallelism with dynamic load balancing for virtual partitions, all within an open source, cost-effective solution.
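The idea of virtual partitioning with result composition can be illustrated with a minimal sketch. It is not ParGRES code: the `orders` table, the helper names, and the use of in-memory SQLite connections as stand-ins for replicated PostgreSQL nodes are all assumptions made for illustration. Each replica receives the same aggregate query restricted to a different key range, and the partial results are then combined.

```python
import sqlite3

def virtual_partitions(lo, hi, n):
    """Split the closed key interval [lo, hi] into at most n half-open ranges."""
    step = (hi - lo + 1 + n - 1) // n  # ceiling division
    bounds, start = [], lo
    while start <= hi:
        end = min(start + step, hi + 1)
        bounds.append((start, end))
        start = end
    return bounds

def run_partitioned_sum(replicas, lo, hi):
    """Send one range-restricted sub-query per replica, then compose the result."""
    partials = []
    for conn, (start, end) in zip(replicas, virtual_partitions(lo, hi, len(replicas))):
        # Same query on every replica; only the added key-range predicate differs.
        row = conn.execute(
            "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM orders "
            "WHERE id >= ? AND id < ?",
            (start, end),
        ).fetchone()
        partials.append(row)
    # Result composition: SUM and COUNT are composed by summing the partials.
    return sum(p[0] for p in partials), sum(p[1] for p in partials)

def make_replica():
    """Hypothetical stand-in for one fully replicated cluster node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(i, i * 10) for i in range(1, 101)])
    return conn

replicas = [make_replica() for _ in range(4)]
total, count = run_partitioned_sum(replicas, 1, 100)
```

The sub-queries run one after another here for simplicity; in a cluster they would be dispatched to the nodes concurrently. Aggregates such as AVG cannot be composed by averaging the partial averages, which is why the sketch carries SUM and COUNT separately.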

High Performance Parallel DBMS

Parallelism is the key to realizing high performance, scalable, fault tolerant database management systems. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. This chapter describes three key components of a high performance parallel database management system. First, data partitioning strategies that distribute the workload of a table across the available nodes while minimizing the overhead of parallelism. Second, algorithms for parallel processing of a join operator.
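As an illustration of how data partitioning and a parallel join fit together, the following minimal sketch hash-partitions two relations on the join key and joins each pair of partitions independently. It is a generic textbook-style example, not the chapter's own algorithm; the function names and the tuple-based row layout are assumptions.

```python
from collections import defaultdict

def hash_partition(rows, key_index, n):
    """Assign each row to one of n partitions by hashing its join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key_index]) % n].append(row)
    return parts

def local_hash_join(left, right, lkey, rkey):
    """Classic build/probe hash join on a single node."""
    table = defaultdict(list)
    for l in left:                      # build phase
        table[l[lkey]].append(l)
    # probe phase: concatenate matching tuples
    return [l + r for r in right for l in table.get(r[rkey], [])]

def partitioned_join(left, right, lkey, rkey, n):
    """Join partition i of left only with partition i of right.

    Rows with equal keys always hash to the same partition, so the n local
    joins are independent and could each run on a different node.
    """
    lparts = hash_partition(left, lkey, n)
    rparts = hash_partition(right, rkey, n)
    out = []
    for lp, rp in zip(lparts, rparts):
        out.extend(local_hash_join(lp, rp, lkey, rkey))
    return out

customers = [(1, "a"), (2, "b"), (3, "c")]
orders = [(10, 1), (11, 1), (12, 3), (13, 4)]
joined = sorted(partitioned_join(customers, orders, 0, 1, 3))
```

Hash partitioning spreads rows evenly only when the key values are not heavily skewed; range or round-robin partitioning are common alternatives with different trade-offs.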

On transforming a sequential SQL-DBMS into a parallel one: First results and experiences of the MIDAS project

Lecture Notes in Computer Science, 1996

One way to satisfy the increasing demand for processing power and I/O bandwidth is to have parallel database management systems (PDBMS) that employ a number of processors, loosely or tightly coupled, serving database requests concurrently. In this paper we want to show an evolution path from an existing and commercially available sequential SQL database system to a parallel SQL database system. This transformation process is described from a software engineering and software reuse point of view emphasizing the system architecture. We report on first results and experiences gained while transforming the existing sequential system and constructing the new PDBMS. In order to show the viability of our PDBMS, a number of specific investigations that exploit this PDBMS testbed are presented as well.

Parallel OLAP query processing in database clusters with data replication

Distributed and Parallel Databases, 2009

We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide for intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives which combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of an adaptive virtual partitioning to redistribute the load to the replicas. Our experimental validation is based on the implementation of our solution on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.
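The benefit of redistributing virtual partitions across replicas can be illustrated with a toy simulation. This is not the SmaQSS algorithm: the tick-based model, the `speeds` and `chunk` parameters, and the "take half of the largest remaining range" policy are all assumptions chosen to keep the sketch short. Because every node is assumed to hold a replica of the data, an idle node can take over part of another node's key range.

```python
def simulate(total_keys, speeds, chunk=10, steal=True):
    """Return (ticks, keys processed per node) for a range-partitioned scan.

    Node i processes speeds[i] * chunk keys per tick from its own range.
    With steal=True, an idle node takes the upper half of the largest
    remaining range, which is possible because the data is replicated.
    """
    n = len(speeds)
    size = total_keys // n
    # remaining[i] is the half-open key range [start, end) still owned by node i
    remaining = [[i * size, (i + 1) * size if i < n - 1 else total_keys]
                 for i in range(n)]
    processed = [0] * n
    ticks = 0
    while any(s < e for s, e in remaining):
        ticks += 1
        for i in range(n):
            start, end = remaining[i]
            if start >= end:
                if not steal:
                    continue
                victim = max(range(n),
                             key=lambda j: remaining[j][1] - remaining[j][0])
                vs, ve = remaining[victim]
                if ve - vs <= 1:
                    continue  # nothing worth splitting
                mid = (vs + ve) // 2
                remaining[victim][1] = mid   # victim keeps the lower half
                remaining[i] = [mid, ve]     # idle node takes the upper half
                start, end = remaining[i]
            take = min(speeds[i] * chunk, end - start)
            remaining[i][0] = start + take
            processed[i] += take
    return ticks, processed

static_ticks, static_done = simulate(1000, [1, 1, 1, 4], steal=False)
dynamic_ticks, dynamic_done = simulate(1000, [1, 1, 1, 4], steal=True)
```

With static ranges, the run lasts as long as the slowest node needs for its quarter of the keys; with redistribution, the fast node absorbs work from the others and the run finishes earlier.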

A New General Purpose Parallel Database System

International Symposium on Parallel Architectures, Algorithms and Networks, 1997

This paper is concerned with the transparent parallelisation of declarative database queries, based on theoretical principles. We have designed an entire database architecture suitable for use on any general-purpose parallel machine. This architecture addresses the shortcomings in flexibility and scalability of commercial parallel databases. A substantial benefit is that the mathematical principles underlying our framework allow provably correct parallel evaluations and optimisations, using...

Distributed and parallel database systems

1996

The maturation of database management system (DBMS) technology has coincided with significant developments in distributed computing and parallel processing technologies. The end result is the emergence of distributed database management systems and parallel database management systems. These systems have started to become the dominant data-management tools for highly data-intensive applications.

Parallel database processing on a 100 Node PC cluster

Proceedings of the 1997 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '97, 1997

We developed a PC cluster system consisting of 100 PCs. Each PC employs a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. We selected two kinds of data-intensive applications. One is decision support query processing. The other is data mining, specifically association rule mining. As a high-speed network, ATM technology has recently become a de facto standard. While other high-performance network standards are also available, ATM networks are widely used from local area to widely distributed environments. One problem of ATM networks is their high latency, in contrast to their high bandwidth. This is usually considered a serious flaw of ATM for building high-performance massively parallel processors. However, applications such as large-scale database analyses are insensitive to communication latency and require only bandwidth. Meanwhile, the performance of personal computers is increasing rapidly, while the prices of PCs continue to fall at a much faster rate than those of workstations. The 200 MHz Pentium Pro CPU is competitive in integer performance with the processor chips found in workstations. Although it is still weak at floating-point operations, these are not frequently used in database applications. Thus, by combining PCs and ATM switches we can construct a large-scale parallel platform very easily and very inexpensively. In this paper, we examine how such a system can help data warehouse processing, which currently runs on expensive high-end mainframes and/or workstation servers. In our first experiment, we used the most complex query of the standard benchmark, TPC-D, on a 100 GB database to evaluate the system against commercial parallel systems. Our PC cluster exhibited much higher performance than the systems in current TPC benchmark reports. Second, we parallelized association rule mining and ran large-scale data mining on the PC cluster. Sufficiently high linearity was obtained. Thus we believe that such commodity-based PC clusters will play a very important role in large-scale database processing.

Parallel Processing with Autonomous Databases in a Cluster System

Lecture Notes in Computer Science, 2002

We consider the use of a cluster system for Application Service Providers (ASP). In the ASP context, hosted applications and databases can be update-intensive and must remain autonomous. In this paper, we propose a new solution for parallel processing with autonomous databases, using a replicated database organization. The main idea is to allow the system administrator to control the tradeoff between database consistency and application performance. Application requirements are captured through execution rules stored in a shared directory. They are used at run time to allocate cluster nodes to user requests in a way that optimizes load balancing while satisfying application consistency requirements. We also propose a new preventive replication method and a transaction load balancing architecture which can trade off consistency for performance using execution rules. Finally, we discuss the ongoing implementation at LIP6 using a Linux cluster running Oracle 8i.
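One way to picture an execution rule that trades consistency for performance is a routing function that picks the least-loaded replica among those fresh enough for the request. The sketch below is hypothetical and is not the paper's rule language or architecture: the `Replica` fields, the `max_staleness` threshold, and `choose_replica` are illustrative names only.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    load: int        # requests currently running on this node (assumed metric)
    staleness: int   # updates not yet applied to this copy (assumed metric)

def choose_replica(replicas, max_staleness):
    """Pick the least-loaded replica whose staleness the application tolerates."""
    eligible = [r for r in replicas if r.staleness <= max_staleness]
    if not eligible:
        raise LookupError("no replica satisfies the consistency requirement")
    # Prefer low load; break ties in favour of the fresher copy.
    return min(eligible, key=lambda r: (r.load, r.staleness))

cluster = [
    Replica("n1", load=5, staleness=0),
    Replica("n2", load=1, staleness=3),
    Replica("n3", load=2, staleness=1),
]
strict = choose_replica(cluster, max_staleness=0)    # must read fresh data
relaxed = choose_replica(cluster, max_staleness=5)   # tolerates stale data
```

A strict rule forces the request onto the busy but up-to-date node, while a relaxed rule lets the balancer use the idle, slightly stale one.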

Query Processing in a DBMS for Cluster Systems

2010

The paper is devoted to the problem of effective query execution in cluster-based systems. An original approach to data placement and replication on the nodes of a cluster system is presented. Based on this approach, a load balancing method for parallel query processing is developed. A method for parallel query execution in cluster systems based on the load balancing method is suggested. Results of computational experiments are presented, and the efficiency of the proposed approaches is analyzed.