Complexity Analysis of query processing in Distribute OLAP Systems

Efficient OLAP query processing in distributed data warehouses

Information Systems, 2003

The success of Internet applications has led to an explosive growth in the demand for bandwidth from Internet Service Providers. Managing an Internet protocol network requires collecting and analyzing network data, such as flow-level traffic statistics. Such analyses can typically be expressed as OLAP queries, e.g., correlated aggregate queries and data cubes. Current-day OLAP tools for this task assume the availability of the data in a centralized data warehouse. However, the inherently distributed nature of data collection and the huge amount of data extracted at each collection point make it impractical to gather all data at a centralized site. One solution is to maintain a distributed data warehouse, consisting of local data warehouses at each collection point and a coordinator site, with most of the processing being performed at the local sites. In this paper, we consider the problem of efficient evaluation of OLAP queries over a distributed data warehouse. We have developed the Skalla system for this task. Skalla translates OLAP queries, specified as certain algebraic expressions, into distributed evaluation plans which are shipped to individual sites. A salient property of our approach is that only partial results are shipped, never parts of the detail data. We propose a variety of optimizations to minimize both the synchronization traffic and the local processing done at each site. We finally present an experimental study based on TPC-R data. Our results demonstrate the scalability of our techniques and quantify the performance benefits of the optimization techniques that have gone into the Skalla system.
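
The idea of shipping only partial results, never detail data, can be illustrated with a minimal Python sketch. This is not Skalla's actual plan format or algebra, which the abstract does not describe. The function names, the `(group, value)` row layout, and the sample traffic figures are all assumptions made for illustration. Each local site reduces its detail rows to per-group `(sum, count)` pairs, and the coordinator merges those pairs to obtain a global average.

```python
from collections import defaultdict


def local_partial_aggregate(rows):
    """Run at a local site: reduce detail rows to per-group (sum, count).

    `rows` is an iterable of (group, value) pairs (hypothetical layout).
    Only the returned dictionary leaves the site, not the rows themselves.
    """
    partial = defaultdict(lambda: [0, 0])
    for group, value in rows:
        partial[group][0] += value
        partial[group][1] += 1
    return {group: tuple(acc) for group, acc in partial.items()}


def coordinator_merge(partials):
    """Run at the coordinator: combine per-site partial aggregates."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (total, count) in partial.items():
            merged[group][0] += total
            merged[group][1] += count
    # AVG is not directly mergeable, so it is derived from SUM and COUNT.
    return {
        group: {"sum": total, "count": count, "avg": total / count}
        for group, (total, count) in merged.items()
    }


# Hypothetical flow records: (protocol, bytes) collected at two sites.
site_a = [("tcp", 100), ("udp", 40), ("tcp", 300)]
site_b = [("tcp", 200), ("udp", 60)]

result = coordinator_merge(
    [local_partial_aggregate(site_a), local_partial_aggregate(site_b)]
)
```

The sketch carries `(sum, count)` rather than a local average because averages of averages are wrong when sites hold different numbers of rows. The traffic sent to the coordinator grows with the number of groups, not with the number of detail rows.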

Data Warehousing and OLAP: Improving Query Performance Using Distributed Computing

Data warehouses are used to store large amounts of data. This data is often used for On-Line Analytical Processing (OLAP) where short response times are essential for on-line decision support. One of the most important requirements of a data warehouse server is the query performance. The principal aspect from the user perspective is how quickly the server processes a given query: "the data warehouse must be fast". The main focus of our research is finding adequate solutions to improve query response time of typical OLAP queries and improve scalability using a distributed computation environment that takes advantage of characteristics specific to the OLAP context. Our proposal provides very good performance and scalability even on huge data warehouses.

Design and Implementation of OLAP System for Distributed Data Warehouse

AL-Rafidain Journal of Computer Sciences and Mathematics, 2013

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. This paper designs a distributed OLAP system that runs on multiple microcomputers connected by a local area network. Introducing distributed technology into the OLAP system allows complicated queries and analyses to be decomposed across different servers. The paper reviews theoretical concepts associated with data warehouses and OLAP systems and implements several measures for distributed data, including data cube design, a distribution algorithm, and partitioning of the data warehouse, so that a decision support system (DSS) can answer complicated queries. Practical results show that an OLAP system with data distributed across multiple servers is faster when the data warehouse is distributed using the proposed algorithm in a client-server architecture. Statistical analysis concepts are applied to the current work to obtain predictable results that can be used by the DSS.

Improving Query Processing Time of Olap Cube Using Olap Operations

The popularity of the OLAP cube has been growing due to the huge volume of data and the need for ad-hoc analytical queries. Because an OLAP cube provides a multidimensional view of data, analysis becomes faster and response time improves over relational databases. Performance here is measured by query throughput, that is, the time a query takes to fetch the appropriate result. Query processing time is observed to be better for an OLAP cube than for OLTP, but there is still room for improvement. In this regard, applying OLAP operations on a cube is found to be a more appropriate approach to improving the query processing time of an OLAP cube. This paper presents a comparative analysis of the query processing time of the OLAP cube and of the OLAP operations.

An Effective Method to Answer OLAP Queries using R*-trees in Distributed Environment

International Journal of Computer Applications, 2014

Evaluation of OLAP queries is one of the challenging tasks in a database system. Attempts are being continuously made to improve the efficiency of the methods that answer OLAP queries. This paper makes one such attempt. This paper proposes a method in a Hadoop and MapReduce distributed environment. Experimental evaluation gives improved results due to the proposed method.

Parallel OLAP query processing in database clusters with data replication

Distributed and Parallel Databases, 2009

We consider the problem of improving the performance of OLAP applications in a database cluster (DBC), which is a low cost and effective parallel solution for query processing. Current DBC solutions for OLAP query processing provide for intra-query parallelism only, at the cost of full replication of the database. In this paper, we propose more efficient distributed database design alternatives which combine physical/virtual partitioning with partial replication. We also propose a new load balancing strategy that takes advantage of an adaptive virtual partitioning to redistribute the load to the replicas. Our experimental validation is based on the implementation of our solution on the SmaQSS DBC middleware prototype. Our experimental results using the TPC-H benchmark and a 32-node cluster show very good speedup.
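
Virtual partitioning, as mentioned in this abstract, can be sketched in Python as rewriting one query into several subqueries, each restricted to a disjoint range of a partitioning key, so that different replicas scan different slices of the same table. The sketch below assumes a fixed, equal-width split; the adaptive strategy and load balancing of the paper are not reproduced. The helper names and the query template are hypothetical, and `lineitem` and `l_orderkey` are borrowed from the TPC-H schema purely as an example.

```python
def key_ranges(lo, hi, n):
    """Split the inclusive key range [lo, hi] into at most n contiguous,
    disjoint ranges whose sizes differ by at most one."""
    total = hi - lo + 1
    base, extra = divmod(total, n)
    ranges = []
    start = lo
    for i in range(n):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more partitions requested than keys available
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def virtual_partition(template, lo, hi, n):
    """Instantiate one subquery per key range (one per replica)."""
    return [template.format(lo=a, hi=b) for a, b in key_ranges(lo, hi, n)]


# Hypothetical aggregate query; each replica returns a partial SUM
# that a coordinator would add up.
TEMPLATE = (
    "SELECT SUM(l_extendedprice) FROM lineitem "
    "WHERE l_orderkey BETWEEN {lo} AND {hi}"
)

subqueries = virtual_partition(TEMPLATE, 1, 10, 3)
```

Because the ranges are disjoint and cover the whole key interval, the partial sums can simply be added. An adaptive variant would adjust range sizes at run time instead of fixing them up front.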

Query Processing Issues in Data Warehouses

2000

Data warehouses store organizational data extracted from many operational databases. They are mainly used for decision support and OLAP applications. As a result, queries to a data warehouse have unique idiosyncrasies that have to be separately addressed. Data warehouse queries usually are much less frequent than OLTP queries and touch upon much more data than a typical OLTP query. In addition, different paradigms of querying are often necessary to provide efficient support for the analyst who uses the data warehouse. This paper addresses issues related to query processing in data warehouses and contrasts different approaches.

Parallel query processing for OLAP in grids

Concurrency and Computation: Practice and Experience, 2008

OLAP query processing is critical for enterprise grids. Capitalizing on our experience with the ParGRES database cluster, we propose a middleware solution, GParGRES, which exploits database replication and inter- and intra-query parallelism to efficiently support OLAP queries in a grid. GParGRES is designed as a wrapper that enables the use of ParGRES in PC clusters of a grid (in our case, Grid5000). Our approach has two levels of query splitting: grid-level splitting, implemented by GParGRES, and node-level splitting, implemented by ParGRES. GParGRES has been partially implemented as database grid services compatible with existing grid solutions such as the open grid service architecture and the Web services resource framework. We give preliminary experimental results obtained with two clusters of Grid5000 using queries of the TPC-H Benchmark. The results show linear or almost linear speedup in query execution as more nodes are added, in all tested configurations.

N. KOTOWSKI ET AL.

databases using Web services and provide transparent support for database queries. Ideally, a grid database solution must respect database autonomy (i.e. avoid database or application migration) while taking advantage of distributed and parallel computing. This can be achieved through the development of a middleware layer between the user applications and the databases. Such a middleware should provide for distributed and parallel query processing with non-intrusive techniques, considering DBMS as black-box components; hence, there is no need for database or application migration.