Sparsity Handling and Data Explosion in OLAP Systems

Classification of Sparsity Patterns and Performance Evaluation in OLAP Systems (Database Systems Research Report, Summer Database Workshop DBWS2002)

2002

In fact, OLAP input data is usually very sparse, and this causes a data explosion when summary tables are precomputed. With very sparse data, space must be assigned for every possible cell of the multidimensional cube (each cell being a combination of dimension members), even though most of those cells hold no actual data. This takes a huge amount of space regardless of the storage efficiency or the database technology used, and it degrades the response time of multidimensional queries [7,8]. Several approaches have been proposed for handling sparsity in OLAP, such as the chunk-based array structure [9,10], the composite dimension method [11], and the sparse-dense split method [12]. Although these techniques can handle sparse data with a fixed pattern, they offer no general solution for mixed types of sparsity patterns. Real OLAP application data, however, exhibits complex sparsity patterns, so more general approaches must be considered. OLAP performs the applications that interact with users ...
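To make the space argument concrete, here is a minimal sketch (not the paper's method) contrasting the number of cells a dense array must allocate with the handful of cells that actually hold data. The dimension sizes, the fact values, and the helper names (`dense_cell_count`, `full_cube_cell_count`, `density`, `lookup`) are hypothetical and chosen only for illustration. The sparse side is a plain Python dict keyed by member coordinates.

```python
# Hypothetical toy cube: member counts per dimension (illustrative only).
DIM_SIZES = {"product": 100, "store": 50, "day": 365}

# Hypothetical fact data stored sparsely: only non-empty cells are kept,
# keyed by (product, store, day) member indices.
facts = {
    (3, 7, 12): 120.0,
    (3, 7, 13): 80.0,
    (42, 1, 200): 15.5,
}


def dense_cell_count(dim_sizes):
    """Cells a dense array must allocate: the product of all cardinalities."""
    total = 1
    for size in dim_sizes.values():
        total *= size
    return total


def full_cube_cell_count(dim_sizes):
    """Cells across all 2**d aggregate views if each is stored densely:
    every dimension gains one extra 'ALL' member."""
    total = 1
    for size in dim_sizes.values():
        total *= size + 1
    return total


def density(facts, dim_sizes):
    """Fraction of possible base cells that actually hold data."""
    return len(facts) / dense_cell_count(dim_sizes)


def lookup(facts, coords):
    """Sparse lookup: empty cells are absent from the dict, so return None."""
    return facts.get(coords)


print(dense_cell_count(DIM_SIZES))      # 1825000
print(full_cube_cell_count(DIM_SIZES))  # 1885266
print(len(facts), density(facts, DIM_SIZES))
```

With only three non-empty cells, the dense layout still reserves 1,825,000 base cells, and precomputing every aggregate densely pushes this to 1,885,266. The sparse dict grows only with the number of facts, at the cost of a hash lookup per access.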

Achieving Query Optimization Using Sparsity Management in OLAP System

Data warehouses are growing in data volume at an accelerating rate, and high disk space consumption, slow query response times, and complex database administration are common problems in these environments. The root causes of these problems are the lack of a proper data model and of an adequate architecture specifically targeted at these environments. Inefficient management of stored data includes duplicate values at the column level and poor management of data sparsity, which stems from low data density and affects the final size of data warehouses. It has been demonstrated that the relational model and relational technology are not the best techniques for managing duplicates and data sparsity. The novelty of this research lies in comparing several data models with respect to their data density and sparsity management in order to optimize data warehouse environments. The paper also explores various techniques for query performance optimization and maps their conceptual aspects closely onto Oracle Warehouse Builder.
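One widely used way to manage duplicate values at the column level is dictionary encoding. The abstract does not say this is the technique it evaluates, so treat it as a swapped-in illustration. Each distinct value is stored once, and the column itself becomes a list of small integer codes. The function names and the sample `region` column below are hypothetical.

```python
def dictionary_encode(column):
    """Replace each value with a small integer code; each distinct value
    is stored only once in the returned `values` list."""
    dictionary = {}
    codes = []
    for value in column:
        if value not in dictionary:
            dictionary[value] = len(dictionary)  # next unused code
        codes.append(dictionary[value])
    # Reverse lookup table: position = code, entry = original value.
    values = [None] * len(dictionary)
    for value, code in dictionary.items():
        values[code] = value
    return values, codes


def dictionary_decode(values, codes):
    """Rebuild the original column from the dictionary and the code list."""
    return [values[code] for code in codes]


# Hypothetical low-cardinality column with many repeated values.
region = ["EMEA", "EMEA", "APAC", "EMEA", "AMER", "APAC"]
values, codes = dictionary_encode(region)
print(values)  # ['EMEA', 'APAC', 'AMER']
print(codes)   # [0, 0, 1, 0, 2, 1]
```

The saving grows with the ratio of rows to distinct values. In a real column store the codes would also be bit-packed, which this sketch does not attempt.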

Improving OLAP Analysis of Multidimensional Data Streams via Efficient Compression Techniques

Intelligent Techniques for Warehousing and Mining Sensor Network Data

Sensor networks are a leading example of data stream sources arising from real-life application scenarios. Sensors are non-reactive elements used to monitor real-life phenomena, such as live weather conditions or network traffic. They are usually organized into networks in which their readings are transmitted using low-level protocols. A relevant problem in dealing with data streams is that they are intrinsically multi-level and multidimensional, so they must be analyzed with a correspondingly multi-level, multi-resolution analysis model such as OLAP, going beyond the traditional solutions offered by primitive SQL-based DBMS interfaces. Even so, a significant issue with OLAP is the so-called curse of dimensionality: as the number of dimensions of the target data cube increases, multidimensional data can no longer be accessed and queried efficiently, due to their enormo...
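The curse of dimensionality can be quantified by counting group-by views (cuboids). With d dimensions and no hierarchies there are 2^d cuboids. If dimension i has L_i hierarchy levels, the count becomes the product of (L_i + 1), where the extra choice is the fully aggregated "all" level. The level counts below are hypothetical.

```python
from math import prod


def cuboids_flat(num_dims):
    """Number of group-by cuboids when no dimension has a hierarchy:
    one per subset of the dimensions."""
    return 2 ** num_dims


def cuboids_with_hierarchies(levels_per_dim):
    """Each dimension with L levels contributes L + 1 choices
    (one per hierarchy level, plus the fully aggregated 'all')."""
    return prod(levels + 1 for levels in levels_per_dim)


for d in (3, 10, 20, 100):
    print(d, cuboids_flat(d))

# Hypothetical 3-dimension cube: time has 4 levels (e.g. day, month,
# quarter, year), location has 3, product has 2.
print(cuboids_with_hierarchies([4, 3, 2]))  # 60
```

Going from 10 to 20 dimensions multiplies the number of views by 1024. This is why full materialization stops being feasible long before 100 dimensions.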

ERATOSTHENES: Design and Architecture of an OLAP System

2001

On-Line Analytical Processing (OLAP) is a trend in database technology based on the multidimensional view of data. The aim of this paper is twofold: (a) to list general problems and solutions applicable to the design of any OLAP system, and (b) to present the specific design decisions we made for a prototype under development at NTUA, which we call ERATOSTHENES. The paper addresses requirements and design issues for all three models involved in an OLAP system: the presentational, logical, and physical models. It also discusses in detail the architecture and the major components of ERATOSTHENES. This research has been partially funded by the European Union's Information Society Technologies Programme (IST) under project EDITH (IST-1999-20722).

Concepts and Fundaments of Data Warehousing and OLAP

2017

In recent years, it has become imperative for organizations to make fast and accurate decisions in order to be more competitive and profitable. Data warehouses have emerged as key technological elements for the exploration and analysis of data, and for subsequent decision making in a business environment. This book deals with the fundamental concepts of data warehouses and explores the concepts associated with data warehousing and the analytical processing of information using OLAP. The reader is guided through a theoretical description of each concept and through numerous practical examples that help build skills in the field.

An overview of data warehousing and OLAP technology

ACM SIGMOD Record, 1997

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and services are now available, and all of the principal database management system vendors now have offerings in these areas. Decision support places some rather different requirements on database technology compared to traditional on-line transaction processing applications. This paper provides an overview of data warehousing and OLAP technologies, with an emphasis on their new requirements. We describe back end tools for extracting, cleaning and loading data into a data warehouse; multidimensional data models typical of OLAP; front end client tools for querying and data analysis; server extensions for efficient query processing; and tools for metadata management and for managing the warehouse. In addition to surveying the state of the art, this paper also identifies some promising research issues, some of which are related to problems that the database research community has worked on for years, but others are only just beginning to be addressed. This overview is based on a tutorial that the authors presented at

High-Dimensional OLAP

Proceedings 2004 VLDB Conference, 2004

The data cube has played an essential role in fast OLAP (online analytical processing) in many multidimensional data warehouses. However, some data sets, in applications such as bioinformatics, statistics, and text processing, are characterized by high dimensionality (e.g., over 100 dimensions) and moderate size (e.g., around 10^6 tuples). No feasible data cube can be constructed for such data sets. In this paper we address the problem of developing an efficient algorithm to perform OLAP on such data sets. Experience tells us that although data analysis tasks may involve a high-dimensional space, most OLAP operations are performed on only a small number of dimensions at a time. Based on this observation, we propose a novel method that computes a thin layer of the data cube together with associated value-list indices. This layer, while manageable in size, is capable of supporting flexible and fast OLAP operations in the original high-dimensional space. Through experiments we show that the method has I/O costs that scale well with dimensionality. Furthermore, these costs are comparable to those of accessing an existing data cube when full materialization is possible.
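The idea of a thin cube layer plus value-list indices can be sketched as follows. Split the dimensions into small fragments and fully cube each fragment, keeping every cell's set of tuple IDs rather than a bare count. A query that spans fragments is then answered by intersecting those tuple-ID sets. This is a simplified illustration written under that assumption, not the paper's algorithm. The relation, the fragment split, and the names `build_fragment` and `query_count` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_fragment(rows, dims):
    """Materialize every non-empty group-by over one small fragment of
    dimensions. Each cell keeps its set of tuple IDs (a value list) rather
    than just a count, so cells from different fragments can be combined.
    `dims` is assumed to be listed in ascending order."""
    cells = defaultdict(set)
    for k in range(1, len(dims) + 1):
        for subset in combinations(dims, k):
            for tid, row in enumerate(rows):
                cells[(subset, tuple(row[d] for d in subset))].add(tid)
    return cells


def query_count(fragments, num_rows, conditions):
    """COUNT(*) for a conjunction {dimension: value}. Each fragment answers
    the conditions on its own dimensions; the resulting tuple-ID sets are
    intersected. Fragments are assumed to partition all dimensions."""
    tids = set(range(num_rows))
    for frag_dims, cells in fragments:
        local = sorted(d for d in conditions if d in frag_dims)
        if local:
            key = (tuple(local), tuple(conditions[d] for d in local))
            tids &= cells.get(key, set())
    return len(tids)


# Hypothetical 4-dimensional relation split into two 2-dimensional fragments.
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c1", "d2"),
    ("a2", "b1", "c1", "d1"),
    ("a1", "b1", "c2", "d1"),
]
fragments = [(dims, build_fragment(rows, dims)) for dims in [(0, 1), (2, 3)]]
print(query_count(fragments, len(rows), {0: "a1", 2: "c1"}))  # 2
```

Each fragment of F dimensions needs only 2^F - 1 cuboids, so total storage grows roughly linearly with the number of fragments rather than exponentially with the total dimensionality.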

Computation of OLAP Data Cubes

Encyclopedia of Data Warehousing and Mining, Second Edition

The focus of online analytical processing (OLAP) is to provide a platform for analyzing data (e.g., sales data) with multiple dimensions (e.g., product, location, time) and multiple measures (e.g., total sales or total cost). OLAP operations then allow this data to be viewed from a number of perspectives. For analysis, the data structure of primary interest in OLAP is the data cube. A detailed introduction to OLAP is presented in (Han & Kamber, 2006).
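Using the abstract's own example dimensions (product, location, time) and a total-sales measure, a full data cube can be sketched as SUM aggregates over every subset of the dimensions. Dimensions that have been aggregated away are marked "ALL", in the style of the CUBE operator. The fact rows and the function name `data_cube` are hypothetical. This naive version rescans the facts once per cuboid and does not share work between cuboids the way practical cube-computation algorithms do.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "location", "time")

# Hypothetical sales facts with one measure, "amount".
sales = [
    {"product": "laptop", "location": "Paris", "time": "Q1", "amount": 1200.0},
    {"product": "laptop", "location": "Rome", "time": "Q1", "amount": 900.0},
    {"product": "phone", "location": "Paris", "time": "Q2", "amount": 600.0},
]


def data_cube(facts, dims, measure):
    """Compute SUM(measure) for all 2**len(dims) group-bys. Dimensions that
    are aggregated away are shown as 'ALL', as in the CUBE operator."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cuboid = defaultdict(float)
            for row in facts:
                key = tuple(row[d] if d in group else "ALL" for d in dims)
                cuboid[key] += row[measure]
            cube[group] = dict(cuboid)
    return cube


cube = data_cube(sales, DIMS, "amount")
print(cube[()])             # {('ALL', 'ALL', 'ALL'): 2700.0}
print(cube[("location",)])  # {('ALL', 'Paris', 'ALL'): 1800.0, ('ALL', 'Rome', 'ALL'): 900.0}
```

Viewing the data "from a number of perspectives" then amounts to picking a cuboid: `cube[("product",)]` gives total sales per product, and `cube[()]` gives the grand total.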

Design and implementation of a scalable parallel system for multidimensional analysis and OLAP

Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999, 1999

... uses summary information that requires aggregate operations along one or more dimensions of numerical data values. Query processing for these applications requires different views of data for decision support. The Data Cube operator provides multidimensional aggregates, used to calculate and store summary information on a number of dimensions. The multidimensionality of the underlying problem can be represented in both relational and multidimensional databases, the latter being a better fit when query performance is the criterion for judgment. Relational databases scale well in size, and efforts are under way to make their performance acceptable. Multidimensional databases, on the other hand, perform well for such queries, although they are not very scalable. Parallel computing is necessary to address the scalability and performance issues for these data sets. In this paper we present a parallel and scalable infrastructure for OLAP and multidimensional analysis. We use chunking to store data either as a dense block using multidimensional arrays (md-arrays) or as a sparse set using a Bit-Encoded Sparse Structure (BESS). Chunks provide a multidimensional index structure for efficient dimension-oriented data access, much as md-arrays do. Operations within and between chunks are a combination of relational and multidimensional operations, depending on whether a chunk is sparse or dense. We present performance results for data sets with 3, 5, and 10 dimensions for our implementation on the IBM SP-2, which show good speedup and scalability.
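The dense-versus-sparse chunk decision can be sketched as shown below. Here we assume that a BESS-style entry packs each dimension's in-chunk offset into a fixed-width bit field, using ceil(log2(extent)) bits per dimension. The exact encoding used in the paper may differ. The density threshold of 0.4 and the names `bits_needed`, `bess_encode`, `bess_decode`, and `store_chunk` are hypothetical.

```python
def bits_needed(extent):
    """Bits for an in-chunk offset in range(extent); at least 1."""
    return max(1, (extent - 1).bit_length())


def bess_encode(offsets, chunk_shape):
    """Pack per-dimension in-chunk offsets into one integer by concatenating
    fixed-width bit fields (first dimension in the most significant bits)."""
    code = 0
    for off, extent in zip(offsets, chunk_shape):
        code = (code << bits_needed(extent)) | off
    return code


def bess_decode(code, chunk_shape):
    """Inverse of bess_encode: peel bit fields off from the last dimension."""
    offsets = []
    for extent in reversed(chunk_shape):
        width = bits_needed(extent)
        offsets.append(code & ((1 << width) - 1))
        code >>= width
    return tuple(reversed(offsets))


def store_chunk(cells, chunk_shape, dense_threshold=0.4):
    """cells maps in-chunk offset tuples to values. Store the chunk as a
    dense row-major list if its density reaches the (hypothetical) threshold,
    otherwise as a sorted list of (bit-encoded offset, value) pairs."""
    volume = 1
    for extent in chunk_shape:
        volume *= extent
    if len(cells) / volume >= dense_threshold:
        dense = [None] * volume
        for offsets, value in cells.items():
            flat = 0
            for off, extent in zip(offsets, chunk_shape):
                flat = flat * extent + off  # row-major linearization
            dense[flat] = value
        return ("dense", dense)
    encoded = sorted((bess_encode(o, chunk_shape), v) for o, v in cells.items())
    return ("sparse", encoded)


print(store_chunk({(0, 0): 1.0, (3, 7): 2.0}, (4, 8)))
# ('sparse', [(0, 1.0), (31, 2.0)])
print(store_chunk({(0, 0): 1.0, (1, 1): 4.0}, (2, 2)))
# ('dense', [1.0, None, None, 4.0])
```

Because offsets are encoded relative to the chunk, the bit fields stay narrow even when the full dimensions are large. Sorting the codes keeps a sparse chunk scannable in the same row-major order as a dense one.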

High-Dimensional OLAP: A Minimal Cubing Approach

2004