Data Warehouse Striping: Improved Query Response Time

Improve Data Warehouse Performance by Preprocessing and Avoidance of Complex Resource Intensive Calculations

International Journal of Computer Science Issues, 2012

A Data Warehouse is a computer system designed for archiving and analyzing an organization's historical data, such as sales, customers, products, salaries, or other information from day-to-day operations (OLTP). Normally, an organization summarizes and copies information from its operational systems to the data warehouse on a regular schedule, such as daily, weekly, monthly, quarterly, or annually; after that, management can perform complex queries and analysis (OLAP) on the information without slowing down the operational systems. Materialized views are one of the best options in this regard and can be used in a number of ways. They can be used in distributed databases for replication, and they can also supply data to a query efficiently through query rewriting. Data provision to queries can be expedited further if dependent child views are created on an already existing materialized view. Furthermore, these child views are created automatically, with some restrictions, when the base materialized view is created. This makes the creation of materialized views less dependent on the user and driven instead by a few parameters: the number of child materialized views and the type of data a view contains. In this paper, a balanced approach is suggested for creating sub-materialized views that answer user queries without consulting the fact table or the parent materialized view, which avoids resource-intensive calculations and joins across multiple tables.
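The idea of child views derived from a parent materialized view can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the function names (`aggregate`, `build_views`, `answer`), the `max_children` parameter, and the rule of picking the smallest covering view are all assumptions made for illustration, and plain Python lists stand in for database tables.

```python
from collections import defaultdict
from itertools import combinations


def aggregate(rows, dims, measure="amount"):
    """Group rows by dims and sum the measure (a stand-in for a materialized view)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row[measure]
    return [dict(zip(dims, key), **{measure: total}) for key, total in totals.items()]


def build_views(fact_rows, parent_dims, max_children, measure="amount"):
    """Build the parent view from the fact table, then child views from the parent only.

    max_children is a hypothetical knob for the "number of child views" parameter.
    """
    parent = aggregate(fact_rows, parent_dims, measure)
    views = {tuple(parent_dims): parent}
    # Candidate children: every proper, non-empty subset of the parent's dimensions,
    # widest subsets first.
    child_dim_sets = [
        subset
        for size in range(len(parent_dims) - 1, 0, -1)
        for subset in combinations(parent_dims, size)
    ]
    for dims in child_dim_sets[:max_children]:
        # Children are rolled up from the parent view, never from the fact table.
        views[dims] = aggregate(parent, dims, measure)
    return views


def answer(views, group_by, measure="amount"):
    """Rewrite a group-by query against the smallest stored view that covers it."""
    candidates = [dims for dims in views if set(group_by) <= set(dims)]
    if not candidates:
        raise LookupError("no view covers this query; the fact table would be needed")
    best = min(candidates, key=lambda dims: len(views[dims]))
    return aggregate(views[best], group_by, measure)
```

Calling `build_views(facts, ["region", "product", "month"], max_children=3)` and then `answer(views, ["region"])` would answer the query from a two-dimension child view rather than from the fact rows. The sketch only handles additive measures such as sums; averages and distinct counts would need extra bookkeeping.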

DWS-AQA: A Cost Effective Approach for Very Large Data Warehouses

2002

Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to assure interactive response times for users. Large data warehouses require a very expensive setup, typically based on high-end servers or high-performance clusters. In this paper we propose and evaluate a simple but very effective method to implement a data warehouse using the computers and workstations typically available in large organizations. The proposed approach is called data warehouse striping with approximate query answering (DWS-AQA). The goal is to use the processing and disk capacity normally available in large workstation networks to implement a data warehouse at a greatly reduced infrastructure cost. As the data warehouse shares computers that are also being used for other purposes, most of the time only a fraction of the computers will be able to execute the partial queries in time. However, as we show in the paper, the approximate answers estimated from partial results have a very small error in most plausible scenarios. Moreover, as the data warehouse facts are partitioned in a strictly uniform way, it is possible to calculate tight confidence intervals for the approximate answers, providing the user with a measure of the accuracy of the query results. A set of experiments on the TPC-H benchmark database is presented to show the accuracy of DWS-AQA for a large number of scenarios.
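A rough sketch of how an approximate SUM with a confidence interval might be derived from the nodes that respond in time is given below. The abstract does not state the paper's estimator, so this uses a generic textbook approach as an assumption: the responding rows are treated as a simple random sample drawn without replacement, and the total is scaled up with a finite population correction. The function names, the round-robin placement rule, and the premise that the coordinator knows the total row count are all hypothetical.

```python
import math


def stripe(rows, n_nodes):
    """Round-robin striping: fact row i is stored on node i % n_nodes."""
    nodes = [[] for _ in range(n_nodes)]
    for i, value in enumerate(rows):
        nodes[i % n_nodes].append(value)
    return nodes


def partial_aggregate(node_rows):
    """Partial result computed locally on one node: (count, sum, sum of squares)."""
    return (len(node_rows), sum(node_rows), sum(x * x for x in node_rows))


def estimate_sum(partials, total_rows, z=1.96):
    """Estimate SUM over all rows from the partial results that arrived in time.

    Returns (estimate, half_width); the interval is estimate +/- half_width.
    Assumes the responding rows behave like a simple random sample (an assumption).
    """
    n = sum(p[0] for p in partials)
    s = sum(p[1] for p in partials)
    ss = sum(p[2] for p in partials)
    mean = s / n
    # Sample variance, clamped at zero to absorb floating-point noise.
    variance = max((ss - n * mean * mean) / (n - 1), 0.0) if n > 1 else 0.0
    # Finite population correction: the interval shrinks to zero when every node answers.
    fpc = 1.0 - n / total_rows
    half_width = z * total_rows * math.sqrt(variance / n * fpc)
    return total_rows * mean, half_width
```

For example, with facts striped over ten nodes and only six responding, `estimate_sum([partial_aggregate(n) for n in nodes[:6]], total_rows)` would return a scaled-up total together with a 95% half-width (for the default `z=1.96`). When all nodes respond, the correction factor drives the half-width to zero and the estimate matches the exact sum.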

Architecture and Performance of Data Warehouses

FUNDACION LAZARO, 2023

A Data Warehouse is characterized by a huge amount of data centralized in a single database. A Data Warehouse architecture is a method of defining the overall architecture of data communication, processing, and presentation that exists for end clients. Architecture is an important part of any IT infrastructure because it helps to optimize the performance of the entire system. Query processing in a centralized Data Warehouse differs from query processing in a distributed Data Warehouse because of the amount of data processed at each site. A three-tier architecture provides more efficient query processing than a two-tier architecture because of the precomputed results held in the middle tier. In this research paper, we discuss the primary types of architectures available and the query performance metrics. We simulate the performance of a Data Warehouse system in two types of architectures. The results of our simulation show that query performance in the distributed Data Warehouse and in the three-tier architecture is better than in their respective counterparts.

