What-if OLAP Queries with Changing Dimensions
2008
https://doi.org/10.1109/ICDE.2008.4497547
Abstract
In a data warehouse, real-world activities can trigger changes to dimensions and their hierarchical structure. For example, organizations can be reorganized over time, causing changes to the reporting structure, and product pricing changes in select markets can result in changes to the bundled options in those markets. Much of the previous work on trend analysis on data warehouses has mainly focused on efficient evaluation …
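To make the what-if idea concrete, here is a minimal Python/pandas sketch (an illustration under assumed toy data, not the paper's method): the same roll-up is evaluated under the current store-to-region hierarchy and under a hypothetical hierarchy in which one store is reassigned.

    # Hypothetical example: compare an OLAP roll-up under the current
    # dimension hierarchy with a "what-if" hierarchy where store s2 moves
    # from the East region to the West region.
    import pandas as pd

    # Toy fact table: sales by store.
    facts = pd.DataFrame({
        "store": ["s1", "s2", "s3", "s4"],
        "sales": [100, 250, 80, 170],
    })

    # Current store -> region mapping (one level of the location dimension).
    current_dim = {"s1": "East", "s2": "East", "s3": "West", "s4": "West"}

    # Hypothetical reorganization: store s2 is reassigned to the West region.
    what_if_dim = dict(current_dim, s2="West")

    def rollup(facts, store_to_region):
        """Aggregate sales to the region level under a given hierarchy."""
        return (facts.assign(region=facts["store"].map(store_to_region))
                     .groupby("region")["sales"].sum())

    print("Current hierarchy:\n", rollup(facts, current_dim))
    print("What-if hierarchy:\n", rollup(facts, what_if_dim))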
Related papers
Accelerating multi dimensional queries in data warehouses
On-line analytical processing (OLAP) is the main vehicle for querying data warehouses. OLAP operations commonly involve the computation of multidimensional aggregates. The major bottleneck in computing these aggregates is the large volume of data that needs to be processed, which in turn leads to prohibitively expensive query execution times. On the other hand, data analysts are primarily concerned with discerning trends in the data, so a system that provides approximate answers in a timely fashion would suit their requirements better.
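As a rough illustration of approximate aggregation (not the technique of the paper above; the data and the 1% sampling rate are assumptions), a uniform sample of the fact table can be scaled up to estimate a grouped SUM much faster than a full scan:

    # Approximate a multidimensional SUM by sampling and scaling.
    import pandas as pd

    facts = pd.DataFrame({
        "product": ["p1", "p2"] * 5000,
        "region":  ["East", "West"] * 5000,
        "sales":   range(10000),
    })

    fraction = 0.01  # 1% uniform sample (illustrative choice)
    sample = facts.sample(frac=fraction, random_state=42)

    exact = facts.groupby(["product", "region"])["sales"].sum()
    approx = sample.groupby(["product", "region"])["sales"].sum() / fraction

    print(pd.DataFrame({"exact": exact, "approximate": approx.round()}))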
DIMENSION HIERARCHIES UPDATES IN DATA WAREHOUSES - A User-driven Approach
Proceedings of the Ninth International Conference on Enterprise Information Systems, 2007
We designed a data warehouse for the French bank LCL to meet users' needs regarding decisions on marketing operations. However, the nature of the users' work means that their requirements change often. In this paper, we propose an original and global approach to achieving user-driven model evolution that provides answers to personalized analysis needs. We developed a prototype called WEDrik (data Warehouse Evolution Driven by Knowledge) within the Oracle 10g DBMS and applied our approach to banking data of LCL.
Supporting dimension updates in an OLAP server
Information Systems, 2004
Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often needed to adapt the warehouse to changing requirements. In earlier work, we defined a taxonomy for these dimension updates and a minimal set of operators to perform them. In this paper we present TSOLAP, an OLAP server supporting fully dynamic dimensions. TSOLAP conforms to the OLE DB for OLAP norm, so it can be used by any client application based on this norm, and can use as backend any conformant relational server. We incorporate dimension update support to MDX, Microsoft's language for OLAP, and introduce TSShow, a visualization tool for dimensions and data cubes. Finally, we present the results of a real-life case study in the application of TSOLAP to a medium-sized medical center.
What-If Analysis for Data Warehouse Evolution
2007
In this paper, we deal with the problem of performing what-if analysis for changes that occur in the schema/structure of the data warehouse sources. We abstract software modules, queries, reports and views as (sequences of) queries in SQL enriched with functions. Queries and relations are uniformly modeled as a graph that is annotated with policies for the management of evolution events. Given a change at an element of the graph, our method detects the parts of the graph that are affected by this change and indicates the way they are tuned to respond to it.
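A hedged sketch of the general graph-propagation idea described above (the node names, policies, and traversal are invented for illustration and are not the paper's framework): a change to one element is pushed along the dependency graph, honoring each dependent element's evolution policy.

    # Propagate a schema change over a query/view dependency graph.
    from collections import defaultdict, deque

    # edges: source element -> elements that depend on it (hypothetical names)
    depends_on_me = defaultdict(list, {
        "relation.Sales": ["view.MonthlySales", "query.TopRegions"],
        "view.MonthlySales": ["report.Dashboard"],
    })

    # hypothetical per-node policies for evolution events
    policy = {
        "view.MonthlySales": "propagate",   # rewrite automatically
        "query.TopRegions": "block",        # require manual review
        "report.Dashboard": "propagate",
    }

    def affected_by(changed):
        """BFS over the dependency graph, honoring each node's policy."""
        seen, frontier, actions = set(), deque([changed]), []
        while frontier:
            node = frontier.popleft()
            for dep in depends_on_me[node]:
                if dep in seen:
                    continue
                seen.add(dep)
                actions.append((dep, policy.get(dep, "block")))
                if policy.get(dep) == "propagate":
                    frontier.append(dep)   # the change flows onward
        return actions

    # e.g. an attribute of relation.Sales is renamed:
    print(affected_by("relation.Sales"))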
A multidimensional and multiversion structure for OLAP applications
Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP - DOLAP '02, 2002
When changes occur in data organization, conventional multidimensional structures are not suitable because dimensions are assumed to be static. In many cases, especially when the time span covered by the data warehouse is large, dimensions of the hypercube must be redesigned in order to integrate evolutions. We propose an approach that allows tracking history but also comparing data mapped into static structures. We define a conceptual model built around a Multiversion Fact, the Temporal Multidimensional Schema, and we introduce the notion of temporal modes of representation corresponding to different ways of analyzing data and their evolution.
Season queries on a temporal multidimensional model for OLAP
2010
Dimensions are usually considered static in a data warehouse. However, because of changing requirements, dimension data and dimension structure can evolve. In this paper we focus on a type of dimension data change called reclassification, i.e., when a member of a level changes its parent in a higher level of a dimension. This kind of change gives rise to the notion of season, i.e., an interval during which two members of a dimension are associated with each other. In this paper we extend a formal temporal multidimensional model with the notion of season and propose query language constructs to enable season queries. A case study about soccer illustrates the application of the proposed extensions, exemplified with several season queries.
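The following Python sketch illustrates the "season" notion informally (the soccer history and helper functions are invented for illustration; this is not the paper's formal model): each interval during which a member keeps the same parent is a season, and a season query asks which parent was in effect on a given day.

    from dataclasses import dataclass
    from datetime import date
    from typing import Optional

    @dataclass
    class Season:
        member: str
        parent: str
        start: date
        end: Optional[date]  # None = still current

    # Reclassification history of a player across teams (member -> parent).
    history = [
        ("messi", "Barcelona",   date(2004, 10, 16)),
        ("messi", "PSG",         date(2021, 8, 10)),
        ("messi", "Inter Miami", date(2023, 7, 15)),
    ]

    def seasons(history):
        """Turn an ordered reclassification history into seasons."""
        out = []
        for i, (member, parent, start) in enumerate(history):
            end = history[i + 1][2] if i + 1 < len(history) else None
            out.append(Season(member, parent, start, end))
        return out

    def parent_on(history, member, day):
        """Season query: which parent was the member associated with on a day?"""
        for s in seasons(history):
            if s.member == member and s.start <= day and (s.end is None or day < s.end):
                return s.parent
        return None

    print(parent_on(history, "messi", date(2022, 1, 1)))  # -> "PSG"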
A Temporal Query Language for OLAP: Implementation and a Case Study
Lecture Notes in Computer Science, 2002
Commercial OLAP systems usually treat OLAP dimensions as static entities. In practice, dimension updates are often necessary in order to adapt the multidimensional database to changing requirements. In earlier work we proposed a temporal multidimensional model and TOLAP, a query language supporting it, accounting for dimension updates and schema evolution at a high level of abstraction. In this paper we present our implementation of the model and the query language. We show how to translate a TOLAP program to SQL, and present a real-life case study, a medical center in Buenos Aires. We apply our implementation to this case study in order to show how our approach can address problems that occur in real situations and that current non-temporal commercial systems cannot deal with. We present results on query and dimension update performance, and briefly describe a visualization tool that allows editing and running TOLAP queries, performing dimension updates, and browsing dimensions across time.
Finding an efficient rewriting of OLAP queries using materialized views in data warehouses
Decision Support Systems, 2002
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
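A small sketch of the granularity-based usability test that this line of work relies on (the roll-up table and functions below are invented for illustration and are not the paper's algorithm): a materialized view can answer a query only if every attribute the query groups by can be derived by rolling up some grouping attribute of the view.

    # day -> month -> year; store -> region  (finer attribute maps to the
    # set of attributes reachable by rolling up along the hierarchy)
    ROLLUPS = {
        "day": {"day", "month", "year"},
        "month": {"month", "year"},
        "year": {"year"},
        "store": {"store", "region"},
        "region": {"region"},
    }

    def derivable(view_attrs, query_attr):
        """Can query_attr be obtained by rolling up some attribute of the view?"""
        return any(query_attr in ROLLUPS[a] for a in view_attrs)

    def view_usable(view_groupby, query_groupby):
        return all(derivable(view_groupby, q) for q in query_groupby)

    # A view aggregated by (month, store) can answer a (year, region) query...
    print(view_usable({"month", "store"}, {"year", "region"}))   # True
    # ...but a view aggregated by (year, region) cannot answer a monthly query.
    print(view_usable({"year", "region"}, {"month", "store"}))   # False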
OLAP Analysis Operators for Multi-State Data Warehouses
International Journal of Data Warehousing and Mining, 2016
Data reduction in Multidimensional Data Warehouses (MDWs) allows increasing the efficiency of analysis and facilitating decision-makers' tasks. In this paper, the authors model an MDW containing reduced data through a set of states. Each state is valid for a certain period of time; it contains only useful information according to decision-makers' needs. In order to carry out analyses in an MDW composed of multiple states, an extension of traditional OLAP analysis operators is required. The authors therefore define a set of OLAP operators compatible with reduced MDWs. For each operator, they propose a user-oriented definition along with an algorithmic translation. To show the feasibility and the efficiency of the proposed concepts, they implement the analysis operators in an R-OLAP framework.
Analysing Slices of Data Warehouses to Detect Structural Modifications
2004
Data warehouses provide sophisticated tools for analyzing complex data online, in particular by aggregating data along dimensions spanned by master data. Changes to these master data are a frequent threat to the correctness of OLAP results, in particular for multi-period data analysis, trend calculations, etc. As dimension data might change in underlying data sources without notifying the data warehouse, we explore the application of data mining techniques for detecting such changes and thus contribute to avoiding incorrect results of OLAP queries.
Changes of Dimension Data in Temporal Data Warehouses
2001
Time is one of the dimensions we frequently find in data warehouses, allowing comparisons of data across different periods. In current multi-dimensional data warehouse technology, changes of dimension data cannot be represented adequately, since all dimensions are (implicitly) considered orthogonal. We propose an extension of the multi-dimensional data model employed in data warehouses that copes correctly with changes in dimension data: a temporal multi-dimensional data model allows the registration of temporal versions of dimension data. Mappings are provided to transfer data between different temporal versions of the dimension instances and enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension data.
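A minimal pandas sketch of the version-mapping idea (the district data and the merge of B and C into BC are assumptions for illustration, not the paper's model): data recorded against an old dimension version is re-expressed in the newer version before periods are compared.

    import pandas as pd

    # Sales recorded against the dimension version valid in each period.
    sales_2000 = pd.DataFrame({"district": ["A", "B", "C"], "sales": [10, 20, 30]})
    sales_2001 = pd.DataFrame({"district": ["A", "BC"], "sales": [12, 55]})

    # Structure change between versions: districts B and C merged into BC.
    # Mapping from the 2000 version into the 2001 version of the dimension.
    map_2000_to_2001 = {"A": "A", "B": "BC", "C": "BC"}

    # Re-express the 2000 data in the 2001 structure before comparing periods.
    sales_2000_v2001 = (sales_2000
                        .assign(district=sales_2000["district"].map(map_2000_to_2001))
                        .groupby("district", as_index=False)["sales"].sum())

    comparison = sales_2000_v2001.merge(sales_2001, on="district",
                                        suffixes=("_2000", "_2001"))
    print(comparison)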
Automatic Detection of Structural Changes in Data Warehouses
2003
Data warehouses provide sophisticated tools for analyzing complex data online, in particular by aggregating data along dimensions spanned by master data. Changes to these master data are a frequent threat to the correctness of OLAP results, in particular for multi-period data analysis, trend calculations, etc. As dimension data might change in underlying data sources without notifying the data warehouse, we explore the application of data mining techniques for detecting such changes and thus contribute to avoiding incorrect results of OLAP queries.
An OLAP-Based Approach to Modeling and Querying Granular Temporal Trends
Lecture Notes in Computer Science, 2014
Data warehouses contain valuable information for decision-making purposes; they can be queried and visualised with Online Analytical Processing (OLAP) tools. They contain time-related information, and thus representing and reasoning on temporal data is important both to guarantee the efficacy and quality of decision-making processes and to detect any emergency situation as soon as possible. Several proposals deal with temporal data models and query languages for data warehouses, allowing one to use different time granularities both when storing and when querying data. In this paper we focus on two aspects pertaining to temporal data in data warehouses, namely temporal patterns and temporal granularities. We first motivate the need for discovering granular trends in an OLAP context. Then, we propose a model for analyzing granular temporal trends in time series by taking advantage of the hierarchical structure of the time dimension.
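A small sketch of trend analysis at a coarser time granularity (the daily admissions series and the monotonicity check are invented for illustration; this is not the paper's model): daily measurements are rolled up along the time hierarchy to the month level and a trend is read off there.

    import pandas as pd

    daily = pd.DataFrame({
        "day": pd.date_range("2024-01-01", periods=120, freq="D"),
        "admissions": [50 + i // 10 for i in range(120)],   # slowly rising
    })

    # Roll the daily series up to month granularity along the time hierarchy.
    monthly = (daily.set_index("day")["admissions"]
                    .resample("MS").mean())

    # A crude granular "trend": is each month higher than the previous one?
    trend = monthly.diff().dropna() > 0
    print(monthly.round(1))
    print("monotonically increasing at month granularity:", bool(trend.all()))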
Representing Knowledge about Changes in Data Warehouse Structures
2002
Already Ovid claimed that everything is in flux and we see continuing metamorphoses. Knowledge about changes is essential for knowledge management, in particular for the correct interpretation of data stemming from different periods. Data warehouses are increasingly deployed in public administrations to provide analytical data for decision making, monitoring, or revisions. Changes in transaction data are recorded in data warehouses, and sophisticated tools allow analyzing these data along time and other dimensions. But changes in master data and in structures cannot be represented in current data warehouse systems, impeding their use in dynamic areas and leading to erroneous query results. For example: if the definition of "unemployment rate" changes, then the figures cannot be compared to those of previous years, and trend calculations on the basis of the available data are irrelevant or severely misleading.
Techniques for logical design and efficient querying of data warehouses
2002
Technical report D2.R4 (2 January 2002, 34 pages) by P. Ciaccia, M. Golfarelli, A. Mazzitelli, S. Rizzi, and F. Scarcello, within Theme 2, "Design and querying of data warehouses". Abstract: Logical design of data warehouses (DW) encompasses the sequence of steps which, given a core workload, determine the logical scheme for the DW. A key step in logical design is view materialization. In this paper we propose an original approach to materialization in which the workload is characterized by the presence of complex queries represented by Nested Generalized Projection/Selection/Join expressions, in which sequences of aggregate operators may be applied to measures and selection predicates may be defined, at different granularities, on both dimensions and measures. Then, we propose a novel approach to estimating the cardinality of views based on a-priori information derived from the application domain. We face the problem by first computing satisfactory bounds for the cardinality and then determining a good probabilistic estimate for it. The results we present here concern the computation of upper bounds for the cardinality of a view given a set of cardinality constraints expressed on some other views. Finally, we deal with the problem of populating and refreshing the data warehouse, which typically involves queries spanning several tables over the reconciled schema. We present a structural method, based on the notion of hypertree decomposition, for solving these queries efficiently. Then, we extend this method to also take into account quantitative information on the data values.
The COMET Metamodel for Temporal Data Warehouses
Notes on Numerical Fluid Mechanics and Multidisciplinary Design, 2002
"The Times They Are A-Changing" (B. Dylan), and with them the structures, schemas, master data, etc. of data warehouses. For the correct treatment of such changes in OLAP queries, the orthogonality assumption of star schemas has to be abandoned. We propose the COMET model, which allows representing not only changes of transaction data, as usual in data warehouses, but also changes of schema and structure data. The COMET model can then be used as the basis of OLAP tools that are aware of structural changes and permit correct query results spanning multiple periods and thus different versions of dimension data. In this paper we present the COMET metamodel in detail with all necessary integrity constraints and show how the intervals of structural stability can be computed for all components of a data warehouse.
Discovering Dynamic Classification Hierarchies in OLAP Dimensions
Lecture Notes in Computer Science, 2012
The standard approach to OLAP requires measures and dimensions of a cube to be known at the design stage. Moreover, dimensions are required to be non-volatile, balanced, and normalized. These constraints appear too rigid for many data sets, especially semi-structured ones, such as user-generated content in social networks and other web applications. We enrich the multidimensional analysis of such data via content-driven discovery of dimensions and classification hierarchies. Discovered elements are dynamic by nature and evolve along with the underlying data set. We demonstrate the benefits of our approach by building a data warehouse for the public stream of the popular social network and microblogging service Twitter. Our approach allows classifying users by their activity, popularity, and behavior, as well as organizing messages by topic, impact, origin, method of generation, etc. Capturing the dynamic character of the data in this way adds more intelligence to the analysis and extends the limits of OLAP.
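A toy pandas sketch of a content-driven hierarchy level (the posts, thresholds, and "activity" classes are assumptions for illustration, not the paper's method): a classification level of the user dimension is derived from the data itself and can simply be recomputed as the stream evolves.

    import pandas as pd

    posts = pd.DataFrame({
        "user": ["u1", "u1", "u1", "u2", "u3", "u3", "u3", "u3", "u3"],
        "msg_id": range(9),
    })

    def activity_level(n_posts):
        if n_posts >= 5:
            return "heavy"
        if n_posts >= 2:
            return "regular"
        return "occasional"

    # Discovered hierarchy level: user -> activity class (recomputed on demand).
    user_dim = (posts.groupby("user")["msg_id"].count()
                     .rename("n_posts").reset_index())
    user_dim["activity"] = user_dim["n_posts"].map(activity_level)
    print(user_dim)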
A Survey on Data Warehouse Evolution
International Journal of Database Management Systems, 2010
Data warehouse (DW) technology was developed to integrate heterogeneous information sources for analysis purposes. Information sources are increasingly autonomous; they often change their content due to perpetual transactions (data changes) and may change their structure due to continually evolving user requirements (schema changes). Properly handling all types of changes is a must. In fact, the DW, which is considered the core component of modern decision support systems, has to be updated according to the different types of evolution of the information sources in order to reflect the real world subject to analysis. The goal of this paper is to propose an overview and a comparative study of different works related to the DW evolution problem.
Cited by
Bias in OLAP Queries: Detection, Explanation, and Removal
Proceedings of the 2018 International Conference on Management of Data
On-line analytical processing (OLAP) is an essential element of decision-support systems. OLAP tools provide the insights and understanding needed for improved decision making. However, the answers to OLAP queries can be biased and lead to perplexing and incorrect insights. In this paper, we propose HypDB, a system to detect, explain, and resolve bias in decision-support queries. We give a simple definition of a biased query, which performs a set of independence tests on the data to detect bias. We propose a novel technique that gives explanations for bias, thus assisting an analyst in understanding what goes on. Additionally, we develop an automated method for rewriting a biased query into an unbiased query, which shows what the analyst intended to examine. In a thorough evaluation on several real datasets we show both the quality and the performance of our techniques, including the completely automatic discovery of the revolutionary insights from a famous 1973 discrimination case.
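The kind of bias meant here can be shown with a tiny pandas sketch (the admissions data and the naive-versus-controlled comparison are invented for illustration; this is not HypDB itself): a grouped aggregate that reverses once a confounding attribute is controlled for is the classic symptom of a biased decision-support query.

    import pandas as pd

    df = pd.DataFrame({
        "group":      ["A"] * 5 + ["B"] * 5,
        "department": ["X", "X", "X", "X", "Y",
                       "X", "Y", "Y", "Y", "Y"],
        "admitted":   [1, 1, 1, 0, 0,
                       1, 1, 0, 0, 0],
    })

    # Naive query: admission rate by group only.
    naive = df.groupby("group")["admitted"].mean()

    # Controlled query: admission rate by group within each department.
    controlled = df.groupby(["department", "group"])["admitted"].mean().unstack()

    print("Naive rates:\n", naive)            # A looks better overall...
    print("Per-department rates:\n", controlled)  # ...but B is better in each department.
    # When the per-department comparison contradicts the naive one, the naive
    # query is confounded by department and should not be read causally.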
Causal Relational Learning
Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, 2020
Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational data have been developed in statistical studies and social sciences. However, existing methods critically rely on restrictive assumptions such as the study population consisting of homogeneous elements that can be represented in a single flat table, where each row is referred to as a unit. In contrast, in many real-world settings, the study domain naturally consists of heterogeneous elements with complex relational structure, where the data is naturally represented in multiple related tables. In this paper, we present a formal framework for causal inference from such relational data. We propose a declarative language called CaRL for capturing causal background knowledge and assumptions, and specifying causal queries using simple Datalog-like rules. CaRL provides a foundation for inferring causality and reasoning about the effect of complex interventions in relational domains. We present an extensive experimental evaluation on real relational data to illustrate the applicability of CaRL in social sciences and healthcare.
HypeR: Hypothetical Reasoning With What-If and How-To Queries Using a Probabilistic Causal Approach
Proceedings of the 2022 International Conference on Management of Data
What-if (provisioning for an update to a database) and how-to (how to modify the database to achieve a goal) analyses provide insights to users who wish to examine hypothetical scenarios without making actual changes to a database and thereby help plan strategies in their fields. Typically, such analyses are done by testing the effect of an update in the existing database on a specific view created by a query of interest. In real-world scenarios, however, an update to a particular part of the database may affect tuples and attributes in a completely different part due to implicit semantic dependencies. To allow for hypothetical reasoning while accommodating such dependencies, we develop HypeR, a framework that supports what-if and how-to queries accounting for probabilistic dependencies among attributes captured by a probabilistic causal model. We extend the SQL syntax to include the necessary operators for expressing these hypothetical queries, define their semantics, devise efficient algorithms and optimizations to compute their results using concepts from causality and probabilistic databases, and evaluate the effectiveness of our approach experimentally. CCS Concepts: Theory of computation → Incomplete, inconsistent, and uncertain databases; Computing methodologies → Causal reasoning and diagnostics.