Towards the Development of Large-Scale Data Warehouse Application Frameworks
Related papers
An Integrated Approach to Deploy Data Warehouse in Business Intelligence Environment
Business Intelligence (BI) provides historical, current, and predictive views of business operations with the help of technologies that include reporting, online analytical processing (OLAP), analytics, data mining, process mining, complex event processing, business performance management, benchmarking, text mining, predictive analytics, and prescriptive analytics. Because analytics plays a major role in BI, OLAP is an integral part of BI in modern business applications. A data warehouse is the most popular way to design and build OLAP systems. A data warehouse, together with ETL and reporting tools, provides an integrated environment for business processing. Business processing also demands decision-making systems and knowledge representation. Moreover, the data sources are physically distributed across different locations. Hence, the modern business environment is a complex architecture with a number of entities. In this paper, the authors present an integrated architecture for managing and designing a business intelligence environment by coordinating several associated entities to achieve business agility.
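To make the OLAP side of this concrete, here is a minimal sketch of a cube-style roll-up, using pandas as a stand-in for a real OLAP engine; the dimensions, measure, and figures are hypothetical:

```python
# Minimal sketch of an OLAP-style roll-up over a tiny fact table,
# using pandas as a stand-in for an OLAP engine. The dimensions
# (region, quarter), the measure (revenue), and all figures are
# hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["EU", "EU", "US", "US", "US"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [120.0, 95.0, 200.0, 80.0, 150.0],
})

# Aggregate revenue by region x quarter; margins=True adds row and
# column totals, roughly what a CUBE/ROLLUP operator would produce.
cube = sales.pivot_table(values="revenue", index="region",
                         columns="quarter", aggfunc="sum", margins=True)
print(cube)
```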
Data
The extract, transform, and load (ETL) process is at the core of data warehousing architectures. As such, the success of data warehouse (DW) projects rests largely on the proper modeling of the ETL process. Since there is no standard model for the representation and design of this process, several researchers have proposed modeling methods based on different formalisms, such as the Unified Modeling Language (UML), ontologies, model-driven architecture (MDA), model-driven development (MDD), and graphical flow notations, which include Business Process Model and Notation (BPMN), colored Petri nets (CPN), Yet Another Workflow Language (YAWL), CommonCube, the entity modeling diagram (EMD), and so on. With the emergence of Big Data, despite the multitude of relevant approaches proposed for modeling the ETL process in classical environments, part of the community has been motivated to provide new data warehousing methods that support Big Data specifications. In this paper, we present a summary...
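As a minimal sketch of the process these formalisms model, the following Python fragment walks through the three ETL stages end to end; the inline source data, cleansing rules, and target table are hypothetical placeholders:

```python
# Minimal extract-transform-load sketch. The inline CSV source, the
# cleansing rules, and the SQLite target table are hypothetical
# placeholders for what a real ETL model would specify.
import csv
import io
import sqlite3

RAW = """customer_id,name
c1,  alice smith
,missing key
c2,BOB JONES
"""

def extract(text):
    """Extract: read raw rows from a delimited source."""
    yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Transform: drop rows without a business key, normalize names."""
    for row in rows:
        if not row["customer_id"]:
            continue  # reject records missing the key
        row["name"] = row["name"].strip().title()
        yield row

def load(rows, conn):
    """Load: append the conformed rows into the target table."""
    conn.executemany(
        "INSERT INTO dim_customer (customer_id, name) VALUES (?, ?)",
        ((r["customer_id"], r["name"]) for r in rows))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_id TEXT, name TEXT)")
load(transform(extract(RAW)), conn)
print(conn.execute("SELECT * FROM dim_customer").fetchall())
# -> [('c1', 'Alice Smith'), ('c2', 'Bob Jones')]
```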
Data Warehousing Methodologies
Communications of the ACM, 2005
Data integration technologies have experienced explosive growth in the last few years, and data warehousing has played a major role in the integration process. A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports managerial decision making. Data warehousing has been cited as the highest-priority post-millennium project of more than half of IT executives. A large number of data warehousing methodologies and tools are available to support the growing market. However, with so many methodologies to choose from, a major concern for many firms is which one to employ in a given data warehousing project. In this article, we review and compare several prominent data warehousing methodologies based on a common set of attributes.
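The "time-variant, nonvolatile" half of that definition is what most distinguishes a warehouse from an operational store; the sketch below (table and column names are hypothetical) shows the warehouse practice of appending timestamped snapshots rather than updating records in place:

```python
# Sketch of "nonvolatile, time-variant" storage: rather than updating a
# customer's address in place as an OLTP system would, the warehouse
# appends a timestamped snapshot, so history stays queryable. Table and
# column names are hypothetical.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE dim_customer_history (
                    customer_id TEXT, address TEXT, loaded_at TEXT)""")

def load_snapshot(customer_id, address):
    """Append only -- no UPDATE or DELETE ever touches loaded rows."""
    conn.execute("INSERT INTO dim_customer_history VALUES (?, ?, ?)",
                 (customer_id, address,
                  datetime.now(timezone.utc).isoformat()))

load_snapshot("c1", "12 Old Road")    # initial load
load_snapshot("c1", "34 New Street")  # address change: a new row, not an UPDATE

for row in conn.execute("SELECT * FROM dim_customer_history "
                        "ORDER BY loaded_at"):
    print(row)  # both versions remain visible (time-variant)
```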
2011
Enterprise or group data warehouses are often introduced in complex multinational organizations in order to foster harmonization, integrate heterogeneous source systems, and hide that heterogeneity from analytical systems. Industry reference data warehouse logical data models, such as Teradata's FS-LDM or IBM's BDW, are promoted as accelerators for the development of such large data warehouses. However, this paper shows that logical data models alone are not sufficient to ensure reusability in those environments. In order to provide a solid basis for standardization, the logical data model needs to be accompanied by a semantic business information model used as an anchor point for the mappings and for communication with business users. Such a model enables a model-driven approach to specifying the data transformations, a task that usually accounts for at least half of the total effort of large data warehouse projects. The paper presents an approach building upon the Teradata Business Data Element (BDE) concept, giving practical examples from project experience in the financial services industry. A research prototype is presented that utilizes Semantic Web technologies such as the Web Ontology Language (OWL) to facilitate the traceability of data requirements, business terms, and physical data elements across the different layers of a complex data warehouse architecture.
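As a rough illustration of the kind of traceability such a prototype targets, the snippet below uses the rdflib library to tie a business term to the physical column that implements it; the namespace, classes, and `implementedBy` predicate are hypothetical stand-ins, not the paper's actual BDE vocabulary:

```python
# Sketch of tracing a business term to a physical data element with RDF,
# in the spirit of the OWL-based prototype described above. The namespace,
# classes, and predicate are hypothetical, not the paper's BDE vocabulary.
from rdflib import Graph, Namespace, Literal, RDF, RDFS

EX = Namespace("http://example.org/dwh#")
g = Graph()

# A semantic business term from the business information model...
g.add((EX.CustomerRiskRating, RDF.type, EX.BusinessTerm))
g.add((EX.CustomerRiskRating, RDFS.label, Literal("Customer Risk Rating")))

# ...mapped to a physical column in the core warehouse layer.
g.add((EX.CustomerRiskRating, EX.implementedBy, EX.CORE_CUST_RISK_CD))
g.add((EX.CORE_CUST_RISK_CD, RDF.type, EX.PhysicalColumn))

# Traceability query: which physical elements realize a given business term?
for col in g.objects(EX.CustomerRiskRating, EX.implementedBy):
    print(col)  # -> http://example.org/dwh#CORE_CUST_RISK_CD
```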
Data Warehousing as Knowledge Pool : A Vital Component of Business Intelligence
International Journal of Computer Science, Engineering and Information Technology
Increasing amounts of information in diverse formats have forced organizations to create large data repositories in response to the information explosion of the 21st century. As a result, the data warehouse model has been introduced to define such large data repositories. The purpose of this article is to describe the principles of data warehousing in business and how it can enhance the generation of new knowledge throughout the organization. Definitions of data warehousing are considered, along with methods of its use, namely querying and data simulation. The steps required prior to transforming raw data into a data warehouse, as well as project goals for data warehouses and metadata, are also discussed. The objective of this document is to provide a clear and simple description of data warehousing terms and concepts, especially for busy managers and laymen, who may only need basic and direct information about data warehouses to gain a complete understanding of the principles of the data...
Semantic Data Lineage and Impact Analysis of Data Warehouse Workflows
PhD Thesis, 2018
The subject of the thesis is data flow in data warehouses. Data warehousing is a complex process of collecting data and cleansing and transforming it into information and knowledge to support strategic and tactical business decisions in organizations. Our goal is to develop a new way to automatically solve a significant class of existing management and analysis problems in a corporate data warehouse environment. We present and validate a method and an underlying set of languages, data structures, and algorithms to calculate, categorize, and visualize component dependencies, data lineage, and business semantics from the database structure and a large set of associated procedures and queries, independently of the actual data in the data warehouse. The approach taken is based on scanning, mapping, modelling, and analysing the metadata of existing systems without accessing the contents of the database or impacting the behaviour of the data processing system. This requires collecting metadata from structures, queries, programs, and reports in the existing environments. We have designed a domain-specific language, XDTL, for specifying data transformations between different data formats, locations, and storage mechanisms. XDTL scripts guide the work of database schema and query scanners. We present a flexible and dynamic database structure to store various metadata sources and implement a web-based analytical application stack for the delivery and visualization of analysis tools for various user groups with different needs. The core of the designed method relies on semantic techniques, probabilistic weight calculation, and estimation of the impact of data in queries. We develop a method to estimate the impact factor of input variables in SQL statements and present a rule system supporting the efficient calculation of query dependencies using these estimates. We show how to use the results of the analysis to categorize, aggregate, and visualize the dependencies to address various planning and decision support problems. The methods and algorithms presented in the thesis have been implemented and tested in different data warehouse analysis and visualization tasks for tens of large international organizations. Some of these systems contain over a hundred thousand database objects and over a million ETL objects, producing data lineage graphs with more than a hundred thousand nodes. The analysis of system performance over real-life datasets of various sizes and structures, presented in the last chapter, demonstrates linear performance scaling and the practical capacity to handle very large datasets.
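To illustrate the core idea of deriving lineage from metadata alone, here is a heavily simplified sketch that scans INSERT ... SELECT statements for their source and target tables and walks the resulting dependency graph; the regexes and sample SQL are hypothetical toys, not the thesis's XDTL-driven scanners:

```python
# Greatly simplified lineage sketch: derive table-level dependencies from
# the text of SQL statements, without touching the data itself. The regexes
# and sample statements are hypothetical; real scanners handle far more of
# the SQL language than this toy does.
import re
from collections import defaultdict

STATEMENTS = [
    "INSERT INTO dw.fact_sales SELECT * FROM stg.sales s "
    "JOIN stg.stores t ON s.store_id = t.id",
    "INSERT INTO mart.sales_by_store SELECT store, SUM(amount) "
    "FROM dw.fact_sales GROUP BY store",
]

TARGET = re.compile(r"INSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCES = re.compile(r"(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)

lineage = defaultdict(set)  # target table -> set of source tables
for sql in STATEMENTS:
    target = TARGET.search(sql).group(1)
    lineage[target].update(SOURCES.findall(sql))

def upstream(table, seen=None):
    """Walk the graph to find everything a table ultimately depends on."""
    seen = seen if seen is not None else set()
    for src in lineage.get(table, ()):
        if src not in seen:
            seen.add(src)
            upstream(src, seen)
    return seen

print(upstream("mart.sales_by_store"))
# -> {'dw.fact_sales', 'stg.sales', 'stg.stores'}
```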
Commercial and Open Source Business Intelligence Platforms for Big Data Warehousing
Emerging Perspectives in Big Data Warehousing
A big data warehouse enables the analysis of large amounts of information that typically come from the organization's transactional (OLTP) systems. However, today's data warehouse systems do not have the capacity to handle the massive amounts of data currently produced. Business intelligence (BI) is a collection of decision support technologies that enable executives, managers, and analysts to make better and faster decisions. Organizations must make good use of business intelligence platforms to quickly extract the desired information from huge volumes of data, reducing the time and increasing the efficiency of decision-making processes. In this chapter, the authors present a comparative analysis of the capabilities of commercial and open source BI tools, in order to aid organizations in selecting the most suitable BI platform. They also evaluated and compared six major open source BI platforms: Actuate, Jaspersoft, Jedox/Palo, Pentaho, SpagoBI, and Vanilla; ...
Data Warehouse as a Generic Approach a Review
International Journal of Innovative Research in Computer and Communication Engineering, 2015
The digitization of data has resulted in the generation of massive volumes of data in ever less time. Heterogeneous and dispersed data sources make the scene even more complicated to handle. With the advent of the 21st century, enterprises realized the importance of data spread across disparate sources. Large efforts were made to integrate this data in one place in order to derive long-term managerial decisions from it. These efforts resulted in the development of data warehousing as a solution for data integration and data analytics. A number of warehousing solutions have been proposed in the last few years to analyze business data, meteorological data, clinical data, and so on. But little research has been done to develop a generic tool that can create a warehouse irrespective of the enterprise and its data.