Executable Data Quality Models

Riga, Latvia, 2018

The paper discusses an external solution for data quality management in information systems. In contrast to traditional data quality assurance methods, the proposed approach uses a domain-specific language (DSL) for describing data quality models. Data quality models consist of graphical diagrams whose elements contain requirements for a data object's values and procedures for the data object's analysis. The DSL interpreter makes the data quality model executable, thereby enabling the measurement and improvement of data quality. The described approach can be applied: (1) to check the completeness, accuracy and consistency of accumulated data; (2) to support data migration in cases when the software architecture and/or data models are changed; (3) to gather data from different data sources and transfer it to a data warehouse.
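
To make the idea concrete, the following minimal Python sketch imitates what an executable data quality model does: each element pairs a requirement on a data object's values with a procedure that checks it. The record fields and the three rules are illustrative assumptions, not the authors' actual DSL.

    import re

    # Hypothetical quality rules for a "customer" data object; each rule
    # is an executable check over the object's values (illustrative only).
    RULES = {
        "completeness": lambda rec: rec.get("email") not in (None, ""),
        "accuracy": lambda rec: re.fullmatch(r"[^@\s]+@[^@\s]+", rec.get("email") or "") is not None,
        "consistency": lambda rec: rec.get("birth_year", 0) <= rec.get("registration_year", 9999),
    }

    def evaluate(record):
        """Run every rule against one data object; return the names of failed checks."""
        return [name for name, rule in RULES.items() if not rule(record)]

    print(evaluate({"email": "a@b.lv", "birth_year": 1990, "registration_year": 2018}))  # []
    print(evaluate({"email": "", "birth_year": 2020, "registration_year": 2018}))        # all three fail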

User-Oriented Approach to Data Quality Evaluation

JUCS - Journal of Universal Computer Science, 2020

The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, thus making it executable, enabling data object scanning and detection of data quality defects and anomalies. The proposed approach was applied to open data sets, ...
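
The PIM-to-PSM step can be pictured with a small, hypothetical Python sketch: a declarative, platform-independent requirement is compiled into an executable, platform-specific predicate. The spec format, attribute name and "not_null" requirement are assumptions for illustration, not the paper's notation.

    # Informal, user-defined requirement on a data object (stands in for a PIM).
    PIM_SPEC = {"object": "company", "attribute": "reg_number", "requirement": "not_null"}

    def to_psm(spec):
        """Compile an informal requirement into an executable predicate (stands in for a PSM)."""
        if spec["requirement"] == "not_null":
            attr = spec["attribute"]
            return lambda row: row.get(attr) is not None
        raise ValueError("unknown requirement: " + spec["requirement"])

    check = to_psm(PIM_SPEC)
    print(check({"reg_number": "40003011234"}))  # True: requirement met
    print(check({"reg_number": None}))           # False: quality defect detected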

Domain-Specific Characteristics of Data Quality

Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, 2017

The research discusses how to describe data quality and what should be taken into account when developing a universal data quality management solution. The proposed approach is to create quality specifications for each kind of data object and to make them executable. A specification can be executed step by step according to business process descriptions, ensuring the gradual accumulation of data in the database and data quality checking according to the specific use case. The described approach can be applied to check the completeness, accuracy, timeliness and consistency of accumulated data.
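
A rough sketch of the step-by-step execution idea, with process steps and checks invented for illustration: each business process step carries only the quality checks that apply once data has accumulated to that point.

    # Hypothetical checks bound to business process steps (illustrative only).
    STEP_CHECKS = {
        "registered": [lambda r: bool(r.get("name"))],            # completeness
        "approved":   [lambda r: bool(r.get("name")),             # completeness, plus
                       lambda r: r.get("approved_at", 0) >= r.get("created_at", 0)],  # consistency
    }

    def check_at_step(record, step):
        """Validate a record only against the requirements of its current step."""
        return all(check(record) for check in STEP_CHECKS[step])

    record = {"name": "Acme", "created_at": 1}
    print(check_at_step(record, "registered"))  # True: enough data for this step
    record["approved_at"] = 2                   # data accumulates gradually
    print(check_at_step(record, "approved"))    # True: later step, more checks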

Developing Data Quality Aware Applications

2009 Ninth International Conference on Quality Software, 2009

Inadequate levels of Data Quality (DQ) in Information Systems (IS) pose a serious problem for organizations, which therefore seek to assure data quality from the earliest stages of information system development. This paper proposes incorporating mechanisms into software development methodologies in order to integrate users' DQ requirements, aiming to assure data quality from the beginning of development. It provides a framework of well-defined processes, activities and tasks that can be incorporated into an existing software development methodology, such as METRICA V3, thereby assuring the data quality of software products created according to that methodology. The extension presented is a guideline and can be adapted and applied to other development methodologies, such as the Unified Development Process.

A Software Engineering View of Data Quality

Thirty years ago, software was not considered a concrete asset. Everyone agreed on its importance, but it was not treated as a good or possession. Nowadays, software is part of the balance sheet of an organization. Data is slowly following the same path. The information owned by an organization is an important part of its assets, and it can be used as a competitive advantage. However, data has long been underestimated by the software community. Usually, methods and techniques are applied to software (including data schemata), while the data itself has often been treated as an external concern. Validation and verification techniques usually assume that data is provided by an external agent and concentrate only on the software.

Towards data quality into the data warehouse development

Commonly, DW development methodologies pay little attention to the problem of data quality and completeness. One of the common mistakes made during the planning of a data warehousing project is to assume that data quality will be addressed during testing. In addition to reviewing existing data warehouse development methodologies, this paper introduces a new approach to data warehouse development. The proposal is based on integrating data quality into every data warehouse development phase and is denoted Integrated Requirement Analysis for Designing data warehouse (IRADAH). This paper shows that data quality is not only an integral part of a data warehouse project, but remains a sustained and ongoing activity.

Data quality in information systems

Information & Management, 1980

Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
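
The database constraint concept mentioned above is easy to demonstrate; here is a minimal, self-contained Python example using the standard-library sqlite3 module (the table and rules are invented for illustration):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE employee (
        name TEXT NOT NULL,                          -- completeness constraint
        age  INTEGER CHECK (age BETWEEN 16 AND 99)   -- validity constraint
    )""")
    con.execute("INSERT INTO employee VALUES ('Ann', 34)")        # accepted
    try:
        con.execute("INSERT INTO employee VALUES ('Bob', 150)")   # violates CHECK
    except sqlite3.IntegrityError as err:
        print("constraint violation:", err)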

Data quality assessment and improvement

International Journal of Business Information Systems, 2016

Data quality is significant for companies, but it is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality so as to enable evaluation and operationalisation. The results indicate that data quality is best ensured when organisation-specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those that have master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality and relevant expertise, all of which form the basis for handling the origins of products.

A formal definition of data quality problems

2005

The exploration of data to extract information or knowledge to support decision making is a critical success factor for an organization in today's society. However, several problems can affect data quality. These problems have a negative effect on the results extracted from data, affecting their usefulness and correctness. In this context, it is quite important to know and understand such data problems. This paper presents a taxonomy of data quality problems, organizing them by the granularity level at which they occur. A formal definition is presented for each problem included. The taxonomy provides rigorous definitions, which are richer in information than the textual definitions used in previous works. These definitions are useful for the development of a data quality tool that automatically detects the identified problems.
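
To give a flavour of such granularity-organized formal definitions (the notation below is illustrative, not the paper's own): a missing value is an attribute-level problem, while approximate duplicates are a tuple-level problem. Writing t[A] for the value of attribute A in tuple t of relation r:

    \[ \textsf{MissingValue}(r, A) \iff \exists\, t \in r : t[A] = \bot \]
    \[ \textsf{Duplicates}(r) \iff \exists\, t_1, t_2 \in r : t_1 \neq t_2 \,\wedge\, \mathrm{sim}(t_1, t_2) \geq \delta \]

Here \bot denotes a null value, sim a similarity function, and \delta a user-chosen similarity threshold.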

A process for assessing data quality

Proceedings of the 8th international workshop on Software quality - WoSQ '11, 2011

This industrial contribution describes a tool-supported approach to assessing the quality of relational databases. The approach combines two separate audits: an audit of the database structure as described in the schema, and an audit of the database content at a given point in time. The audit of the database schema checks for design weaknesses, data rule violations and deviations from the original data model. It also measures the size, complexity and structural quality of the database. The audit of the database content compares the state of selected data attributes to identify incorrect data and checks for missing and redundant records. The purpose is to initiate a data clean-up process to ensure or restore the quality of the data.
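
In the same spirit as the content audit described above, the short Python sketch below counts missing values and redundant records in a toy table (the table, columns and data are assumptions, not the tool from the paper):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
    con.executemany("INSERT INTO customer VALUES (?, ?)",
                    [(1, "a@x.lv"), (2, None), (3, "a@x.lv"), (3, "a@x.lv")])

    # Missing records: attribute values that should be present but are NULL.
    missing = con.execute("SELECT COUNT(*) FROM customer WHERE email IS NULL").fetchone()[0]
    # Redundant records: identical rows stored more than once.
    redundant = con.execute("""SELECT COUNT(*) FROM (SELECT id, email FROM customer
                               GROUP BY id, email HAVING COUNT(*) > 1)""").fetchone()[0]
    print("missing:", missing, "redundant:", redundant)  # missing: 1 redundant: 1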