Developing Data Quality Aware Applications

A systematic literature review of how to introduce data quality requirements into a software product development

2010

In recent years many organizations have come to realize the importance of maintaining appropriate levels of data quality when using their information systems (IS). We therefore consider it essential to introduce and implement mechanisms into organizational IS in order to ensure acceptable quality levels in their data. Only by means of such mechanisms will users be able to trust the data they are using for the task at hand. These mechanisms must be developed to satisfy users' data quality requirements when they use specific functionalities of the IS. From our point of view as software engineering researchers, both these data quality requirements and the remaining software requirements must be dealt with in an appropriate manner. Since the goal of our research is to establish means to develop those software mechanisms aimed at managing data quality in IS, we decided to begin by carrying out a survey of related methodological and technical issues to depict the current state of the field, using the systematic review technique. This paper presents the principal results of the survey, along with the conclusions reached.

A Software Engineering View of Data Quality

Thirty years ago, software was not considered a concrete value. Everyone agreed on its importance, but it was not treated as a good or possession. Nowadays, software is part of an organization's balance sheet. Data is slowly following the same path. The information owned by an organization is an important part of its assets, and it can be used as a competitive advantage. However, data has long been underestimated by the software community. Methods and techniques usually apply to the software (including data schemata), while the data itself has often been treated as an external problem. Validation and verification techniques usually assume that data is provided by an external agent and concentrate only on the software.

A proposal to consider aspects of quality in the software development

Journal on Advances in Theoretical and Applied Informatics

Users need to trust the data managed by software applications that form part of Information Systems (IS), which means that organizations should ensure adequate levels of quality in the data managed by their IS. Therefore, the fact that an IS can manage data with an adequate level of quality should be a basic requirement for all organizations. In order to meet this basic requirement, some aspects and elements related to data quality (DQ) should be taken into account from the earliest stages of development of software applications, i.e. “data quality by design”. Since DQ is considered a multidimensional and largely context-dependent concept, managing all specific requirements is a complex task. The main goal of this paper is to introduce a specific methodology aimed at identifying and eliciting DQ requirements coming from the different viewpoints of users. These specific requirements will be used as normal requirements (both functional and non-functional) during the development...
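
Purely to fix ideas, here is a minimal sketch of how such elicited DQ requirements might be recorded alongside ordinary requirements in code. The record fields and the Viewpoint values are assumptions made for illustration; they are not taken from the paper's methodology.

```python
from dataclasses import dataclass
from enum import Enum

class Viewpoint(Enum):
    # Hypothetical user viewpoints; the paper derives these by elicitation.
    DATA_CONSUMER = "data consumer"
    DATA_PRODUCER = "data producer"
    DATA_CUSTODIAN = "data custodian"

@dataclass
class DQRequirement:
    """One elicited DQ requirement, kept next to the functional ones."""
    dimension: str           # e.g. "accuracy", "completeness", "timeliness"
    viewpoint: Viewpoint     # who raised the requirement
    functionality: str       # IS functionality the requirement applies to
    acceptance_level: float  # minimum acceptable level, 0.0..1.0
    functional: bool         # True if it maps to a functional requirement

reqs = [
    DQRequirement("completeness", Viewpoint.DATA_CONSUMER,
                  "customer search", 0.95, functional=False),
    DQRequirement("accuracy", Viewpoint.DATA_PRODUCER,
                  "order entry", 0.99, functional=True),
]
```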

MMPRO: A Methodology Based on ISO/IEC 15939 to Draw Up Data Quality Measurement Processes

IQ, 2008

Nowadays, data plays a key role in organizations, and management of its quality is becoming an essential activity. As part of such management, organizations need to draw up processes for measuring the data quality (DQ) levels of their organizational units, taking into account the particularities of different scenarios, the available resources, and the characteristics of the data used in them. Given that there are few works in the literature related to this objective, this paper proposes a methodology, abbreviated MMPRO, to develop processes for measuring DQ. MMPRO is based on ISO/IEC 15939. Despite this being a software quality standard, we believe it can be successfully applied in this context because of the similarities between software and data. The proposed methodology consists of four activities: (1) Establish and Sustain the DQ Measurement Commitment, (2) Plan the DQ Measurement Process, (3) Perform the DQ Measurement Process, and (4) Evaluate the DQ Measurement Process. These four activities are divided into tasks. For each task, input and output products are listed, as well as a set of useful techniques and tools, many of them borrowed from the Software Engineering field.
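
As a hedged sketch only, the skeleton below chains MMPRO's four activities in code. The function names, the plan structure and the toy completeness measure are assumptions; the paper defines the activities in terms of tasks and input/output products, not code.

```python
# Hypothetical skeleton of MMPRO's four ISO/IEC 15939-style activities.

def establish_commitment(scope: str) -> dict:
    """Activity 1: secure and record the DQ measurement commitment."""
    return {"scope": scope, "sponsor": "data governance board"}

def plan_measurement(commitment: dict) -> dict:
    """Activity 2: select DQ dimensions, measures, resources, schedule."""
    return {"commitment": commitment,
            "measures": ["completeness"]}

def perform_measurement(plan: dict, data: list[dict]) -> dict:
    """Activity 3: apply the planned measures to the data."""
    total = len(data) or 1
    complete = sum(all(v is not None for v in row.values()) for row in data)
    return {"completeness": complete / total}

def evaluate_process(results: dict, target: float = 0.9) -> bool:
    """Activity 4: evaluate the measurement process and its results."""
    return all(value >= target for value in results.values())

plan = plan_measurement(establish_commitment("customer master data"))
results = perform_measurement(plan, [{"name": "Ada", "email": None},
                                     {"name": "Alan", "email": "a@x.org"}])
print(results, evaluate_process(results))  # {'completeness': 0.5} False
```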

User-Oriented Approach to Data Quality Evaluation

JUCS - Journal of Universal Computer Science, 2020

The paper proposes a new data object-driven approach to data quality evaluation. It consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. As data quality is relative in nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of a data quality model, thus making it executable, enabling data object scanning and detecting data quality defects and anomalies. The proposed approach was applied to open data sets, ...
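
The paper's models are graphical DSLs built in MDA fashion (PIM then PSM), not code; the following sketch merely illustrates the three components under that reading, with invented names: a data object, user-defined use-case-dependent requirements, and an evaluation step.

```python
# Illustrative only: the paper's actual models are graphical DSLs (PIM/PSM).
# Here a "data object" is a dict and each requirement is a user-chosen,
# use-case-dependent predicate.

data_object = {"id": 17, "email": "user@example.org", "age": -3}

requirements = {
    "email present": lambda o: bool(o.get("email")),
    "age plausible": lambda o: o.get("age") is not None and 0 <= o["age"] <= 120,
}

def evaluate(obj, reqs):
    """Quality evaluation process: run every requirement, report defects."""
    return {name: check(obj) for name, check in reqs.items()}

print(evaluate(data_object, requirements))
# {'email present': True, 'age plausible': False}
```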

Evaluating the effectiveness of data quality framework in software engineering

The quality of data is important in research that works with data sets, because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets it is not always clear which entities have been measured and exactly which metrics have been used. This means that measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a dataset metamodel and a quality assessment process to evaluate the data set quality. To evaluate the effectiveness of our framework, we conducted a user study. We used observations, a questionnaire, and a think-aloud approach to gain insights into the framework through participants' thought processes while applying it. The results of our study provide evidence that most participants successfully applied the definitions of dataset category elements and the formal definitions of data quality issues to the datasets. Further work is needed to reproduce our results with more participants, and to determine whether the data quality framework is generalizable to other types of data sets.
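
In the spirit of the framework's core question, whether a data set carries enough information to interpret its measurements, the sketch below checks that every measurement names the entity measured and the metric used. The fields are assumptions, not the paper's actual metamodel.

```python
# Hypothetical check: a measurement is interpretable only if we know
# which entity was measured and which metric was used.

measurements = [
    {"entity": "class Foo", "metric": "LOC", "value": 120},
    {"entity": None,        "metric": "LOC", "value": 48},   # entity unknown
    {"entity": "class Bar", "metric": None,  "value": 0.7},  # metric unknown
]

def interpretable(m):
    return m.get("entity") is not None and m.get("metric") is not None

issues = [m for m in measurements if not interpretable(m)]
print(f"{len(issues)} of {len(measurements)} measurements cannot be "
      f"safely interpreted")
```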

Data quality in information systems

Information & Management, 1980

Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
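
To make the constraint idea concrete, here is a small hedged sketch of checking rows against declared invariants; in a real DBMS these would be expressed as declarative constraints (e.g. CHECK clauses), and the specific rules below are invented for illustration.

```python
# A toy constraint checker in the spirit of the paper's database constraints.

constraints = [
    ("salary is non-negative", lambda r: r["salary"] >= 0),
    ("hire year is plausible", lambda r: 1900 <= r["hired"] <= 2100),
]

def violations(row):
    """Return the names of all constraints the row violates."""
    return [name for name, ok in constraints if not ok(row)]

row = {"salary": -100, "hired": 1985}
print(violations(row))  # ['salary is non-negative']
```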

Data quality assessment and improvement

International Journal of Business Information Systems, 2016

Data quality is significant to companies, but it is an issue that can be challenging to approach and operationalise. This study focuses on data quality from the perspective of operationalisation by analysing the practices of a company that is a world leader in its business. A model is proposed for managing data quality that enables its evaluation and operationalisation. The results indicate that data quality is best ensured when organisation-specific aspects are taken into account. The model acknowledges the needs of different data domains, particularly those with master data characteristics. The proposed model can provide a starting point for operationalising data quality assessment and improvement. The consequent appreciation of data quality improves data maintenance processes, IT solutions, data quality itself and relevant expertise, all of which form the basis for handling the origins of products.

A Data Quality Measurement Information Model Based On ISO/IEC 15939

2007

Measurement is a key activity in DQ management. In the DQ literature one can find many proposals that contribute in some way to the measurement of DQ issues. Looking at those proposals, it becomes clear that there is a lack of unified nomenclature: different authors refer to the same concepts in different ways, or do not explicitly recognize some of them at all. This may cause misunderstanding of the proposed measures. The main aim of this paper is to propose a Data Quality Measurement Information Model (DQMIM) which standardizes the referred terms by following ISO/IEC 15939 as a basis. The paper deals with the concepts implied in the measurement process, not with the measures themselves. In order to make the DQMIM operational, we have also designed an XML Schema which can be used to outline Data Quality Measurement Plans.
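
The paper's XML Schema is not reproduced here, so the element names in the sketch below are assumptions; it only illustrates how a Data Quality Measurement Plan along the DQMIM's lines might be emitted as XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical plan structure; the real element names would come from the
# paper's XML Schema, which we do not reproduce here.
plan = ET.Element("DQMeasurementPlan", name="customer-data")
measure = ET.SubElement(plan, "Measure", dimension="completeness")
ET.SubElement(measure, "Entity").text = "Customer"
ET.SubElement(measure, "Attribute").text = "email"
ET.SubElement(measure, "Target").text = "0.95"

print(ET.tostring(plan, encoding="unicode"))
```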