A Software Engineering View of Data Quality
Related papers
Evaluating the effectiveness of data quality framework in software engineering
The quality of data is important in research working with data sets because poor data quality may lead to invalid results. Data sets contain measurements that are associated with metrics and entities; however, in some data sets, it is not always clear which entities have been measured and exactly which metrics have been used. This means that measurements could be misinterpreted. In this study, we develop a framework for data quality assessment that determines whether a data set has sufficient information to support the correct interpretation of data for analysis in empirical research. The framework incorporates a dataset metamodel and a quality assessment process to evaluate data set quality. To evaluate the effectiveness of our framework, we conducted a user study. We used observations, a questionnaire, and a think-aloud protocol to gain insight into the framework through participants' thought processes while applying it. The results of our study provide evidence that most participants successfully applied the definitions of dataset category elements and the formal definitions of data quality issues to the datasets. Further work is needed to reproduce our results with more participants and to determine whether the data quality framework generalizes to other types of data sets.
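The core check the abstract describes — whether each measurement carries enough information (which entity, which metric) to be interpreted correctly — can be sketched in a few lines. This is an illustration only, not the paper's actual framework; the field names and record shape are assumptions.

```python
# Illustrative sketch (not the paper's framework): flag measurements in a
# dataset that lack either the measured entity or the metric used, since
# such measurements are open to misinterpretation in later analysis.

def audit_measurements(measurements):
    """Return indices of measurements missing an entity or a metric."""
    incomplete = []
    for i, m in enumerate(measurements):
        if not m.get("entity") or not m.get("metric"):
            incomplete.append(i)
    return incomplete

dataset = [
    {"entity": "class Foo", "metric": "LOC", "value": 120},
    {"entity": "class Bar", "metric": None, "value": 0.8},  # metric unknown
    {"metric": "cyclomatic complexity", "value": 7},        # entity unknown
]

print(audit_measurements(dataset))  # → [1, 2]
```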
2010
In recent years many organizations have come to realize the importance of maintaining data at appropriate levels of data quality when using their information systems (IS). We therefore consider it essential to introduce and implement mechanisms in organizational IS that ensure acceptable quality levels in their data. Only by means of these mechanisms will users be able to trust the data they are using for the task at hand. These mechanisms must be developed to satisfy users' data quality requirements when they use specific functionalities of the IS. From our point of view as software engineering researchers, both these data quality requirements and the remaining software requirements must be dealt with in an appropriate manner. Since the goal of our research is to establish means of developing software mechanisms aimed at managing data quality in IS, we decided to begin by carrying out a survey of related methodological and technical issues to depict the current state of the field. We used the systematic review technique to achieve this goal. This paper presents the principal results of the survey, along with the conclusions reached.
Developing Data Quality Aware Applications
2009 Ninth International Conference on Quality Software, 2009
Inadequate levels of Data Quality (DQ) in Information Systems (IS) pose a serious problem for organizations, which consequently seek to assure data quality from the earliest stages of information system development. This paper proposes incorporating mechanisms into software development methodologies in order to integrate users' DQ requirements, with the aim of assuring data quality from the beginning of development. It presents a framework of well-defined processes, activities, and tasks that can be incorporated into an existing software development methodology, such as METRICA V3, and thereby assure the data quality of software products created according to that methodology. The extension presented is a guideline, and it can be extended and applied to other development methodologies such as the Unified Development Process.
Applying a Data Quality Model to Experiments in Software Engineering
Lecture Notes in Computer Science, 2014
Data collection and analysis are key artifacts in any software engineering experiment. However, these data might contain errors. We propose a Data Quality model specific to data obtained from software engineering experiments, which provides a framework for analyzing and improving these data. We apply the model to two controlled experiments, which results in the discovery of data quality problems that need to be addressed. We conclude that data quality issues have to be considered before obtaining the experimental results.
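A pre-analysis scan of experimental data along these lines can be sketched as follows. The field names and thresholds are hypothetical illustrations, not the authors' model; the point is only that quality problems (here, missing and out-of-range values) are detected before results are computed.

```python
# Hypothetical example (field names and rules are assumptions): scan
# controlled-experiment data for two common quality problems before
# analysis -- missing values and implausible (out-of-range) durations.

def find_quality_issues(rows, max_minutes=180):
    """Return (index, description) pairs for rows with quality problems."""
    issues = []
    for i, row in enumerate(rows):
        if row.get("duration_min") is None:
            issues.append((i, "missing duration"))
        elif not (0 < row["duration_min"] <= max_minutes):
            issues.append((i, "duration out of range"))
    return issues

experiment = [
    {"subject": "S1", "duration_min": 42},
    {"subject": "S2", "duration_min": None},  # not recorded
    {"subject": "S3", "duration_min": 999},   # implausible value
]

print(find_quality_issues(experiment))
# → [(1, 'missing duration'), (2, 'duration out of range')]
```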
Executable Data Quality Models
Procedia Computer Science, 2017
The paper discusses an external solution for data quality management in information systems. In contrast to traditional data quality assurance methods, the proposed approach uses a domain-specific language (DSL) to describe data quality models. Data quality models consist of graphical diagrams whose elements contain requirements on data objects' values and procedures for analyzing data objects. A DSL interpreter makes the data quality model executable, thereby enabling the measurement and improvement of data quality. The described approach can be applied: (1) to check the completeness, accuracy, and consistency of accumulated data; (2) to support data migration when the software architecture and/or data models change; (3) to gather data from different data sources and transfer them to a data warehouse.
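The key idea — a data quality model that is itself executable by a generic interpreter — can be illustrated with a minimal sketch. The paper's approach uses a graphical DSL; the dict-of-rules form below is an assumption made purely for illustration.

```python
# Minimal sketch of an *executable* data quality model: the rules are
# data (field -> predicate), and a generic interpreter runs them against
# any record. The rule set and field names are invented for illustration.

quality_model = {
    "name":  lambda v: isinstance(v, str) and v.strip() != "",  # completeness
    "age":   lambda v: isinstance(v, int) and 0 <= v <= 130,    # accuracy
    "email": lambda v: isinstance(v, str) and "@" in v,         # consistency
}

def execute_model(model, record):
    """Return the fields of `record` that violate the model's rules."""
    return [field for field, rule in model.items()
            if not rule(record.get(field))]

print(execute_model(quality_model, {"name": "Ada", "age": 207, "email": "ada"}))
# → ['age', 'email']
```

Because the model is ordinary data, it can be swapped out or extended without touching the interpreter, which is the property that makes the approach usable for migration and warehouse-loading scenarios as well as for validation.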
Data quality in information systems
Information & Management, 1980
Until recently, data quality was poorly understood and seldom achieved, yet it is essential to the effective use of information systems. This paper discusses the nature and importance of data quality. The role of data quality is placed in the life cycle framework. Many new concepts, tools, and techniques from both programming languages and database management systems are presented and related to data quality. In particular, the concept of a database constraint is considered in detail. Some current limitations and research directions are proposed.
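The database-constraint idea the paper examines can be sketched with SQLite's `CHECK` constraints (the table and column names here are invented for illustration): the database itself rejects writes that violate a declared data quality rule, rather than leaving enforcement to application code.

```python
# Sketch of a database constraint enforcing a data quality rule: the
# DBMS rejects any row whose salary is not positive. Schema is invented
# for illustration; uses only the standard-library sqlite3 module.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        name   TEXT NOT NULL,
        salary REAL CHECK (salary > 0)  -- constraint enforced on write
    )
""")
conn.execute("INSERT INTO employee VALUES ('Ada', 5000.0)")   # accepted
try:
    conn.execute("INSERT INTO employee VALUES ('Bob', -1.0)")  # violates CHECK
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```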
A proposal to consider aspects of quality in the software development
Journal on Advances in Theoretical and Applied Informatics
Users need to trust the data managed by the software applications that form part of Information Systems (IS), which means that organizations should assure adequate levels of quality in the data managed by their IS. The fact that an IS can manage data with an adequate level of quality should therefore be a basic requirement for every organization. To meet this requirement, aspects and elements related to data quality (DQ) should be taken into account from the earliest stages of software application development, i.e. “data quality by design”. Since DQ is considered a multidimensional and largely context-dependent concept, managing all of its specific requirements is a complex task. The main goal of this paper is to introduce a specific methodology aimed at identifying and eliciting DQ requirements coming from different user viewpoints. These specific requirements will then be used as ordinary requirements (both functional and non-functional) during the deve...
Data quality under the computer science perspective
2002
Abstract: Data quality is a topic addressed in statistics, management, and computer science, among many other scientific fields. This article considers the problem of defining data quality from a computer science perspective. Several proposed sets of dimensions (or characteristics) that contribute to the definition of data quality are compared, and a “base” definition of the concept is introduced.