Data quality at a glance

DIRA: A Framework of Data Integration Using Data Quality

International Journal of Data Mining & Knowledge Management Process, 2016

Data integration is the process of collecting data from different data sources and providing the user with a unified view of answers that meet their requirements. The quality of query answers can be improved by assessing the quality of the data sources against a set of quality measures and retrieving data only from the significant ones. Query answers returned from significant data sources can then be ranked according to the quality requirements specified in the user query and the proposed query types, so that only the top-k answers are returned. This paper introduces a data integration framework called Data Integration to Return Ranked Alternatives (DIRA). It relies on a data quality assessment module, which uses data source quality to choose the significant sources, and on a ranking algorithm, which returns the top-k query answers according to the different query types.
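The two steps described in this abstract, selecting significant sources and returning top-k answers, can be sketched roughly as follows. This is only an illustrative sketch, not the DIRA algorithm itself: the weighted-average scoring, the threshold, and the names `source_score`, `significant_sources`, and `top_k_answers` are all assumptions introduced here.

```python
import heapq

def source_score(measures, weights):
    """Weighted average of a source's quality measures (assumed to lie in [0, 1])."""
    total = sum(weights.values())
    return sum(w * measures.get(name, 0.0) for name, w in weights.items()) / total

def significant_sources(sources, weights, threshold):
    """Keep only the sources whose score reaches the (hypothetical) threshold."""
    scores = {name: source_score(m, weights) for name, m in sources.items()}
    return {name: s for name, s in scores.items() if s >= threshold}

def top_k_answers(answers, scores, k):
    """Rank (source, value) answers by their source's score and keep the best k."""
    usable = [(scores[src], value) for src, value in answers if src in scores]
    return heapq.nlargest(k, usable, key=lambda pair: pair[0])

# Hypothetical example data: three sources with two quality measures each.
sources = {
    "A": {"completeness": 0.9, "accuracy": 0.8},
    "B": {"completeness": 0.4, "accuracy": 0.5},
    "C": {"completeness": 0.7, "accuracy": 0.9},
}
weights = {"completeness": 1.0, "accuracy": 1.0}
scores = significant_sources(sources, weights, threshold=0.6)
answers = [("A", "x"), ("B", "y"), ("C", "z")]
best = top_k_answers(answers, scores, k=2)
```

Here source B falls below the threshold, so its answer is never ranked. A real system would likely score individual answers as well as sources; the sketch ranks by source score only to keep the idea visible.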

Detection and Resolution of Data Inconsistencies, and Data Integration using Information Quality criteria

2000

In the processes and optimization of information integration, such as query processing, query planning and hierarchical structuring of results to the user, we argue that user quality priorities, data inconsistencies and data quality differences among the participating sources have not been fully addressed. We propose the development of a Data Quality Manager (DQM) to establish communication between the process of

A framework for data quality evaluation in a data integration system

19º Simposio Brasileiro …, 2004

To satisfy complex user requirements, information systems need to integrate data from several, possibly autonomous, data sources. One challenge in such an environment is to provide the user with data that meets their quality requirements. These requirements are difficult to satisfy because of the strong heterogeneity of the sources. In this paper we address the problem of data quality evaluation in data integration systems. We present a framework that is a first attempt to formalize the evaluation of data quality. It is based on a graph model of the data integration system, which allows us to define evaluation methods and demonstrate propositions in terms of graph properties. To illustrate our approach, we also present a first experiment with the data freshness quality factor, and we show how the framework is used to evaluate this factor under different scenarios.
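One way to picture evaluating freshness over a graph model is to propagate the age of the data from the sources to the node the user queries. The sketch below is an assumption made for illustration, not the evaluation method defined in the paper: it treats the system as an acyclic graph, gives each source an age, each node a processing cost, and each edge a delay, and takes the worst case over a node's inputs. The function name and all parameters are hypothetical.

```python
def freshness(predecessors, source_age, cost, delay, target):
    """Worst-case age of the data delivered at `target`.

    predecessors: node -> list of input nodes (sources have no entry)
    source_age:   source node -> age of its data
    cost:         node -> processing time spent at that node
    delay:        (from_node, to_node) -> transfer or waiting time on that edge
    All time values are assumed to share one unit.
    """
    memo = {}

    def age(node):
        if node in memo:
            return memo[node]
        inputs = predecessors.get(node, [])
        if not inputs:
            result = source_age[node]
        else:
            # The oldest input path determines the age of the result.
            result = cost.get(node, 0) + max(
                age(p) + delay.get((p, node), 0) for p in inputs
            )
        memo[node] = result
        return result

    return age(target)

# Hypothetical scenario: two sources are merged, then exposed through a view.
predecessors = {"merge": ["s1", "s2"], "view": ["merge"]}
source_age = {"s1": 5, "s2": 2}
cost = {"merge": 1, "view": 0}
delay = {("s1", "merge"): 0, ("s2", "merge"): 10, ("merge", "view"): 3}
view_age = freshness(predecessors, source_age, cost, delay, "view")
```

Changing a single delay or source age and recomputing gives a simple way to compare scenarios, in the spirit of the abstract's last sentence.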

Evaluating Data Quality for Integration of Data Sources

Lecture Notes in Business Information Processing, 2013

Data can be viewed as a type of model (on the instance level), as illustrated, for example, by the product models in CAD and PLM systems. In this paper we use a specialization of a general framework for assessing the quality of models to evaluate the combined quality of data, with the aim of investigating potential challenges in data integration across different sources. The usefulness of the framework is illustrated by a practical application: assessing the potential quality of different data sources to be used together in a collaborative work environment. Specifically relevant knowledge sources (including the characteristics of the tools used for accessing the data) have been assessed. The assessment indicated opportunities, but also challenges, in integrating data from the different data sources typically used by people in different roles in an organization.

Anatomy of data integration

Journal of Biomedical Informatics, 2007

Producing reliable information is the ultimate goal of data processing. The ocean of data created by advances in science and technology calls for the integration of data coming from heterogeneous sources that are diverse in their purposes, business rules, underlying models and enabling technologies. Reference models, the Semantic Web, standards, ontologies, and other technologies enable fast and efficient merging of heterogeneous data, while the reliability of the information produced is largely determined by how well the data represent reality. In this paper we initiate a framework for assessing the informational value of data that includes data dimensions; aligning data quality with business practices; identifying authoritative sources and integration keys; merging models; and uniting updates of varying frequency and overlapping or gapped data sets.

Data Quality Is Context Dependent

Lecture Notes in Business Information Processing, 2011

We motivate, formalize and investigate the notions of data quality assessment and data quality query answering as context-dependent activities. Contexts for the assessment and usage of a data source at hand are modeled as collections of external databases, which can be materialized or virtual, and mappings within the collections and with the data source at hand. In this way, the context becomes "the complement" of the data source with respect to a data integration system. The proposed model allows for natural extensions, like considering data quality predicates, and even more expressive ontologies for data quality assessment. Topics: Data quality and cleansing. Research funded by the NSERC Strategic Network on BI (BIN, ADC05).

Coping with Data Inconsistencies in the Integration of Heterogenous Data Sources

Global Journal of Computer Science and Technology, 2023

This research examines the problem of inconsistent data when integrating information from multiple sources into a unified view. Data inconsistencies undermine the ability to provide meaningful query responses based on the integrated data. The study reviews current techniques for handling inconsistent data including domain-specific data cleaning and declarative methods that provide answers despite integrity violations. A key challenge identified is modeling data consistency and ensuring clean integrated data. Data integration systems based on a global schema must carefully map heterogeneous sources to that schema. However, dependencies in the integrated data can prevent attaining consistency due to issues like conflicting facts from different sources. The research summarizes various proposed approaches for resolving inconsistencies through data cleaning, integrity constraints, and dependency mapping techniques. However, outstanding challenges remain regarding accuracy, availability, timeliness, and other data quality restrictions of autonomous sources.
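The idea of answering queries despite conflicting facts can be illustrated with a very small sketch. It assumes a single key constraint (each key should map to one value) and simply separates keys on which all sources agree from keys with conflicts. This is a simplification introduced here for illustration, not one of the techniques surveyed in the paper; the function name and the sample records are hypothetical.

```python
from collections import defaultdict

def split_by_key_consistency(records):
    """Separate agreed values from conflicting ones under a key constraint.

    records: iterable of (source, key, value) triples gathered from all sources.
    Returns (consistent, conflicts):
      consistent maps key -> the single value every reporting source agrees on;
      conflicts  maps key -> sorted list of the distinct values reported.
    """
    values = defaultdict(set)
    for _source, key, value in records:
        values[key].add(value)
    consistent = {k: next(iter(v)) for k, v in values.items() if len(v) == 1}
    conflicts = {k: sorted(v) for k, v in values.items() if len(v) > 1}
    return consistent, conflicts

# Hypothetical integrated data: two sources disagree on the city for "bob".
records = [
    ("crm", "alice", "Oslo"),
    ("erp", "alice", "Oslo"),
    ("crm", "bob", "Lima"),
    ("erp", "bob", "Quito"),
]
consistent, conflicts = split_by_key_consistency(records)
```

A query restricted to `consistent` returns only answers that no source contradicts, while `conflicts` lists the facts that a cleaning step or a source-quality ranking would need to resolve.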

Information Quality Measurement in Data Integration Schemas

Workshop on Information Quality in Information Systems, 2007

Integrated access to distributed data is an important problem faced in many scientific and commercial applications. A data integration system provides a unified view for users to submit queries over multiple autonomous data sources. The queries are processed over a global schema that offers an integrated view of the data sources. Much work has been done on query processing and

A Hybrid Framework for Applying Semantic Integration Technologies to Improve Data Quality

This study aims to develop a new hybrid framework of semantic integration for enterprise information systems, in order to improve data quality and resolve the problems caused by scattered data sources and the rapid expansion of data. The proposed framework builds on previous studies: significant and seminal research articles are reviewed based on selection criteria, and a critical review is conducted to determine a set of qualified semantic technologies that can be used to construct a hybrid semantic integration framework. The proposed framework consists of six layers and one component: the source layer, translation layer, XML layer, RDF layer, inference layer, application layer, and the ontology component. The framework faced two challenges and one conflict, which were resolved while composing it. The proposed framework was examined for its ability to improve data quality along four data quality dimensions.