Accounting for quality in data integration systems: a completeness-aware integration approach

Quality Driven Approach for Data Integration Systems

The 7th International Conference on Information Technology, 2015

By data integration systems (DIS) we mean systems in which query answers are assembled on the fly from a set of available data sources. Query answers can be improved by assessing the quality of the data sources and drawing answers only from the significant ones. Quality measures over the data in the sources can help determine which sources are significant for a given query. In this paper, we propose a method to compute and store a set of quality measures on data sources. These quality measures are then used interactively to select the most significant candidate data sources for answering user queries. User queries may include the user's preferences regarding quality issues. A quality-based approach becomes increasingly important when the number of data sources is large or when the user requires data with specific quality preferences.
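
As a rough illustration of how stored quality measures could drive source selection, the sketch below filters candidate sources against user-supplied quality thresholds. All source names, measures, and thresholds are invented for illustration; the paper's actual selection method may differ.

```python
# Hypothetical sketch of quality-driven source selection: each source
# carries stored quality measures, and a query is answered only from
# sources that satisfy the user's quality preferences.

sources = {
    "src_a": {"completeness": 0.95, "accuracy": 0.90, "freshness": 0.80},
    "src_b": {"completeness": 0.60, "accuracy": 0.98, "freshness": 0.99},
    "src_c": {"completeness": 0.85, "accuracy": 0.70, "freshness": 0.50},
}

def select_sources(sources, preferences):
    """Return the sources whose stored measures meet every user threshold."""
    return [
        name for name, measures in sources.items()
        if all(measures.get(dim, 0.0) >= threshold
               for dim, threshold in preferences.items())
    ]

# A query that insists on complete and accurate data:
print(select_sources(sources, {"completeness": 0.8, "accuracy": 0.85}))
# ['src_a']
```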

Anatomy of data integration

Journal of Biomedical Informatics, 2007

Producing reliable information is the ultimate goal of data processing. The ocean of data created by advances in science and technology calls for the integration of data coming from heterogeneous sources that are diverse in their purposes, business rules, underlying models, and enabling technologies. Reference models, the Semantic Web, standards, ontologies, and other technologies enable fast and efficient merging of heterogeneous data, while the reliability of the produced information is largely defined by how well the data represent reality. In this paper we introduce a framework for assessing the informational value of data that covers data dimensions; alignment of data quality with business practices; identification of authoritative sources and integration keys; merging of models; and unification of updates of varying frequency and of overlapping or gapped data sets.
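
To give a flavor of the last point, here is a minimal sketch, under assumed field names and a simple recency rule, of uniting two feeds that update at different frequencies and overlap on some keys; it is not the paper's framework.

```python
from datetime import date

# Illustrative sketch only: merge overlapping records from two feeds that
# refresh at different frequencies, keeping the most recently observed
# value per key.

daily_feed = {
    "cust_1": {"value": 100, "as_of": date(2024, 1, 5)},
    "cust_2": {"value": 250, "as_of": date(2024, 1, 5)},
}
monthly_feed = {
    "cust_1": {"value": 90, "as_of": date(2024, 1, 1)},
    "cust_3": {"value": 400, "as_of": date(2024, 1, 1)},
}

def unite(*feeds):
    """Union the feeds; on overlap, prefer the record with the latest as-of date."""
    merged = {}
    for feed in feeds:
        for key, rec in feed.items():
            if key not in merged or rec["as_of"] > merged[key]["as_of"]:
                merged[key] = rec
    return merged

print(unite(daily_feed, monthly_feed))
# cust_1 keeps the newer daily value; cust_2 and cust_3 fill each other's gaps.
```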

A framework for data quality evaluation in a data integration system

19º Simpósio Brasileiro …, 2004

To satisfy complex user requirements, information systems need to integrate data from several, possibly autonomous, data sources. One challenge in such an environment is to provide the user with data that meets their quality requirements. These requirements are difficult to satisfy because of the strong heterogeneity of the sources. In this paper we address the problem of data quality evaluation in data integration systems. We present a framework that is a first attempt to formalize the evaluation of data quality. It is based on a graph model of the data integration system, which allows us to define evaluation methods and prove propositions in terms of graph properties. To illustrate our approach, we also present a first experiment with the data freshness quality factor and show how the framework is used to evaluate this factor under different scenarios.
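
As a toy illustration of evaluating a quality factor on a graph model, the sketch below computes data freshness over an invented DAG in which sources carry an age and each integration activity adds a delay; the graph, ages, delays, and propagation rule are all assumptions, not the paper's definitions.

```python
# Toy DAG model of a DIS: sources feed integration activities, and the
# freshness of a node is the worst input age plus the node's own delay.

source_age = {"s1": 2.0, "s2": 10.0}          # hours since last refresh
node_delay = {"clean": 1.0, "merge": 0.5}     # processing cost per activity
edges = {"clean": ["s1"], "merge": ["clean", "s2"]}  # node -> predecessors

def freshness(node):
    """Age of the data delivered by `node`: worst input age plus local delay."""
    if node in source_age:                    # a source node
        return source_age[node]
    return node_delay[node] + max(freshness(p) for p in edges[node])

print(freshness("merge"))  # 10.5: dominated by the stale source s2
```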

Data integration under integrity constraints

Information Systems, 2004

Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global schema. There are basically two approaches for designing a data integration system. In the global-as-view approach, one defines the elements of the global schema as views over the sources, whereas in the local-as-view approach, one characterizes the sources as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with ...
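
To make the contrast concrete, here is a minimal sketch of the two mapping styles over a toy schema; the relation names and queries are invented, and the paper's formal setting is more general.

```python
# Global schema: person(name, city)
# Source 1:      s1(name)        -- people, city unknown
# Source 2:      s2(name, city)  -- people with their city

# GAV: each global relation is defined as a view (query) over the sources.
gav_mapping = {
    "person(name, city)": "SELECT name, city FROM s2",
}

# LAV: each source relation is described as a view over the global schema.
lav_mapping = {
    "s1(name)":       "SELECT name FROM person",
    "s2(name, city)": "SELECT name, city FROM person",
}

# Under GAV, a query over `person` is answered by unfolding the view above.
# Under LAV, the system must rewrite the query in terms of s1 and s2, which
# is where reasoning akin to query answering over incomplete information
# comes in.
print(gav_mapping, lav_mapping, sep="\n")
```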

Conceptual modeling for data integration

2009

The goal of data integration is to provide uniform access to a set of heterogeneous data sources, freeing the user from having to know where the data are, how they are stored, and how they can be accessed. One outcome of the research carried out on data integration in recent years is a clear architecture, comprising a global schema, the source schema, and the mapping between the source and the global schema.
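
For reference, this architecture is commonly formalized, for instance in Lenzerini's survey of data integration, as a triple ⟨G, S, M⟩, where G is the global schema, S is the source schema, and M is the mapping between them; queries are posed over G and answered from the data in S via M.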

Data integration: A logic-based perspective

2005

Data integration is the problem of combining data residing at different autonomous, heterogeneous sources and providing the client with a unified, reconciled global view of the data. We discuss data integration systems, taking the abstract viewpoint that the global view is an ontology expressed in a class-based formalism.

A Survey on the Evolution of Models of Data Integration

International Journal of Knowledge Based Computer Systems, 2020

Over time, different models of data integration have been proposed to manage and analyze data. With the emergence of big data, the database community has proposed newer and better solutions for managing such disparate and large data. Changes in data storage models and the growth of massive data repositories on the web have likewise encouraged the need for novel data integration models. In this article, we survey the various trends in integrating data through different models. We present a brief overview of Federated Database Systems, Data Warehouses, Mediators, and the newly proposed Polystore Systems, tracing the evolution of architecture, query processing, distribution, automation, and the data models supported within each. The similarities and differences of these models are also presented, and the novelty of Polystore Systems is discussed with various examples. The article also highlights the importance of such systems for integrating large-scale heterogeneous data.

A Technique for Information System Integration

Information Systems Technology and Its Applications, 2001

Nowadays, a central topic in database science is the need for integrated access to large amounts of data provided by various information sources whose contents are closely related. Often, information sources have been designed independently for autonomous applications, so they may exhibit several kinds of heterogeneity. Particularly hard to manage is semantic heterogeneity, which is due to schema and value inconsistencies. In this paper, we focus mainly on the inconsistency that arises when conflicting instances related to the same concept, possibly coming from different sources, are integrated. First, we introduce an operator, called the Merge Operator, which combines data coming from different sources while preserving the information contained in each of them. Then, we present a variant of this operator, the Extended Merge Operator, which associates the integrated data with information about the process by which they were obtained. Finally, in order to manage conflicts among integrated data, we briefly present a technique for computing consistent answers over inconsistent databases.
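
A very rough sketch of the flavor of such an information-preserving merge is given below: when sources disagree on an attribute for the same key, both values are kept rather than one discarded, and the extended variant additionally tags each value with its origin. This illustrates the idea only; the operators' actual definitions are in the paper.

```python
# Hypothetical data: two sources disagree on emp_7's salary.
src1 = {"emp_7": {"salary": 50_000}}
src2 = {"emp_7": {"salary": 52_000}, "emp_9": {"salary": 61_000}}

def extended_merge(named_sources):
    """Merge sources, keeping every conflicting value tagged with its origin."""
    merged = {}
    for src_name, relation in named_sources.items():
        for key, attrs in relation.items():
            row = merged.setdefault(key, {})
            for attr, value in attrs.items():
                row.setdefault(attr, []).append((value, src_name))
    return merged

result = extended_merge({"src1": src1, "src2": src2})
print(result["emp_7"]["salary"])  # [(50000, 'src1'), (52000, 'src2')]
# A consistent-answer technique would then decide what to report for emp_7.
```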