A systematic approach to reliability assessment in integrated databases

Properties of Inconsistency Measures for Databases

2021

How should we quantify the amount of inconsistency in a database, when consistency is defined in terms of a given set of integrity constraints or rules? Proper measures are important for various tasks, such as progress indication and action prioritization in cleaning systems, and reliability estimation for new datasets. To choose an appropriate inconsistency measure for a specific use case, it is important to identify the desired properties of the application and understand which of these are guaranteed or, at least, expected in practice. For example, in some use cases, the measured inconsistency should decrease if constraints are eliminated; in others, it should be stable and avoid jitters and jumps in reaction to small changes in the database. Building on past research on inconsistency measures for knowledge bases, we embark on a systematic investigation of important properties for inconsistency measures. We investigate a collection of basic measures that have been proposed in the past in b...
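To make the idea of an inconsistency measure concrete, here is a minimal Python sketch of two simple candidates for a single functional dependency. The abstract does not name the measures it studies, so the two shown here (a 0/1 "drastic" indicator and a count of violating tuple pairs) are illustrative assumptions, as are the function names and the toy `emp -> dept` constraint.

```python
from itertools import combinations


def violating_pairs(rows, lhs, rhs):
    """Return pairs of rows that agree on `lhs` but differ on `rhs`,
    i.e. pairs that violate the functional dependency lhs -> rhs."""
    return [
        (r, s)
        for r, s in combinations(rows, 2)
        if all(r[a] == s[a] for a in lhs) and any(r[b] != s[b] for b in rhs)
    ]


def drastic_measure(rows, lhs, rhs):
    """1 if the dependency is violated at all, 0 otherwise (assumed measure)."""
    return 1 if violating_pairs(rows, lhs, rhs) else 0


def pair_count_measure(rows, lhs, rhs):
    """Number of violating tuple pairs (assumed measure)."""
    return len(violating_pairs(rows, lhs, rhs))


# Hypothetical toy relation: "ann" is assigned to two departments.
rows = [
    {"emp": "ann", "dept": "sales"},
    {"emp": "ann", "dept": "hr"},
    {"emp": "bob", "dept": "it"},
]
```

The two measures react differently to cleaning: removing one of several conflicting tuples lowers the pair count but leaves the drastic indicator unchanged until the last violation is gone, which is the kind of property difference the abstract refers to.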

The Complexity of Database Inconsistency Measures

2020

Managing data inconsistency has been one of the major challenges in the research and practice of database management. Database inconsistency arises for different reasons and in different applications. Nowadays, many applications obtain information from imprecise sources (e.g., social networks) via imprecise procedures (e.g., natural-language processing). Inconsistency may also arise when integrating conflicting data from different sources. During the past two decades, researchers have established, developed and investigated a principled approach to managing database inconsistency via the notion of database repairs. A repair of an inconsistent database is traditionally defined as a consistent database that differs from the inconsistent one in a “minimal” way. We investigate various problems arising in the challenge of measuring how inconsistent a database is. The problem of measuring inconsistency has been studied extensively by the Knowledge Representation and Logic communities, and...
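The notion of a repair described above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions: the only constraint is one functional dependency, "minimal" is read as "maximal consistent subset under set inclusion", and the helper names `is_consistent` and `subset_repairs` are hypothetical. It enumerates all subsets, so it is only meant for toy inputs.

```python
from itertools import combinations


def is_consistent(rows, lhs, rhs):
    """True if no two rows agree on `lhs` while differing on `rhs`."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def subset_repairs(rows, lhs, rhs):
    """Return the index sets of all maximal consistent subsets of `rows`."""
    n = len(rows)
    # Collect every consistent subset, identified by row indices.
    consistent = [
        frozenset(idx)
        for k in range(n + 1)
        for idx in combinations(range(n), k)
        if is_consistent([rows[i] for i in idx], lhs, rhs)
    ]
    # Keep only those not strictly contained in another consistent subset.
    return [s for s in consistent if not any(s < t for t in consistent)]


# Hypothetical toy relation with one conflict on "ann".
rows = [
    {"emp": "ann", "dept": "sales"},
    {"emp": "ann", "dept": "hr"},
    {"emp": "bob", "dept": "it"},
]
repairs = subset_repairs(rows, ["emp"], ["dept"])
```

On this input there are two repairs, each keeping one of the conflicting "ann" tuples together with the "bob" tuple.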

A Relaxed Approach to Integrity and Inconsistency in Databases

2006

We demonstrate that many, though not all, integrity checking methods are able to tolerate inconsistency, without having been aware of it. We show that it is possible to use them to beneficial effect and without further ado, not only for preserving integrity in consistent databases, but also in databases that violate their constraints. This apparently relaxed attitude toward integrity and inconsistency stands in contrast to approaches that are much more cautious with respect to the prevention, identification, removal, repair and tolerance of inconsistent data that violate integrity. We assess several well-known methods in terms of inconsistency tolerance and give examples and counter-examples thereof.

Repair-Based Degrees of Database Inconsistency: Computation and Complexity

ArXiv, 2018

We propose a generic numerical measure of the inconsistency of a database with respect to a set of integrity constraints. It is based on an abstract repair semantics. In particular, an inconsistency measure associated with cardinality-repairs is investigated in detail. More specifically, it is shown that it can be computed via answer-set programs, but sometimes its computation can be intractable in data complexity. However, polynomial-time deterministic and randomized approximations are exhibited. The behavior of this measure under small updates is analyzed, obtaining fixed-parameter tractability results. Furthermore, alternative inconsistency measures are proposed and discussed.
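One plausible reading of a cardinality-repair-based measure is the fraction of tuples that must be deleted to reach a largest consistent subset. The Python sketch below follows that reading; it is an assumption about the definition, not the paper's answer-set encoding, and it handles only a single functional dependency. The search is exponential in the number of rows, in line with the intractability the abstract mentions, so it is suitable only for small examples. All names are hypothetical.

```python
from itertools import combinations


def is_consistent(rows, lhs, rhs):
    """True if no two rows agree on `lhs` while differing on `rhs`."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def cardinality_repair_degree(rows, lhs, rhs):
    """Assumed measure: (|D| - size of a largest consistent subset) / |D|."""
    n = len(rows)
    if n == 0:
        return 0.0
    # Try deleting 0, 1, 2, ... rows; the first success uses the fewest deletions.
    for removed in range(n + 1):
        for kept in combinations(rows, n - removed):
            if is_consistent(kept, lhs, rhs):
                return removed / n


# Hypothetical toy relation: one "hr" tuple conflicts with two "sales" tuples.
rows = [
    {"emp": "ann", "dept": "sales"},
    {"emp": "ann", "dept": "hr"},
    {"emp": "ann", "dept": "sales"},
    {"emp": "bob", "dept": "it"},
]
degree = cardinality_repair_degree(rows, ["emp"], ["dept"])
```

Deleting the single "hr" tuple restores consistency here, so one of four tuples is removed. Normalizing by the database size keeps the value between 0 and 1, which makes databases of different sizes comparable.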

Detection and Resolution of Data Inconsistencies, and Data Integration using Information Quality criteria

2000

In the processes and optimization of information integration, such as query processing, query planning and hierarchical structuring of results to the user, we argue that user quality priorities, data inconsistencies and data quality differences among the participating sources have not been fully addressed. We propose the development of a Data Quality Manager (DQM) to establish communication between the process of