The Equation between Semantics and Data Quality

A Hybrid Framework for Applying Semantic Integration Technologies to Improve Data Quality

This study aims to develop a new hybrid semantic integration framework for enterprise information systems in order to improve data quality and resolve the problems arising from scattered data sources and the rapid expansion of data. The proposed framework builds on a solid foundation of previous studies: significant and seminal research articles were reviewed against selection criteria, and a critical review was conducted to determine a set of qualified semantic technologies from which a hybrid semantic integration framework can be constructed. The proposed framework consists of six layers and one component: a source layer, translation layer, XML layer, RDF layer, inference layer, application layer, and an ontology component. Two challenges and one conflict arose while composing the framework and were resolved. The framework was then examined for its ability to improve data quality along four data quality dimensions.
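
As a loose illustration of the layer sequence (source, translation, XML, RDF), the following minimal Python sketch pushes one record through hypothetical versions of those stages; all field names, functions, and the vocabulary URI are invented for illustration and do not come from the paper.

```python
# A minimal sketch (not the paper's implementation) of a record travelling
# through the framework's layer sequence: source -> translation -> XML -> RDF.
import xml.etree.ElementTree as ET

EX = "http://example.org/vocab#"  # hypothetical ontology namespace

def translate(source_row: dict) -> dict:
    """Translation layer: normalise source-specific field names."""
    return {"id": source_row["CUST_NO"], "name": source_row["CUST_NAME"].strip()}

def to_xml(record: dict) -> ET.Element:
    """XML layer: wrap the normalised record in a common XML schema."""
    elem = ET.Element("customer", id=record["id"])
    ET.SubElement(elem, "name").text = record["name"]
    return elem

def to_rdf(elem: ET.Element) -> list[tuple[str, str, str]]:
    """RDF layer: lift the XML into subject-predicate-object triples."""
    subject = f"{EX}customer/{elem.get('id')}"
    return [(subject, f"{EX}name", elem.findtext("name"))]

row = {"CUST_NO": "42", "CUST_NAME": " Ada Lovelace "}
print(to_rdf(to_xml(translate(row))))
```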

(Linked) Data Quality Assessment: An Ontological Approach

2021

The effective functioning of data-intensive applications usually requires that the underlying dataset be of high quality. The required quality depends on the task the data will be used for. However, it is possible to identify task-independent data quality dimensions that relate solely to the data themselves and can be extracted with the help of rule mining/pattern mining. In order to assess and improve data quality, we propose an ontological approach to reporting triples that violate data quality requirements. Our goal is to provide data stakeholders with a set of methods and techniques to guide them in assessing and improving data quality.
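
As a hedged sketch of what reporting quality-violating data can look like in practice, the following Python snippet uses rdflib and a SPARQL query to flag resources that break a simple completeness rule; the vocabulary, the rule (every Person needs an email), and the sample data are illustrative assumptions, not the paper's ontology.

```python
# Report resources violating a task-independent completeness rule.
from rdflib import Graph

ttl = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person ; ex:email "alice@example.org" .
ex:bob   a ex:Person .   # violates the completeness rule below
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# Completeness expressed as a SPARQL query: Persons without an email.
violations = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE {
        ?person a ex:Person .
        FILTER NOT EXISTS { ?person ex:email ?e }
    }
""")
for (person,) in violations:
    print(f"completeness violation: {person}")
```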

Data Integration using Semantic Technology: A use case

2006 Second International Conference on Rules and Rule Markup Languages for the Semantic Web (RuleML'06), 2006

To integrate data residing in autonomous data sources, Software AG uses ontologies. Data source ontologies describe the data sources themselves, while business ontologies provide an integrated view of the data. F-Logic rules describe the mappings between data objects in the data source and business ontologies, and F-Logic also serves as the query language; its rules are perfectly suited to describing mappings between objects and their properties. In a first project, we integrated data residing partly in a support system and partly in a customer information system.
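
F-Logic itself is a logic language, but the flavour of an object-to-object mapping rule can be suggested with a small Python analogue; the class and field names below are invented, and this is a sketch of the idea rather than Software AG's implementation.

```python
# A loose Python analogue of an F-Logic-style mapping rule: every support
# ticket in the source ontology yields a Customer object in the business
# ontology. All names are invented for illustration.
def map_support_ticket(ticket: dict) -> dict:
    """Mapping rule: SupportTicket(ticket_id, customer_id) -> Customer
    whose openIssues property references the ticket."""
    return {
        "class": "Customer",
        "id": ticket["customer_id"],
        "openIssues": [ticket["ticket_id"]],
    }

print(map_support_ticket({"ticket_id": "T-17", "customer_id": "C-3"}))
```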

Connecting Databases and Ontologies: A Data Quality Perspective

2019

Taking a database-theoretic perspective on the problem of mapping relational databases to ontologies, we propose a new mapping language inspired by the semijoin algebra. We illustrate the user-friendliness of the mapping language through examples, and we prove the decidability of several important reasoning problems by embedding the mapping language into the guarded fragment of first-order logic. We argue that these reasoning problems are relevant in data quality explorations.
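
For intuition, the semijoin operator underlying such a mapping language keeps exactly the tuples of one relation that join with another; in standard relational algebra notation (our recap, not the paper's exact syntax):

$$ R \ltimes S \;=\; \pi_{\mathrm{attr}(R)}\bigl(R \bowtie S\bigr) $$

That is, a semijoin filters $R$ by $S$ without pulling in $S$'s attributes; the semijoin algebra is known to correspond closely to the guarded fragment of first-order logic, which is what makes such an embedding plausible.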

Data Quality—What Can an Ontological Analysis Contribute?

2008

Progress in research on data quality is slow, and the relevance of results for practice is low. Can an ontological analysis make significant contributions? The "road block" in data quality research seems to be an ontological one. Approaching "data quality" with an ordinary-language-philosophy method reveals the inherent contradiction in the concept. The ontological analysis reveals the necessity of separating the ontology (reality) proper from the epistemology (data). Data quality reveals itself when data is used, which focuses our attention on the double linkage between reality and data: (1) the observation that reflects reality into the data, and (2) the decision that links the plan to changes in reality. The analysis of the processes leading from raw observations to decisions yields operational definitions of "fitness for use" and an effective method to assess the fitness of data for a decision. What is novel is the treatment of data quality as a transformation running through the whole process from data collection to decision.

An Ontology Based Approach to Data Quality Initiatives Cost-Benefit Evaluation

2009

In order to achieve higher data quality targets, organizations need to identify the data quality dimensions that are affected by poor quality, assess them, and evaluate which improvement techniques are suitable to apply. The data quality literature provides methodologies that support complete data quality management by providing guidelines that organizations should contextualize and apply to their scenario. Only a few methodologies use cost-benefit analysis as a tool to evaluate the feasibility of a data quality improvement project. In this paper, we present an ontological description of cost-benefit analysis that includes the most important contributions already proposed in the literature. The use of ontologies improves knowledge by identifying the interdependencies between costs and benefits, and it enables a variety of complex evaluations. The feasibility and usefulness of the proposed ontology-based tool have been tested by means of a real case study.

Data Quality Principles in the Semantic Web

2012 IEEE Sixth International Conference on Semantic Computing, 2012

The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data are fit for their intended use in operations, decision-making, and planning. However, with the rise of the Semantic Web, new data quality issues appear and require deeper consideration. In this paper, we propose to extend the data quality principles to the context of the Semantic Web. Based on our extensive industrial experience in data integration, we identify five main classes of principles suited to data quality in the Semantic Web. For each class, we list the principles that are involved at all stages of the data management process. Following these principles will provide a sound basis for better decision-making within organizations and will maximize long-term data integration and interoperability.

Extending contexts with ontologies for multidimensional data quality assessment

Data quality and data cleaning are context-dependent activities. Starting from this observation, in previous work a context model for the assessment of the quality of a database instance was proposed. In that framework, the context takes the form of a possibly virtual database or data integration system into which the database instance under quality assessment is mapped for additional analysis and processing, enabling quality assessment. In this work we extend contexts with dimensions, and by doing so, we make possible a multidimensional assessment of data quality. Multidimensional contexts are represented as ontologies written in Datalog±. We use this language to represent dimensional constraints and dimensional rules, and also to do query answering based on dimensional navigation, which becomes an important auxiliary activity in the assessment of data. We illustrate the main ideas and mechanisms by means of examples.
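
To give a flavour of the formalism (our own toy example, not taken from the paper), a dimensional rule in Datalog± can propagate facts up a dimension hierarchy, while a dimensional constraint can be a tuple-generating dependency with an existential head:

$$\mathit{WardUnit}(w,u) \wedge \mathit{UnitHospital}(u,h) \;\rightarrow\; \mathit{WardHospital}(w,h)$$

$$\mathit{Ward}(w) \;\rightarrow\; \exists u\; \mathit{WardUnit}(w,u)$$

Query answering by dimensional navigation then amounts to chasing such rules to move between hierarchy levels when assessing data recorded at different granularities.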

Management of Data Quality Related Problems - Exploiting Operational Knowledge

Proceedings of the 5th International Conference on Data Management Technologies and Applications, 2016

Dealing with data quality related problems is an important issue that all organizations face in realizing and sustaining data-intensive advanced applications. Upon detecting these problems in datasets, data analysts often register them in issue tracking systems in order to address them later on, categorically and collectively. As there is no standard format for registering these problems, data analysts often describe them in natural language and subsequently rely on ad hoc, non-systematic, and expensive solutions to categorize and resolve the registered problems. In this contribution we present a formal description of an innovative architecture that semantically and dynamically maps the descriptions of data quality related problems to data quality attributes. Through this mapping, we reduce complexity, since the dimensionality of the data quality attribute space is far smaller than that of the natural language space, and we enable data analysts to directly use the methods and tools proposed in the literature. Furthermore, through managing data quality related problems, our proposed architecture offers data quality management in a dynamic way based on user-generated inputs. The paper reports on a proof-of-concept tool and its evaluation.
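
As a deliberately simple stand-in for that mapping (the paper describes a semantic, dynamic mapping; this sketch only does keyword matching), the following Python snippet routes free-text issue descriptions to data quality attributes; the keyword lists are illustrative assumptions.

```python
# Toy classifier: route natural-language issue descriptions to data
# quality attributes. Keyword lists are illustrative, not from the paper.
DQ_KEYWORDS = {
    "completeness": ["missing", "empty", "null", "absent"],
    "accuracy":     ["wrong", "incorrect", "mismatch", "typo"],
    "timeliness":   ["stale", "outdated", "late", "old"],
}

def classify(issue_text: str) -> list[str]:
    """Return the data quality attributes whose keywords appear in the issue."""
    text = issue_text.lower()
    return [dim for dim, words in DQ_KEYWORDS.items()
            if any(w in text for w in words)] or ["unclassified"]

print(classify("Customer birth date is missing and address looks outdated"))
# -> ['completeness', 'timeliness']
```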

Data Quality Ontology: An Ontology for Imperfect Knowledge

Spatial Information Theory

Data quality and ontology are two of the dominating research topics in GIS, influencing many others. Research so far has investigated them in isolation. Ontology is concerned with perfect knowledge of the world and has so far ignored imperfections in our knowledge. An ontology for imperfect knowledge leads to a consistent classification of imperfections in data (i.e., data quality) and a formalizable description of the influence of data quality on decisions. If we want to deal with data quality using ontological methods, then reality and the information model stored in the GIS must be represented in the same model. This allows closed-loop semantics to be used to define "fitness for use" as leading to correct, executable decisions. The approach covers knowledge of physical reality as well as personal (subjective) and social constructions. It systematically lists, in logical succession, the influences leading to imperfections in data.