Truth Discovery Algorithms: An Experimental Evaluation. Tech Report, May 2014. http://arxiv.org/abs/1409.6428
A fundamental problem in data fusion is to determine the veracity of multi-source data in order to resolve conflicts. While previous work in truth discovery has proved useful in practice for specific settings, source behaviors, or data set characteristics, there has been limited systematic comparison of the competing methods in terms of efficiency, usability, and repeatability. We remedy this deficit by providing a comprehensive review of 12 state-of-the-art algorithms for truth discovery. We provide reference implementations and an in-depth evaluation of the methods based on extensive experiments on synthetic and real-world data. We analyze aspects of the problem that have not been explicitly studied before, such as the impact of initialization and parameter setting, convergence, and scalability. We provide an experimental framework for extensively comparing the methods in a wide range of truth discovery scenarios where source coverage, numbers and distributions of confli...
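Most of the surveyed algorithms share one iterative skeleton: infer truths from source-weighted votes, then re-estimate each source's weight from how often it agrees with the inferred truths, and repeat. The following is a minimal sketch of that pattern only, with illustrative data; it is not a reference implementation of any one of the 12 evaluated methods, and real algorithms differ in their weight update and convergence test.

```python
from collections import defaultdict

def truth_discovery(claims, iterations=20):
    """claims: list of (source, object, value) triples."""
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}          # uniform initialization
    truths = {}
    for _ in range(iterations):
        # Step 1: infer the truth of each object as the value with the
        # largest total weight of supporting sources.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += weights[s]
        truths = {o: max(vals, key=vals.get) for o, vals in votes.items()}
        # Step 2: re-estimate each source's weight as its accuracy
        # against the current truths.
        hits, total = defaultdict(int), defaultdict(int)
        for s, o, v in claims:
            total[s] += 1
            hits[s] += truths[o] == v
        weights = {s: hits[s] / total[s] for s in sources}
    return truths, weights

# Illustrative run: sources A and B agree, C dissents on both objects.
claims = [("A", "x", 1), ("B", "x", 1), ("C", "x", 2),
          ("A", "y", 3), ("B", "y", 3), ("C", "y", 4)]
truths, weights = truth_discovery(claims)
```

The sketch also makes the paper's evaluation questions concrete: initialization (here uniform weights), the parameter `iterations` versus a proper convergence check, and scalability of the two passes over all claims per round.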
Related papers
TD-AC: Efficient Data Partitioning based Truth Discovery
2021
This paper introduces an effective algorithm, called TD-AC, for the truth discovery problem in scenarios where data attributes are correlated with distinct levels of source reliability. TD-AC is built on an abstract representation of the truth in the data and automatically finds an optimal partitioning of the input data using k-means clustering and the silhouette measure. This data partitioning strategy helps maximize the accuracy of any base truth discovery process executed on each partition. Extensive experiments on synthetic and real datasets show that TD-AC outperforms baseline approaches with a more reasonable running time. On synthetic datasets, it improves the accuracy of standard truth discovery algorithms by between 1% and 14%, and it also improves accuracy significantly on the other types of datasets when the data coverage rate is high.
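The partition-selection idea in the abstract can be sketched as follows: cluster a one-dimensional signal with k-means for several candidate values of k, and keep the k whose clustering maximizes the mean silhouette score. This is a toy reconstruction under our own assumptions (1-D data, naive initialization, singleton clusters scored 0 as in the common convention), not the paper's code.

```python
def kmeans_1d(xs, k, iters=50):
    # Naive init: spread initial centers across the sorted values.
    centers = sorted(xs)[:: max(1, len(xs) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for x in xs:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[i].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

def silhouette(groups):
    scores = []
    for gi, g in enumerate(groups):
        for x in g:
            if len(g) == 1:
                scores.append(0.0)   # convention: singleton clusters score 0
                continue
            a = sum(abs(x - y) for y in g) / (len(g) - 1)     # intra-cluster
            b = min(sum(abs(x - y) for y in h) / len(h)       # nearest other
                    for hi, h in enumerate(groups) if hi != gi and h)
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Two well-separated groups: the silhouette criterion should pick k = 2.
xs = [0.1, 0.12, 0.15, 0.8, 0.82, 0.85]
best_k = max(range(2, 5), key=lambda k: silhouette(kmeans_1d(xs, k)))
```

In TD-AC's setting the partitions would then each be handed to a base truth discovery algorithm; the silhouette step only decides how many partitions to form.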
An Effective and Efficient Truth Discovery Framework over Data Streams
2017
Truth discovery, a validity assessment method for conflicting data from various sources, has been widely studied in the conventional database community. However, while existing methods for the static scenario involve time-consuming iterative processes, those for streams sacrifice considerable accuracy due to incremental source weight learning. In this paper, we propose a novel framework for truth discovery over streams, which incorporates various iterative methods to effectively estimate the source weights and adaptively decides the frequency of source weight computation. Specifically, we first capture the characteristics of source weight evolution, based on which the framework is modelled. Then, we define the conditions of source weight evolution for situations with relatively small unit and cumulative errors, and construct a probabilistic model that estimates the probability of meeting these conditions. Finally, we propose a novel scheme called adaptive source rel...
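The adaptive idea described above can be caricatured in a few lines: keep reusing the current source weights while each incoming batch's unit error stays small, and trigger the expensive weight re-estimation only when it drifts past a threshold. Everything here (the threshold, the error measure, the full-history recomputation) is an illustrative assumption, not the paper's actual probabilistic model.

```python
from collections import defaultdict

def weighted_truth(claims, weights):
    """Infer per-object truths by weighted voting (unknown sources weigh 1)."""
    votes = defaultdict(lambda: defaultdict(float))
    for s, o, v in claims:
        votes[o][v] += weights.get(s, 1.0)
    return {o: max(vals, key=vals.get) for o, vals in votes.items()}

def stream_truth_discovery(batches, drift_threshold=0.2):
    weights, history, results = {}, [], []
    recomputations = 0
    for batch in batches:
        truths = weighted_truth(batch, weights)
        results.append(truths)
        history.extend(batch)
        # Unit error of this batch: fraction of claims disagreeing with truths.
        err = sum(truths[o] != v for _, o, v in batch) / len(batch)
        if err > drift_threshold:
            # Expensive step: re-estimate weights from all claims seen so far.
            all_truths = weighted_truth(history, weights)
            hits, total = defaultdict(int), defaultdict(int)
            for s, o, v in history:
                total[s] += 1
                hits[s] += all_truths[o] == v
            weights = {s: hits[s] / total[s] for s in total}
            recomputations += 1
    return results, weights, recomputations
```

The trade-off the paper targets is visible in the sketch: every skipped recomputation saves an iterative pass over the accumulated claims, at the risk of voting with stale weights.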
Data Fusion: Resolving Conflicts from Multiple Sources
2015
Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical to resolve conflicts and discover values that reflect the real world; this task is called data fusion. This paper describes a novel approach that finds true values from conflicting information when there are a large number of sources, among which some may copy from others. We present a case study on real-world data showing that the described algorithm can significantly improve accuracy of truth discovery and is scalable when there are a large number of data sources.
SLFTD: A Subjective Logic Based Framework for Truth Discovery
2019
Finding the truth among conflicting candidate values provided by different data sources is called truth discovery, which is of vital importance in data integration. Several algorithms have been proposed in this area, and they usually share a similar procedure: iteratively inferring the truth and each provider's reliability until convergence. An accurate evaluation of provider reliability is therefore essential. However, no prior work pays attention to how reliably a provider continues to provide the truth. We therefore introduce subjective logic, which can record both (1) a provider's reliability in generating the truth and (2) the reliability with which the provider continues to do so. Our proposed method provides a better evaluation of data providers, based on which truths are discovered more accurately. Our framework can handle both categorical and numerical data, and can identify the truth in either a generative or a discriminative way. Experiments on two popular real-world datasets, Bo...
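The subjective-logic bookkeeping this framework builds on maps a provider's history of r correct and s incorrect claims to an opinion (belief, disbelief, uncertainty), so that two providers with the same accuracy but different amounts of evidence are distinguished. The sketch below uses the standard subjective-logic mapping with the usual prior weight W = 2; the helper names are ours, not the paper's.

```python
def opinion(r, s, W=2.0):
    """Map r positive and s negative observations to (belief, disbelief, uncertainty)."""
    denom = r + s + W
    return r / denom, s / denom, W / denom

def expected_reliability(r, s, base_rate=0.5, W=2.0):
    """Expected probability that the provider supplies the truth."""
    b, _, u = opinion(r, s, W)
    return b + base_rate * u

# A provider with a long consistent record keeps low uncertainty, while a
# new provider with the same (perfect) accuracy so far stays uncertain.
veteran = opinion(98, 0)    # (0.98, 0.0, 0.02)
newcomer = opinion(2, 0)    # (0.5, 0.0, 0.5)
```

This is exactly the distinction the abstract highlights: both providers have been right every time, yet the opinion for the newcomer carries far more uncertainty, which a plain accuracy ratio cannot express.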