Characterizing the uncertainty of web data
Related papers
A bayesian network to structure a data quality model for web portals
2006
Technological advances and the use of the Internet have favoured the appearance of a great diversity of web applications, among them Web portals. Through these portals, organizations develop their businesses in a highly competitive environment. One decisive factor in this competitiveness is the assurance of data quality. In previous work, a data quality model for Web portals was developed. The model is represented as a matrix that links user expectations of Web data quality to the portal functionalities. A set of 34 attributes was classified into this matrix. However, the quality attributes in this model lack the operational structure needed for use in actual assessment. In this paper we present how we have structured these attributes by means of a probabilistic approach, using Bayesian networks. The final objective is to use the resulting Bayesian network to evaluate the quality of a data portal (or a subset of its characteristics).
2014
Many methods are available for integrating information source reliability into an uncertainty representation, but only a few works focus on the problem of evaluating this reliability. Yet data reliability and confidence are essential components of a data warehousing system, as they influence subsequent retrieval and analysis. In this paper, we propose a generic method to assess data reliability from a set of criteria using the theory of belief functions. Customizable criteria and insightful decisions are provided. The illustrative example uses real-world data from the Sym’Previus data warehouse, which is oriented toward predictive microbiology.
Some Aspects of the Reliability of Information on the Web
J. Univers. Comput. Sci., 2014
When we look up information on the WWW, we hope to find information that is correct, suitable in quantity for our purposes, and written at a level that we can understand. Unfortunately, very often one of these criteria will not be met. A young person looking for information on some aspect of physics may well be frustrated when finding a complex formula whose understanding requires higher mathematics. In other cases, information may be much too voluminous or too short. This seems to indicate that what we need is presentation of material at various levels of detail and complexity. But most important of all, and this is what we discuss in this paper, is the question: how do we know that what we read is actually true? We will analyse this problem in the introductory section. We will show that it is impossible to expect "too much". We will argue that some improvements can be made, particularly if the domain is restricted. We will then examine certain types of geographical infor...