Lorenzo Blanco - Academia.edu
Papers by Lorenzo Blanco
Sistemi Evoluti per Basi di Dati, 2005
Sistemi Evoluti per Basi di Dati, 2009
World Wide Web Conference Series, 2010
Conference on Advanced Information Systems Engineering, 2010
Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.
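To make the idea concrete, here is a minimal Python sketch of the general iterative truth-discovery scheme this line of work builds on: source accuracies and value probabilities are estimated jointly, each refining the other. It deliberately omits the paper's distinctive contributions (copier detection and multi-attribute evidence), and all names, the prior accuracy, and the toy data are illustrative assumptions.

```python
# Minimal sketch of iterative truth discovery: jointly estimate source
# accuracies and a probability distribution over candidate values.
# The paper's model additionally detects copiers and reasons over
# several attributes at once; this only shows the basic fixed point.
from collections import defaultdict

def truth_discovery(claims, n_iters=20):
    """claims: list of (source, object, value) triples."""
    sources = {s for s, _, _ in claims}
    accuracy = {s: 0.8 for s in sources}  # illustrative prior accuracy

    for _ in range(n_iters):
        # Score each candidate value by the accuracy of the sources voting for it.
        scores = defaultdict(lambda: defaultdict(float))
        for s, obj, val in claims:
            scores[obj][val] += accuracy[s]
        # Normalize scores into a probability distribution per object.
        probs = {
            obj: {v: sc / sum(vals.values()) for v, sc in vals.items()}
            for obj, vals in scores.items()
        }
        # A source's accuracy becomes the mean probability of the values it claims.
        votes = defaultdict(list)
        for s, obj, val in claims:
            votes[s].append(probs[obj][val])
        accuracy = {s: sum(ps) / len(ps) for s, ps in votes.items()}

    return probs, accuracy

claims = [
    ("siteA", "player1.height", "180"),
    ("siteB", "player1.height", "180"),
    ("siteC", "player1.height", "175"),  # outlier value
]
probs, acc = truth_discovery(claims)
print(probs["player1.height"], acc)
```

In this toy run the two agreeing sources pull the distribution toward "180" while their estimated accuracy rises above the outlier's; a copier-aware model would first discount sources whose agreement stems from copying rather than independent observation.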
Web Information and Data Management, 2008
Web Information Systems and Technologies, 2005
Extending Database Technology, 2008
Lecture Notes in Computer Science, 2010
Proceedings of the 19th International Conference on World Wide Web - WWW '10, 2010
Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality - WebQuality '11, 2011
Proceedings of the 20th International Conference Companion on World Wide Web - WWW '11, 2011
Proceedings of the 13th International Workshop on the Web and Databases - WebDB '10, 2010
A large number of web sites publish pages containing structured information about recognizable concepts, but these data are only partially used by current applications. Although such information is spread across a myriad of sources, the web scale implies significant redundancy. We present a domain-independent system that exploits the redundancy of information to automatically extract and integrate data from …
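The abstract is truncated in the source, but the core mechanism it names, using cross-site redundancy both to link records and to consolidate their attributes, can be sketched as follows. The key normalization and the first-seen merge policy are naive placeholders, not the system's actual matching strategy.

```python
# Minimal sketch of redundancy-based integration: records extracted from
# different sites are grouped by a normalized key and merged, so overlap
# across sources both links the records and fills in missing attributes.
from collections import defaultdict

def normalize_key(name):
    # Placeholder normalization: lowercase and collapse whitespace.
    return " ".join(name.lower().split())

def integrate(records):
    """records: list of dicts, each with a 'name' plus extracted attributes."""
    merged = defaultdict(dict)
    for rec in records:
        key = normalize_key(rec["name"])
        for attr, value in rec.items():
            if attr != "name":
                merged[key].setdefault(attr, value)  # first-seen value wins here
    return dict(merged)

recs = [
    {"name": "Kobe Bryant", "team": "Lakers"},
    {"name": "kobe  bryant", "height": "198cm"},
]
print(integrate(recs))
# {'kobe bryant': {'team': 'Lakers', 'height': '198cm'}}
```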
Proceedings of the 10th ACM Workshop on Web Information and Data Management - WIDM '08, 2008
Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology - EDBT '08, 2008
Lecture Notes in Computer Science, 2012
Lecture Notes in Computer Science, 2013
Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Although it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim at achieving this goal; however, so far they have been of little help in this respect, as semantic publishing is very limited. The paper describes a method to automatically search the Web for pages that publish data representing an instance of a certain conceptual entity. Our method takes as input a small set of sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. We have implemented our method in a system prototype, which has been used to conduct several experiments that have produced interesting results.
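A hedged sketch of the high-level loop the abstract describes: infer a lightweight description of a conceptual entity from a few sample pages, then score candidate pages by how well they match it. The real system's inference and search steps are far more sophisticated; every function, regex, and threshold here is a placeholder assumption.

```python
# Toy entity-description matcher: visible field labels (e.g. "Height:",
# "Team:") act as a crude structural signature of an entity's pages.
import re
from collections import Counter

def page_features(html):
    # Placeholder: capture capitalized labels that precede a colon.
    return set(re.findall(r">\s*([A-Z][a-zA-Z ]{2,20}):", html))

def infer_description(sample_pages):
    # Keep the labels that appear on a majority of the sample pages.
    counts = Counter(f for p in sample_pages for f in page_features(p))
    threshold = len(sample_pages) / 2
    return {f for f, c in counts.items() if c > threshold}

def matches_entity(candidate_page, description, min_overlap=0.6):
    # A candidate matches if it exposes most of the entity's labels.
    feats = page_features(candidate_page)
    if not description:
        return False
    return len(feats & description) / len(description) >= min_overlap
```

The design choice worth noting is that the description is structural rather than lexical: pages about different athletes share field labels even though their values differ, which is what lets a handful of samples generalize to unseen pages.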
Journal of Universal Computer Science - J.UCS, 2008