Lorenzo Blanco - Academia.edu

Papers by Lorenzo Blanco

Characterizing the uncertainty of web data: models and experiences

Harvesting Structurally Similar Pages

Sistemi Evoluti per Basi di Dati, 2005

Data Extraction and Integration from Imprecise Web Sources

Sistemi Evoluti per Basi di Dati, 2009

Exploiting information redundancy to wring out structured data from the web

World Wide Web Conference Series, 2010

Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources

Conference on Advanced Information Systems Engineering, 2010

Several techniques have been developed to extract and integrate data from web sources. However, web data are inherently imprecise and uncertain. This paper addresses the issue of characterizing the uncertainty of data extracted from a number of inaccurate sources. We develop a probabilistic model to compute a probability distribution for the extracted values, and the accuracy of the sources. Our model considers the presence of sources that copy their contents from other sources, and manages the misleading consensus produced by copiers. We extend the models previously proposed in the literature by working on several attributes at a time to better leverage all the available evidence. We also report the results of several experiments on both synthetic and real-life data to show the effectiveness of the proposed approach.

Supporting the automatic construction of entity aware search engines

Web Information and Data Management, 2008

Efficiently Locating Collections of Web Pages to Wrap

Web Information Systems and Technologies, 2005

Flint: Google-basing the Web

Extending Database Technology, 2008

Probabilistic Models to Reconcile Complex Data from Inaccurate Data Sources

Lecture Notes in Computer Science, 2010

Exploiting information redundancy to wring out structured data from the web

Proceedings of the 19th international conference on World wide web - WWW '10, 2010

Characterizing the uncertainty of web data

Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality - WebQuality '11, 2011

Automatically building probabilistic databases from the web

Proceedings of the 20th international conference companion on World wide web - WWW '11, 2011

Redundancy-driven web data extraction and integration

Proceedings of the 13th International Workshop on the Web and Databases - WebDB '10, 2010

A large number of web sites publish pages containing structured information about recognizable concepts, but these data are only partially used by current applications. Although such information is spread across a myriad of sources, the web scale implies a relevant redundancy. We present a domain-independent system that exploits the redundancy of information to automatically extract and integrate data from

Supporting the automatic construction of entity aware search engines

Proceeding of the 10th ACM workshop on Web information and data management - WIDM '08, 2008

Flint

Proceedings of the 11th international conference on Extending database technology: Advances in database technology - EDBT '08, 2008

Web Data Reconciliation: Models and Experiences

Lecture Notes in Computer Science, 2012

Future Locations Prediction with Uncertain Data

Lecture Notes in Computer Science, 2013

Searching Entities on the Web by Sample

Several Web sites deliver a large number of pages, each publishing data about one instance of some real-world entity, such as an athlete, a stock quote, or a book. Although it is easy for a human reader to recognize these instances, current search engines are unaware of them. Technologies for the Semantic Web aim at achieving this goal; however, so far they have been of little help in this respect, as semantic publishing is very limited. The paper describes a method to automatically search the Web for pages that publish data representing an instance of a certain conceptual entity. Our method takes as input a small set of sample pages: it automatically infers a description of the underlying conceptual entity and then searches the Web for other pages containing data representing the same entity. We have implemented our method in a system prototype, which has been used to conduct several experiments that have produced interesting results.

A Probabilistic Model to Characterize the Uncertainty of Web Data Integration: What Sources Have The Good Data?

Structure and Semantics of Data-Intensive Web Pages: An Experimental Study on their Relationships

Journal of Universal Computer Science - J.UCS, 2008
