Luigi Arlotta - Academia.edu

Luigi Arlotta

Papers by Luigi Arlotta

A paradata-driven statistical approach to improve fieldwork monitoring: the case of the Non-Profit Institutions census

Proceedings e report, 2023

Automatic annotation of data extracted from large web sites

Automatic Annotation of Data

Data extraction from web pages is performed by software modules called wrappers. Recently, some systems for the automatic generation of wrappers have been proposed in the literature. These systems are based on unsupervised inference techniques: taking as input a small set of sample pages, they can produce a common wrapper to extract relevant data. However, due to the automatic nature of the approach, the data extracted by these wrappers have anonymous names. In the framework of our ongoing project RoadRunner, we have developed a prototype, called Labeller, that automatically annotates data extracted by automatically generated wrappers. Although Labeller has been developed as a companion system to our wrapper generator, its underlying approach is generally valid and can therefore be applied together with other wrapper generator systems. We have tested the prototype on several real-life web sites, obtaining encouraging results.

Automatic Annotation of Data Extracted From Large Web Sites

An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantics. Then, for each group, we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly e...
