Paolo Serafino | University of Southampton

Papers by Paolo Serafino

Histogram-based Compression of Massive High-Dimensional OLAP Data Cubes

Truthful Mechanisms without Money for Non-utilitarian Heterogeneous Facility Location

In this paper, we consider the facility location problem under a novel model recently proposed in the literature, which combines the no-money constraint (i.e. the impossibility to employ monetary transfers between the mechanism and the agents) with the presence of heterogeneous facilities, i.e. facilities serving different purposes. Agents thus have a significantly different cost model w.r.t. the classical model with homogeneous facilities studied in literature. We initiate the study of non-utilitarian optimization functions under this novel model. In particular, we consider the case where the optimization goal consists of minimizing the maximum connection cost of the agents. In this setting, we investigate both deterministic and randomized algorithms and derive both lower and upper bounds regarding the approximability of strategyproof mechanisms.

Computing OLAP Aggregates over Multidimensional Data Streams Efficiently (Extended Abstract)

ClustCube

Proceedings of the 2011 ACM Symposium on Applied Computing - SAC '11, 2011

In this paper, we introduce and experimentally assess ClustCube, an innovative OLAP-based framework for clustering and mining complex database objects extracted from distributed database settings by means of complex SQL statements involving multiple JOIN queries across (distributed) relational tables. To this end, ClustCube puts together conventional clustering techniques and well-consolidated OLAP methodologies in order to achieve higher expressive power and mining effectiveness over traditional methodologies for mining tuple-oriented information. A relevant challenge in our research is represented by the issue of efficiently computing ClustCube cubes, enriched by the respective cuboid lattices, which may represent a critical bottleneck for the proposed ClustCube framework. To face-off this drawback, we propose a collection of algorithms that implement an innovative distributive approach taking advantages from both the structured nature of complex database objects within cuboids and the distributive nature of clustering across hierarchical domains, like those defined by conventional OLAP schemas.

Enhanced clustering of complex database objects in the ClustCube framework

Proceedings of the fifteenth international workshop on Data warehousing and OLAP - DOLAP '12, 2012

This paper significantly extends our previous research contribution [1], where we introduced the OLAP-based ClustCube framework for clustering and mining complex database objects extracted from distributed database settings. In particular, in this research we provide the following two novel contributions over [1]. First, we provide an innovative tree-based distance function over complex objects that takes into account the typical tree-like nature of these objects in distributed database settings. This novel distance is a relevant ...

A reachability-based theoretical framework for modeling and querying complex probabilistic graph data

2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2012

Probabilistic graph data arise in a plethora of modern applications ranging from sensor networks to RDF query tools and IP-network monitoring systems. This is due to the fact that probabilistic graphs are able to capture and model uncertainty and imprecision that characterize datasets populating the above-mentioned scenarios. On the basis of this amenity, a large family of proposals devoted to model and query probabilistic graph data appeared, with alternate fortune. Nevertheless, few of these approaches address a ...

Integrating Semantics within Compressed OLAP Views in the Hand-OLAP System

In this paper, we provide further extensions of Hand-OLAP, a Java-based distributed system for enabling OLAP in mobile environments via intelligent data cube compression approaches. These extensions aim at integrating innovative semantics representation and management models within compressed OLAP views, in order to improve the data cube compression process itself, and to support an improved summarized, OLAP-like knowledge fruition from multidimensional data cubes throughout mobile devices. We complete our analytical contribution by means of an experimental evaluation of the novel semantics-based data cube compression approach on well-known benchmark data cubes, which definitely confirms to us the efficiency and the reliability of our proposed research.

Semantics-Aware Advanced OLAP Visualization of Multidimensional Data Cubes

International Journal of Data Warehousing and Mining, 2007

Efficiently supporting advanced OLAP visualization of multidimensional data cubes is a novel and challenging research topic, which results to be of interest for a large family of data warehouse applications relying on the management of spatio-temporal (e.g., mobile) data, scientific and statistical data, sensor network data, biological data, etc. On the other hand, the issue of visualizing multidimensional data domains has been quite neglected from the research community, since it does not belong to the well-founded conceptual-logical-physical design hierarchy inherited from relational database methodologies. Inspired from these considerations, in this article we propose an innovative advanced OLAP visualization technique that meaningfully combines (i) the so-called OLAP dimension flattening process, which allows us to extract two-dimensional OLAP views from multidimensional data cubes, and (ii) very efficient data compression techniques for such views, which allow us to generate "semantics-aware" compressed representations where data are grouped along OLAP hierarchies.

Heterogeneous Facility Location without Money on the Line

ECAI 2014

The study of facility location in the presence of self-interested agents has recently emerged as the benchmark problem in the research on mechanism design without money. Here we study the related problem of heterogeneous 2-facility location, that features more realistic assumptions such as: (i) multiple heterogeneous facilities have to be located, (ii) agents' locations are common knowledge and (iii) agents bid for the set of facilities they are interested in. We study the approximation ratio of both deterministic and randomized truthful algorithms when the underlying network is a line. We devise an (n − 1)-approximate deterministic truthful mechanism and prove a constant approximation lower bound. Furthermore, we devise an optimal and truthful (in expectation) randomized algorithm.

Truthful Mechanisms for the Location of Different Facilities

AAMAS14

In this paper we formalize and initiate the study of heterogeneous k-facility location without money, a problem akin to the classical k-facility location problem but encompassing a richer model and featuring multi-parameter agents. In particular, we consider truthful mechanisms without money for the problem in which heterogeneous (i.e. serving different purposes) facilities have to be located and agents are only interested in some of them. We study the approximation factor that can be achieved by truthful mechanisms in this setting and present some bounds which make a surprising parallel with our knowledge of truthfulness for the classical single-dimensional facility location problem.

Speeding up graph clustering via modular decomposition based compression

Proceedings of the 28th Annual ACM Symposium on Applied Computing, 2013

Nowadays, massive data sets of graph-like data arise in various application domains ranging from bioinformatics to social networks and communication networks analysis. The abundance of such kind of data calls for innovative techniques for storing, managing and processing graph-like data. In order to fulfill these requirements, in this paper we propose: (i) a model for representing compressed weighted graphs, and (ii) an efficient and effective compression algorithm which, leveraging on modular decomposition theory, is capable of exploiting structural properties of graphs in order to obtain highly compact and accurate compressed representations. Such compressed graphs can be used in place of the original graphs in order to enhance the performance of graph clustering algorithms in all contexts where a little inaccuracy in the results is acceptable in order to gain computational efficiency. The paper is completed by an experimental study which shows the effectiveness of the proposed approach in the context of graph clustering.

Probabilistic pattern queries over complex probabilistic graphs

Proceedings of the 2012 Joint EDBT/ICDT …, Jan 1, 2012

This paper introduces probabilistic pattern queries over complex probabilistic graphs, a theoretical graph model proposed by us recently for dealing with complex probabilistic graph data of modern applications characterized by uncertainty and imprecision. Effective algorithms implementing such queries are also provided.

A family of graph-theory-driven algorithms for managing complex probabilistic graph data efficiently

Proceedings of the 15th Symposium on …, Jan 1, 2011

Traditionally, a great deal of attention has been devoted to the problem of effectively modeling and querying probabilistic graph data. State-of-the-art proposals are not prone to deal with complex probabilistic data, as they essentially introduce simple data models (e.g., based on confidence intervals) and straightforward query methodologies (e.g., based on the reachability property). According to our vision, these proposals need to be extended towards achieving the definition of innovative models and algorithms capable of dealing with the ...

ClustCube: an OLAP-based framework for clustering and mining complex database objects

Proceedings of the 2011 ACM Symposium on …, Jan 1, 2011

In this paper, we introduce and experimentally assess ClustCube, an innovative OLAP-based framework for clustering and mining complex database objects extracted from distributed database settings by means of complex SQL statements involving multiple JOIN queries across (distributed) relational tables. To this end, ClustCube puts together conventional clustering techniques and well-consolidated OLAP methodologies in order to achieve higher expressive power and mining effectiveness over traditional methodologies ...

LCS-Hist: taming massive high-dimensional data cube compression

EDBT, Jan 1, 2009

The problem of efficiently compressing massive high-dimensional data cubes still waits for efficient solutions capable of overcoming well-recognized scalability limitations of state-of-the-art histogram-based techniques, which perform well on small-in-size low-dimensional data cubes, whereas their performance in both representing the input data domain and efficiently supporting approximate query answering against the generated compressed data structure decreases dramatically when data cubes grow in dimension number and size. To overcome this relevant research challenge, in this paper we propose LCS-Hist, an innovative multidimensional histogram devising a complex methodology that combines intelligent data modeling and processing techniques in order to tame the annoying problem of compressing massive high-dimensional data cubes. With respect to similar histogram-based proposals, our technique introduces (i) a surprising consumption of the storage space available to house the compressed representation of the input data cube, and (ii) a superior scalability on high-dimensional data cubes. Finally, several experimental results performed against various classes of data cubes confirm the advantages of LCS-Hist, even in comparison with those given by state-of-the-art similar techniques.

Semantics-aware advanced OLAP visualization of multidimensional data cubes

International Journal of Data …, Jan 1, 2007

Efficiently supporting advanced OLAP visualization of multidimensional data cubes is a novel and challenging research topic, which results to be of interest for a large family of data warehouse applications relying on the management of spatio-temporal (e.g., mobile) data, scientific and statistical data, sensor network data, biological data, etc. On the other hand, the issue of visualizing multidimensional data domains has been quite neglected from the research community, since it does not belong to the well-founded conceptual-logical-physical design hierarchy inherited from relational database methodologies. Inspired from these considerations, in this article we propose an innovative advanced OLAP visualization technique that meaningfully combines (i) the so-called OLAP dimension flattening process, which allows us to extract two-dimensional OLAP views from multidimensional data cubes, and (ii) very efficient data compression techniques for such views, which allow us to generate "semantics-aware" compressed representations where data are grouped along OLAP hierarchies.

A Hierarchy-Driven Compression Technique for Advanced OLAP Visualization of Multidimensional Data Cubes
