Domenico Sacca | University of Calabria
Papers by Domenico Sacca
PhD program in Systems Engineering and Computer Science, XXII Cycle, 2009, Università della Calabria
SEBD, 2019
The increasing complexity of new malware and the constant refinement of detection mechanisms are driving malware writers to rethink the malware development process. In this respect, compilers play a key role and can be used to implement evasion techniques able to defeat even the new generation of detection algorithms. In this paper, we provide an overview of the endless battle between malware writers and detectors, and we discuss the benefits of using high-level languages and even exotic compilers (e.g., single-instruction compilers) in the process of writing malicious code.
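To make the notion of a single-instruction machine concrete, here is a minimal Python sketch of a SUBLEQ ("subtract and branch if less than or equal to zero") interpreter, a classic one-instruction computer. SUBLEQ is our illustrative choice: the abstract does not say which single-instruction model the paper considers, and the `run_subleq` name, memory layout, and step limit are assumptions.

```python
from typing import List


def run_subleq(memory: List[int], pc: int = 0, max_steps: int = 10_000) -> List[int]:
    """Execute SUBLEQ triples (a, b, c): mem[b] -= mem[a]; jump to c if mem[b] <= 0.

    Execution halts when the program counter becomes negative (or after max_steps).
    """
    mem = list(memory)  # work on a copy so the input program is not modified
    steps = 0
    while pc >= 0 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Add A (address 9) into B (address 10) using a scratch cell Z (address 11).
program = [
    9, 11, 3,    # Z -= A  -> Z = -A
    11, 10, 6,   # B -= Z  -> B = B + A
    11, 11, -1,  # Z -= Z  -> Z = 0, then jump to -1 (halt)
    7, 5, 0,     # data: A = 7, B = 5, Z = 0
]
print(run_subleq(program)[10])  # 12
```

Even addition becomes a sequence of the same subtract-and-branch step, which is what makes code produced for such models look so uniform.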
Lecture Notes in Computer Science, 2016
Due to the increasing availability of huge amounts of data, traditional data management techniques are inadequate in many real-life scenarios. Furthermore, the heterogeneity and high speed of these data require suitable data storage and management tools to be designed from scratch. In this paper, we describe a framework tailored for analyzing user interactions with intelligent systems while users seek domain-specific information (e.g., choosing a good restaurant in a visited area). The framework enhances the user's search for information by performing a data exchange activity (called data posting) that enriches the information sources with additional background information and knowledge derived from the experiences and behavioral properties of domain experts and users.
Due to the emerging Big Data paradigm, traditional data management techniques are inadequate in many real-life scenarios. In particular, OLAP techniques require substantial changes to offer useful analysis, given the huge amount of data to be analyzed and its velocity and variety. In this paper, we describe an approach for dynamic Big Data searching that, based on data collected by a suitable storage system, enriches the data in order to guide users through data exploration efficiently and effectively.
The pervasive diffusion of new-generation devices such as smartphones and tablets, along with the widespread use of social networks, generates massive data flows containing heterogeneous information produced at different rates and in different formats. These data are referred to as Big Data and require new storage and analysis approaches. In this paper, we describe a system for dealing with massive tourism flows that we used to analyze tourist behavior in Italy. We defined a framework that uses a NoSQL approach for data management and MapReduce to improve the analysis of the data gathered from different sources.
Due to the emerging Big Data paradigm, traditional data management techniques are inadequate in many real-life scenarios. In particular, the availability of huge amounts of data on social interactions among users calls for advanced analysis strategies. Furthermore, the heterogeneity and high speed of these data require suitable data storage and management tools to be designed from scratch. In this paper, we describe a framework tailored for analyzing user searches while users are connected to a social network, in order to quickly identify users able to spread their influence across the network. Gathering information about user preferences is crucial in several scenarios, such as viral marketing, tourism promotion, and food education.
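The abstract does not say how influential users are identified. Purely as an illustration, the sketch below ranks users by how many others they can reach within k hops of a directed "influences" graph, a common cheap proxy for spreading potential. The graph encoding, the `k_hop_reach` and `rank_by_reach` names, and the default k are hypothetical and are not the framework described in the paper.

```python
from collections import deque
from typing import Dict, List


def k_hop_reach(graph: Dict[str, List[str]], start: str, k: int) -> int:
    """Count users reachable from `start` in at most k directed hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        user, dist = frontier.popleft()
        if dist == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(user, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    return len(seen) - 1  # exclude the start user itself


def rank_by_reach(graph: Dict[str, List[str]], k: int = 2) -> List[str]:
    """Order all users by k-hop reach, highest first; ties broken by name."""
    users = set(graph) | {v for targets in graph.values() for v in targets}
    return sorted(users, key=lambda u: (-k_hop_reach(graph, u, k), u))
```

A BFS bounded by k keeps each reach computation linear in the explored neighborhood, which is why such proxies are popular when the goal is to find candidates quickly rather than to simulate diffusion exactly.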
Proceedings of the 23rd International Database Applications & Engineering Symposium on - IDEAS '19
The data posting framework introduced in [8] adapts the well-known Data Exchange techniques to the new Big Data management and analysis challenges found in real-world scenarios. Although it is expressive enough, it requires the use of count constraints and may be difficult for non-expert users. Moreover, in the general case the data posting problem is NP-complete under data complexity, and non-deterministic variables are used. Identifying the conditions that guarantee polynomial-time execution in the presence of non-deterministic choices is therefore very important for practical purposes. In this paper, we present a simplified version of the data posting framework based on smart mapping rules, which integrate a simple mapping description with some parameters and avoid complex specifications with count constraints. We show that the data posting problem in the new setting is NP-complete and identify the conditions under which it becomes polynomial even in the presence of non-deterministic choices.
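The sketch below illustrates, under loose assumptions, what a non-deterministic choice under a count constraint looks like: each source tuple (e.g., a user) must be posted to one of several candidate targets (e.g., restaurants), and each target admits at most a given number of tuples. The names `post`, `candidates`, and `max_count` are hypothetical. The plain backtracking search is exponential in the worst case; it is neither the paper's algorithm nor the polynomial-time conditions it identifies, and this particular upper-bound-only case could also be solved with a flow algorithm.

```python
from typing import Dict, List, Optional


def post(candidates: Dict[str, List[str]],
         max_count: Dict[str, int]) -> Optional[Dict[str, str]]:
    """Pick one target per source tuple so that no target exceeds its count bound."""
    sources = list(candidates)
    load = {target: 0 for target in max_count}
    choice: Dict[str, str] = {}

    def search(i: int) -> bool:
        if i == len(sources):
            return True
        source = sources[i]
        for target in candidates[source]:  # non-deterministic choice point
            if target in load and load[target] < max_count[target]:
                load[target] += 1
                choice[source] = target
                if search(i + 1):
                    return True
                load[target] -= 1  # undo the choice and try the next option
                del choice[source]
        return False

    return dict(choice) if search(0) else None
```

For example, with candidates `{"ann": ["r1", "r2"], "bob": ["r1"], "carl": ["r1", "r2"]}` and bounds `{"r1": 1, "r2": 2}`, the first choice `ann → r1` is undone because `bob` has no other option.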
2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), 2018
The new Big Data paradigm forced both researchers and industry to rethink data management techniques, which became inadequate in many contexts. Every day we deal with huge amounts of collected data about user suggestions and searches. These data require new advanced analysis strategies in order to be profitably leveraged. Moreover, due to the heterogeneous and fast-changing nature of these data, new data storage and management tools are needed to store them effectively. In this paper, we analyze the effect of user searches and suggestions and try to understand how much they influence users' social environment. This task is crucial for efficiently identifying the users able to spread their influence across the network. Gathering information about user preferences is a key activity in several scenarios, such as tourism promotion, personalized marketing, and entertainment suggestion. We show the application of our approach in a large research project named D-ALL (Data Alliance).
2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), 2018
The rise of Big Data made traditional data management techniques inadequate in many real-life scenarios. In particular, the availability of huge amounts of data on user suggestions and searches calls for advanced analysis strategies in order to profitably leverage these data. Furthermore, the heterogeneity and high speed of these data require suitable data storage and management tools to be designed from scratch. In this paper, we describe our proposal for analyzing the way user searches and suggestions influence users' social environment, in order to quickly identify users able to spread their influence across the network. Gathering information about user preferences is crucial in several scenarios, such as viral marketing, tourism promotion, and food education.
The Big Data paradigm has recently come onto the scene in a quite pervasive manner. Sifting through massive amounts of such data, parsing them, transferring them from a source to a target database, and analyzing them to improve business decision-making processes is too complex for traditional approaches. In this respect, there have been recent proposals that enrich data while exchanging them, such as the Data Posting framework. This framework requires the use of domain relations and count constraints, which may be difficult for non-expert users to manage. In this paper, we propose Smart Data Posting, a framework using intuitive constructs that are automatically translated into the standard Data Posting framework. In particular, we allow the use of smart mapping rules extended with additional selection criteria and the direct use of tuple-generating dependencies and equality-generating dependencies. We present a complexity analysis of the framework and describe the architect...
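To recall what the two kinds of dependencies express, here is a minimal Python check over two hypothetical relations, `Emp(employee, dept)` and `Dept(dept, manager)`. The tuple-generating dependency Emp(e, d) → ∃m Dept(d, m) requires every employee's department to exist; the equality-generating dependency Dept(d, m1) ∧ Dept(d, m2) → m1 = m2 requires each department to have a single manager. The relations and function names are illustrative assumptions; the abstract does not show Smart Data Posting's own rule syntax.

```python
from typing import Dict, Set, Tuple


def tgd_violations(emp: Set[Tuple[str, str]],
                   dept: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Emp(e, d) -> EXISTS m . Dept(d, m): Emp tuples whose department is missing."""
    known_depts = {d for d, _ in dept}
    return {(e, d) for e, d in emp if d not in known_depts}


def egd_violations(dept: Set[Tuple[str, str]]) -> Set[str]:
    """Dept(d, m1) AND Dept(d, m2) -> m1 = m2: departments with several managers."""
    managers: Dict[str, Set[str]] = {}
    for d, m in dept:
        managers.setdefault(d, set()).add(m)
    return {d for d, ms in managers.items() if len(ms) > 1}
```

In a data exchange setting, a tgd violation is typically repaired by adding a tuple with a fresh value for the existential variable, while an egd violation forces two values to be equated (or makes the exchange fail if both are constants).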
SEBD, 2019
CalcuList (Calculator with List manipulation) is an educational language for teaching functional programming, extended with some imperative and side-effect features that are enabled upon explicit request by the programmer. As the language natively supports JSON objects, it can be effectively used to implement generic MapReduce, a popular model in distributed computing that underpins many NoSQL systems. Since a list of JSON objects can be thought of as a dataset of a document NoSQL datastore, CalcuList can be used as a tool for teaching advanced query algorithms for document datastores such as MongoDB and CouchDB.
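CalcuList syntax is not given here, so the following Python sketch (an assumption, not CalcuList code) shows the generic MapReduce pattern over a list of JSON-like documents: a mapper emits key/value pairs, the values are grouped by key, and a reducer folds each group. The `map_reduce` name and the sample `orders` data are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


def map_reduce(docs: Iterable[Doc],
               mapper: Callable[[Doc], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Generic in-memory MapReduce over a list of JSON-like documents."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:                   # map phase: emit (key, value) pairs
        for key, value in mapper(doc):
            groups[key].append(value)  # shuffle phase: group values by key
    # reduce phase: fold each group into a single result
    return {key: reducer(key, values) for key, values in groups.items()}


orders = [
    {"customer": "ann", "amount": 10},
    {"customer": "bob", "amount": 5},
    {"customer": "ann", "amount": 7},
]
totals = map_reduce(orders,
                    lambda d: [(d["customer"], d["amount"])],
                    lambda key, values: sum(values))
print(totals)  # {'ann': 17, 'bob': 5}
```

The same mapper/reducer split is what document stores expose for grouping queries, which is why a list of JSON objects is a convenient teaching stand-in for such a dataset.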
Lecture Notes in Computer Science, 1986
This paper treats the problem of efficiently implementing recursive Horn clause queries, including those with function symbols. In particular, it studies the situation where the initial bindings of the arguments in the recursive query goal can be used in the top-down (as in backward chaining) execution phase to improve the efficiency, and often to guarantee the termination, of the forward-chaining execution phase that implements the fixpoint computation for the recursive query. A general method is given for solving these queries: the method analyzes the binding-passing behavior of the query and then reschedules the overall execution as two fixpoint computations derived from this analysis. The first computation emulates the propagation of bindings in the top-down phase; the second generates the desired answer by proving the goals left unsolved during the previous step. Finally, sufficient conditions for safety are derived to ensure that the fixpoint computations complete in a finite number of steps.
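A minimal Python sketch of the two-phase idea on the classic ancestor program, anc(X, Y) :- par(X, Y) and anc(X, Y) :- par(X, Z), anc(Z, Y), queried as anc(john, Y) with the first argument bound. The first fixpoint propagates the binding (collecting the nodes reachable from `john`, in the spirit of magic-set rewriting); the second computes anc facts bottom-up only for those nodes. This is an illustration under our own assumptions (plain Python sets, no function symbols, hypothetical names), not the paper's general method.

```python
from typing import Callable, Set, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[str, str]


def fixpoint(step: Callable[[Set[T]], Set[T]], start: Set[T]) -> Set[T]:
    """Apply `step` until no new facts are derived (naive iteration)."""
    current = set(start)
    while True:
        new = step(current) - current
        if not new:
            return current
        current |= new


def query_anc(bound_x: str, par: Set[Pair]) -> Tuple[Set[str], Set[Pair]]:
    """Answer anc(bound_x, Y); also return the anc facts actually computed."""
    # Phase 1: propagate the binding top-down (nodes whose descendants matter).
    relevant = fixpoint(lambda m: {z for (x, z) in par if x in m}, {bound_x})
    # Phase 2: bottom-up fixpoint for anc, restricted to relevant first arguments.
    base = {(x, y) for (x, y) in par if x in relevant}
    anc = fixpoint(
        lambda cur: base | {(x, y) for (x, z) in par if x in relevant
                            for (z2, y) in cur if z2 == z},
        set(),
    )
    return {y for (x, y) in anc if x == bound_x}, anc
```

With `par = {("john", "mary"), ("mary", "ann"), ("bob", "carl")}`, the answer is `{"mary", "ann"}`, and no anc fact about `bob` is ever derived: the binding has pruned the unrelated part of the database.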
arXiv (Cornell University), Jan 11, 2005
Histograms are used to summarize the contents of relations into a number of buckets for the estimation of query result sizes. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed for determining bucket boundaries that provide accurate estimations. However, while search strategies for optimal bucket boundaries are rather sophisticated, little attention has been paid to estimating queries inside buckets, and all of the above techniques adopt naive methods for such an estimation. This paper focuses on the problem of improving the estimation inside a bucket once its boundaries have been fixed. The proposed technique adds 32 bits of information to each bucket, organized into a 4-level tree index (4LT) that stores approximate cumulative frequencies at 7 internal intervals of the bucket. Both theoretical analysis and experimental results show that, among a number of alternative ways to organize the additional information, the 4-level tree index provides the best frequency estimation inside a bucket. The index is then added to two well-known histograms, MaxDiff and V-Optimal, yielding the non-obvious result that, despite the space cost of 4LT, which reduces the number of allowed buckets once the storage space is fixed, the original methods are strongly improved in terms of accuracy.
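The following Python sketch shows one way to spend 32 bits per bucket on a 4-level tree: each internal node stores the quantized share of its count that falls in its left half, which fixes approximate cumulative frequencies at the 7 internal split points of 8 equal sub-intervals. The per-level bit budget (6, 5, and 4 bits), the left-half-share encoding, and all names are assumptions for illustration; the paper's exact layout may differ.

```python
from typing import List

# Assumed bit budget: 1 node x 6 bits + 2 nodes x 5 bits + 4 nodes x 4 bits = 32 bits.
LEVEL_BITS = [6, 5, 4]


def quantize(fraction: float, bits: int) -> int:
    """Map a fraction in [0, 1] to an integer code with `bits` bits."""
    return round(fraction * ((1 << bits) - 1))


def dequantize(code: int, bits: int) -> float:
    return code / ((1 << bits) - 1)


def encode_bucket(freqs: List[int]) -> List[int]:
    """Store, level by level, each node's quantized share of its count in its left half."""
    n = len(freqs)
    if n % 8:
        raise ValueError("this sketch assumes a bucket width divisible by 8")
    codes = []
    for level, bits in enumerate(LEVEL_BITS):
        parts = 1 << level
        width = n // parts
        for i in range(parts):
            lo = i * width
            total = sum(freqs[lo:lo + width])
            left = sum(freqs[lo:lo + width // 2])
            codes.append(quantize(left / total if total else 0.5, bits))
    return codes


def sub_bucket_counts(total: int, codes: List[int]) -> List[float]:
    """Rebuild approximate counts for the 8 sub-intervals, top-down."""
    counts = [float(total)]
    idx = 0
    for bits in LEVEL_BITS:
        split = []
        for c in counts:
            frac = dequantize(codes[idx], bits)
            idx += 1
            split.extend([c * frac, c * (1 - frac)])
        counts = split
    return counts


def estimate_prefix(total: int, codes: List[int], n: int, x: float) -> float:
    """Estimate how many tuples fall in positions [0, x) of a bucket of width n."""
    counts = sub_bucket_counts(total, codes)
    width = n / 8
    full = min(int(x // width), 8)
    estimate = sum(counts[:full])
    if full < 8:
        # uniform assumption only inside the sub-interval containing x
        estimate += counts[full] * (x - full * width) / width
    return estimate
```

For a 16-value bucket whose 64 tuples all lie in the first half, the uniform-within-bucket assumption estimates 32 tuples for the prefix [0, 8), while this tree-based estimate recovers 64 (up to floating-point error).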
An extension of logic programming, called "ordered logic programming", which includes some abstractions of the object-oriented paradigm, is presented. An ordered program consists of a number of modules (objects), where each module is composed of a number of rules, possibly with negated head predicates. A sort of "isa" hierarchy can be defined among the modules in order to allow for rule inheritance. Therefore, every module sees its own rules as local rules, and the rules of the other modules to which it is connected by the "isa" hierarchy as global rules. In this way, since local rules may hide global rules, it is possible to deal with default properties and exceptions. This new approach represents a novel attempt to combine the logic paradigm with the object-oriented one in knowledge base systems. Moreover, it provides new ground for explaining some recent proposals of semantics for classical logic programs with negation in the rule bodies, and it gives an interesting semantics to logic programs with negated rule heads.
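As a simplified illustration of how local rules hide global ones, the Python sketch below keeps only ground literals per module and resolves a query by walking up a single-parent "isa" chain, so the most specific module that mentions an atom decides its truth value. Rule bodies, multiple inheritance, and the paper's actual semantics are omitted; the `Module` class and the bird/penguin example are hypothetical.

```python
from typing import Dict, Optional


class Module:
    """A module (object) holding ground literals; False marks a negated head."""

    def __init__(self, name: str, isa: Optional["Module"] = None,
                 facts: Optional[Dict[str, bool]] = None) -> None:
        self.name = name
        self.isa = isa  # single parent in the isa hierarchy (an assumption)
        self.facts = dict(facts or {})

    def holds(self, atom: str) -> Optional[bool]:
        """Walk up the isa chain; the first (most specific) module defining atom decides."""
        module: Optional[Module] = self
        while module is not None:
            if atom in module.facts:
                return module.facts[atom]  # local literal hides inherited ones
            module = module.isa
        return None  # undefined: no module in the chain mentions the atom


bird = Module("bird", facts={"flies": True, "has_feathers": True})
penguin = Module("penguin", isa=bird, facts={"flies": False})
```

Here `penguin.holds("flies")` is `False` (the local exception), while `penguin.holds("has_feathers")` is `True`, inherited as a default from `bird`.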
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery, Jan 18, 2022
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect the patterns of real ones using a two-step approach: first, a real dataset is analyzed to derive relevant patterns; then, these patterns are used to reconstruct a new dataset that preserves the main characteristics of the original. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former relies on inverse mining techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent ones. The latter builds on recent developments in probabilistic generative modeling, which model generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for discrete data and discussing their pros and cons. This article is categorized under: Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Structure Discovery.
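A toy Python sketch of constraint-based generation in the inverse-mining sense: given exact support counts for some itemsets, it searches for a dataset of n transactions that satisfies all of them, or reports that none exists. Brute-force enumeration is our assumption for readability and is only feasible for a handful of items; it is not one of the surveyed methods, which rely on far more scalable formulations. All names are hypothetical.

```python
from itertools import combinations, combinations_with_replacement
from typing import Dict, FrozenSet, Iterable, List, Optional

Itemset = FrozenSet[str]


def support(dataset: List[Itemset], itemset: Itemset) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for transaction in dataset if itemset <= transaction)


def all_transactions(items: Iterable[str]) -> List[Itemset]:
    """Every possible transaction (subset) over the item universe."""
    ordered = sorted(items)
    return [frozenset(c) for r in range(len(ordered) + 1)
            for c in combinations(ordered, r)]


def inverse_mine(items: Iterable[str], n: int,
                 constraints: Dict[Itemset, int]) -> Optional[List[Itemset]]:
    """Brute-force search for n transactions matching every support constraint exactly."""
    for candidate in combinations_with_replacement(all_transactions(items), n):
        dataset = list(candidate)
        if all(support(dataset, s) == k for s, k in constraints.items()):
            return dataset
    return None
```

Asking for supports a = 3, b = 2, and {a, b} = 1 over 4 transactions yields a dataset such as {a}, {a}, {b}, {a, b}; asking for {a, b} to be more frequent than {a} returns `None`, since an itemset can never be more frequent than any of its subsets.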
PhD thesis, PhD program in Systems Engineering and Computer Science, XIX Cycle, Università della Calabria