Fabrizio Angiulli | Università della Calabria
Papers by Fabrizio Angiulli
Lecture Notes in Computer Science, 2013
Determining a good set of pivots is a challenging task in metric space indexing. Several techniques to select pivots from the data to be indexed have been introduced in the literature. In this paper, we propose a pivot placement strategy which exploits the natural orientation of the data in order to select space points that achieve a good alignment with the whole data to be indexed. Comparison with existing methods substantiates the effectiveness of the approach.
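The abstract does not spell out how the data orientation is estimated. As a rough, hedged illustration of the general idea only (not the paper's actual strategy), the sketch below places candidate pivots along the dominant direction of the data computed via SVD; the function name and the use of the principal axis are my own assumptions.

```python
import numpy as np

def axis_aligned_pivots(data, num_pivots=2):
    """Hedged sketch: place pivots along the dominant direction of the data.
    This only illustrates "orientation aware" pivot placement, not the
    strategy actually proposed in the paper."""
    mean = data.mean(axis=0)
    centered = data - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0]                        # first principal direction
    proj = centered @ axis              # 1-D projections of all points
    positions = np.linspace(proj.min(), proj.max(), num_pivots)
    return mean + positions[:, None] * axis

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 8)) @ rng.normal(size=(8, 8))  # anisotropic cloud
print(axis_aligned_pivots(cloud, num_pivots=3))
```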
European Conference on Artificial Intelligence, 2004
Outlier Detection using Disjunctive Logic Programming. Fabrizio Angiulli (ICAR-CNR, Via Pietro Bucci 41C, 87030 Rende (CS), Italy, email: angiulli@icar.cnr.it), Rachel Ben-Eliyahu-Zohary, and Luigi Palopoli.
Computing Research Repository, 2004
The development of effective knowledge discovery techniques has become a very active research area in recent years due to the important impact it has had in several relevant application domains. One interesting task therein is that of singling out anomalous individuals from a given population, e.g., to detect rare events in time-series analysis settings, or to identify objects whose behavior…
Default logic is used to describe regular behavior and normal properties. We suggest exploiting the framework of default logic for detecting outliers: individuals who behave in an unexpected way or feature abnormal properties. The ability to locate outliers can help to maintain knowledge base integrity and to single out irregular individuals. We first formally define the notion of an outlier and an outlier witness. We then show that finding outliers is quite complex. Indeed, we show that several versions of the outlier detection problem lie over the second level of the polynomial hierarchy. For example, the question of establishing if at least one outlier can be detected in a given propositional default theory is -complete. Although outlier detection involves heavy computation, the queries involved can frequently be executed off-line, thus somewhat alleviating the difficulty of the problem. In addition, we show that outlier detection can be done in polynomial time for both the class of acyclic normal unary defaults and the class of acyclic dual normal unary defaults.
Artificial Intelligence, Oct 1, 2010
It was noted recently that the framework of default logics can be exploited for detecting outliers. Outliers are observations, expressed by sets of literals, that feature unexpected properties. These observations are not explicitly provided in input (as happens with abduction) but, rather, are hidden in the given knowledge base. Unfortunately, in the two related formalisms for specifying defaults (Reiter's default logic and extended disjunctive logic programs) the most general outlier detection problems turn out to lie at the third level of the polynomial hierarchy. In this note, we analyze the complexity of outlier detection for two very simple classes of default theories, namely NU and DNU, for which the entailment problem is solvable in polynomial time. We show that, for these classes, checking for the existence of an outlier remains intractable. This result contributes to further showing the inherent intractability of outlier detection in default reasoning.
ACM Transactions on Computational Logic, 2003
Metaquerying is a data mining technology by which hidden dependencies among several database relations can be discovered. This tool has already been successfully applied to several real-world applications. Recent papers provide only preliminary results about the complexity of metaquerying. In this paper we define several variants of metaquerying that encompass, as far as we know, all variants defined in the literature. We study both the combined complexity and the data complexity of these variants. We show that, under the combined complexity measure, metaquerying is generally intractable (unless P=NP), lying sometimes quite high in the complexity hierarchies (as high as NP^PP), depending on the characteristics of the plausibility index. However, we are able to single out some tractable and interesting metaquerying cases (whose combined complexity is LOGCFL-complete). As for the data complexity of metaquerying, we prove that, in general, it is in TC^0, but lies within AC^0 in some simpler cases. Finally, we discuss the implementation of metaqueries, by providing algorithms to answer them.
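As a toy, hedged illustration of what an instantiated metaquery and a plausibility index can look like (the relations, the rule shape, and the confidence-style index below are my own simplifications, not the paper's definitions):

```python
def rule_confidence(head, body1, body2):
    """Confidence-style index for an instantiated metaquery of the form
    T(X, Y) <- R(X, Z), S(Z, Y): the fraction of (X, Y) pairs derivable by
    joining R and S that also occur in T (all relations are sets of pairs)."""
    derived = {(x, y) for (x, z1) in body1 for (z2, y) in body2 if z1 == z2}
    return len(derived & set(head)) / len(derived) if derived else 0.0

# Hypothetical instantiation: T := grandparent, R := parent, S := parent.
parent = {("ann", "bob"), ("bob", "carl")}
grandparent = {("ann", "carl")}
print(rule_confidence(grandparent, parent, parent))  # 1.0
```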
SEBD, 2001
Fabrizio Angiulli (ISI-CNR c/o Università della Calabria, DEIS, Via P. Bucci 41C, Rende, Italy, angiulli@isi.cs.cnr.it), Giovambattista Ianni (Università della Calabria, DEIS, Via P. Bucci 41C, Rende, Italy, ianni@deis.unical.it), and Luigi Palopoli (Università di Reggio Calabria).
Default logic is used to describe regular behavior and normal properties. We suggest exploiting the framework of default logic for detecting outliers: individuals who behave in an unexpected way or feature abnormal properties. The ability to locate outliers can help to maintain knowledge base integrity and to single out irregular individuals.
Theoretical Computer Science, 2004
Inducing association rules is one of the central tasks in data mining applications. Quantitative association rules induced from databases describe rich and hidden relationships holding within the data that can prove useful for various application purposes (e.g., market basket analysis, customer profiling, and others). Although association rules are quite widely used in practice, a thorough analysis of the related computational complexity is missing. This paper intends to provide a contribution in this setting. To this end, we first formally define quantitative association rule mining problems, which include boolean association rules as a special case; we then analyze the computational complexity of such problems. The general problem as well as some interesting special cases are considered.
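For readers unfamiliar with the notion, the following small sketch shows what a quantitative association rule and its basic quality measures look like; the attribute names and intervals are hypothetical, and the paper's formal definitions are more general.

```python
def support_confidence(rows, antecedent, consequent):
    """Support and confidence of a quantitative rule 'antecedent => consequent',
    where each side is a set of interval conditions over numeric attributes."""
    def holds(row, conditions):
        return all(lo <= row[attr] <= hi for attr, (lo, hi) in conditions.items())

    matching_body = [r for r in rows if holds(r, antecedent)]
    matching_both = [r for r in matching_body if holds(r, consequent)]
    support = len(matching_both) / len(rows) if rows else 0.0
    confidence = len(matching_both) / len(matching_body) if matching_body else 0.0
    return support, confidence

table = [
    {"age": 34, "income": 62_000},
    {"age": 37, "income": 48_000},
    {"age": 52, "income": 71_000},
    {"age": 31, "income": 55_000},
]
# Rule: age in [30, 39]  =>  income in [50_000, 80_000]
print(support_confidence(table, {"age": (30, 39)}, {"income": (50_000, 80_000)}))
```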
Sistemi Evoluti per Basi di Dati, 2008
Proceedings of the International Database Engineering and Applications Symposium (IDEAS '04), 2004
In this work we present a novel approximate algorithm to compute the top-k closest pairs join of two large and high-dimensional data sets. The algorithm has worst-case time complexity O(d^2 nk) and space complexity O(nd), and guarantees a solution within an O(d^(1+1/t)) factor of the exact one, where t ∈ {1, 2, ..., ∞} denotes the Minkowski metric L_t of interest and d the dimensionality. It makes use of the concept of space filling curve to establish an order between the points of the space and performs at most d + 1 sorts and scans of the two data sets. During a scan, each point from one data set is compared with its closest points, according to the space filling curve order, in the other data set, and points whose contribution to the solution has already been analyzed are detected and eliminated. Experimental results on real and synthetic data sets show that our algorithm (i) behaves as an exact algorithm in low-dimensional spaces; (ii) is able to prune the entire data set (or a considerable fraction of it) even for high dimensions if certain separation conditions are satisfied; (iii) in any case returns a solution within a small error of the exact one.
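The full algorithm relies on the Hilbert curve, d + 1 shifted sorts, and a pruning step with the stated approximation guarantee. The hedged sketch below only illustrates the core idea of ordering points along a space filling curve and comparing each point with its neighbors in that order; the Z-order (Morton) key used here is a stand-in for the Hilbert curve, and the window size and function names are my own choices.

```python
import bisect
import heapq
import numpy as np

def morton_key(point, bits=10):
    """Z-order (Morton) key: interleave the bits of the integer coordinates."""
    key, dims = 0, len(point)
    for b in range(bits):
        for d in range(dims):
            key |= ((int(point[d]) >> b) & 1) << (b * dims + d)
    return key

def approx_top_k_closest_pairs(a, b, k=5, window=16, bits=10):
    """Single scan: compare each point of `a` only with the `window` points
    of `b` that are nearest to it in the curve order (approximate result)."""
    both = np.vstack([a, b])
    lo = both.min(axis=0)
    span = np.maximum(both.max(axis=0) - lo, 1e-12)
    grid_a = (a - lo) / span * (2**bits - 1)
    grid_b = (b - lo) / span * (2**bits - 1)
    order = sorted(range(len(b)), key=lambda j: morton_key(grid_b[j], bits))
    keys = [morton_key(grid_b[j], bits) for j in order]
    heap = []  # max-heap (negated distances) holding the k best pairs so far
    for i in range(len(a)):
        pos = bisect.bisect_left(keys, morton_key(grid_a[i], bits))
        for j in order[max(0, pos - window):pos + window]:
            d = float(np.linalg.norm(a[i] - b[j]))
            if len(heap) < k:
                heapq.heappush(heap, (-d, i, j))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, i, j))
    return sorted((-nd, i, j) for nd, i, j in heap)

rng = np.random.default_rng(1)
print(approx_top_k_closest_pairs(rng.random((200, 6)), rng.random((200, 6)), k=3))
```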
IFIP – The International Federation for Information Processing, 2008
Fabrizio Angiulli (DEIS, Università della Calabria, Via P. Bucci 41C, 87036 Rende (CS), Italy, f.angiulli@deis.unical.it) and Stefano Basta (ICAR-CNR, Via P. Bucci 41C, 87036 Rende (CS), Italy, basta@icar.cnr.it).
Artificial Intelligence, 2014
Designing algorithms capable of efficiently constructing minimal models of CNFs is an important task in AI. This paper provides new results along this research line and presents new algorithms for performing minimal model finding and checking over positive propositional CNFs and model minimization over propositional CNFs. An algorithmic schema, called the Generalized Elimination Algorithm (GEA), is presented that computes a minimal model of any positive CNF. The schema generalizes the Elimination Algorithm (EA) [BP97], which computes a minimal model of positive head-cycle-free (HCF) CNF theories. While the EA always runs in polynomial time in the size of the input HCF CNF, the complexity of the GEA depends on the complexity of the specific eliminating operator invoked therein, which may in general turn out to be exponential. Therefore, a specific eliminating operator is defined by which the GEA computes, in polynomial time, a minimal model for a class of CNFs that strictly includes head-elementary-set-free (HEF) CNF theories [GLL06], which form, in turn, a strict superset of HCF theories. Furthermore, in order to deal with the high complexity associated with recognizing HEF theories, an "incomplete" variant of the GEA (called IGEA) is proposed: the resulting schema, once instantiated with an appropriate elimination operator, always constructs a model of the input CNF, which is guaranteed to be minimal if the input theory is HEF. In the light of the above results, the main contribution of this work is the enlargement of the tractability frontier for the minimal model finding, minimal model checking, and model minimization problems.
2006 18th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'06), 2006
A Greedy Search Approach to Co-clustering Sparse Binary Matrices. Fabrizio Angiulli, Eugenio Cesario, Clara Pizzuti (ICAR-CNR, Via P. Bucci 41C, 87036 Rende (CS), Italy).
Lecture Notes in Computer Science, 2004
A novel algorithm, named DESCRY, for clustering very large multidimensional data sets with numerical attributes is presented. DESCRY discovers clusters of different shape, size, and density, even when the data contains noise, by first finding and clustering a small set of points, called meta-points, that well depict the shape of the clusters present in the data set. Final clusters are obtained by assigning each point to one of the partial clusters. The computational complexity of DESCRY is linear both in the data set size and in the data set dimensionality. Experiments show very good qualitative results, comparable with those obtained by state-of-the-art clustering algorithms.
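A minimal sketch of the two-phase idea described above: pick a small set of meta-points, cluster them, then label every point with the cluster of its nearest meta-point. The sampling and meta-point clustering used here (uniform sampling, k-means) are stand-ins chosen for brevity, not DESCRY's actual procedures.

```python
import numpy as np
from sklearn.cluster import KMeans

def descry_like(data, num_meta=200, num_clusters=5, seed=0):
    """Two-phase clustering sketch: cluster meta-points, then assign
    each data point the label of its nearest meta-point."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(data), size=min(num_meta, len(data)), replace=False)
    meta = data[idx]
    meta_labels = KMeans(n_clusters=num_clusters, n_init=10,
                         random_state=seed).fit_predict(meta)
    # Squared distances of every point to every meta-point.
    d2 = ((data[:, None, :] - meta[None, :, :]) ** 2).sum(axis=2)
    return meta_labels[d2.argmin(axis=1)]

labels = descry_like(np.random.default_rng(2).random((5000, 3)))
print(np.bincount(labels))
```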
Theory and Practice of Object Systems, 1997
This paper illustrates a prototype system, called GPRS, supporting the Generalized Production Rules (GPR) database language. The GPR language integrates, in a unified framework, active rules, which allow the specification of event-driven computations on data, and deductive rules, which can be used to derive intensional relations in the style of logic programming. The prototype realizes the operational semantics of GPR using a unique rule evaluation engine. The data model of reference is object-based and the system is implemented on top of an object-oriented DBMS. Hence, the GPRS prototype represents a concrete proposal of an advanced DBMS for complex objects that provides both active and deductive styles of rule programming. © 1997 John Wiley & Sons
Lecture Notes in Computer Science, 2002
An approximate algorithm to efficiently solve the k-Closest-Pairs problem in high-dimensional spaces is presented. The method is based on dimensionality reduction of the space ℝ^d through the Hilbert space filling curve and performs at most d + 1 scans of the data set. After each scan, those points whose contribution to the solution has already been analyzed are eliminated from the data set. The pruning is lossless: in fact, the remaining points, along with the approximate solution found, can be used for the computation of the exact solution. Although we are able to guarantee an O(d^(1+1/t)) approximation to the solution, where t = 1, …, ∞ denotes the used L_t metric, experimental results give the exact k-Closest-Pairs for all the data sets considered and show that the pruning of the search space is effective.
Lecture Notes in Computer Science, 2002
In this paper we propose a new definition of distance-based outlier that considers, for each point, the sum of the distances from its k nearest neighbors, called its weight. Outliers are those points having the largest values of weight. In order to compute these weights, we find the k nearest neighbors of each point in a fast and efficient way by…
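The weight-based definition itself is easy to illustrate. The brute-force sketch below computes each point's weight as the sum of distances to its k nearest neighbors and reports the top-n weights as outliers; the paper's contribution is computing these weights much more efficiently (the abstract is truncated at that point), so the exact-search call used here is only a stand-in.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def top_n_outliers(data, k=10, n=5):
    """Weight of a point = sum of distances to its k nearest neighbors;
    the n points with the largest weight are reported as outliers."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)  # +1: the query point itself
    dist, _ = nn.kneighbors(data)
    weights = dist[:, 1:].sum(axis=1)                   # drop the self-distance column
    return np.argsort(-weights)[:n], weights

rng = np.random.default_rng(3)
points = np.vstack([rng.normal(size=(500, 2)),
                    rng.normal(8.0, 0.5, size=(5, 2))])  # 5 planted far-away points
idx, w = top_n_outliers(points, k=10, n=5)
print(idx)  # the planted points should rank highest
```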