Francesco Fontanella | University of Cassino and Southern Latium
Papers by Francesco Fontanella
Evolutionary Computation (EC) is inspired by the natural phenomenon of evolution. It provides a fairly general heuristic that exploits a few basic concepts: reproduction of individuals, variation phenomena that affect the likelihood of survival of individuals, and inheritance of parents' features by offspring. In recent years, EC has been widely used to solve hard, nonlinear, and very complex problems.
Among other things, EC-based algorithms have been used to tackle classification problems. Classification is the process by which an object is assigned to one of a finite set of classes or, in other words, is recognized as belonging to a set of equal or similar entities identified by a label. The main aspect of classification usually concerns the generation of prototypes to be used to recognize unknown patterns. The role of prototypes is to represent patterns belonging to the different classes defined within a given problem. For most problems of practical interest, generating such prototypes is very difficult, since a prototype must be able to represent patterns belonging to the same class, which may be significantly dissimilar to each other. It must also be able to discriminate them from patterns belonging to classes other than the one it represents. Moreover, a prototype should contain the minimum amount of information required to satisfy the requirements mentioned above.
The research presented in this thesis has led to the definition of an EC-based framework for prototype generation. The framework is not tied to any particular kind of prototype: it can generate any kind of prototype once an encoding scheme for the prototypes has been defined. This generality can be exploited to develop many applications. The framework has been employed to implement two specific applications for prototype generation. The developed applications have been tested on several data sets and the results compared with those obtained by other approaches previously presented in the literature.
The large majority of methods proposed in the literature for handwriting recognition assume that words are produced by drawing large parts of the ink without lifting the pen, other than for horizontal bars and dots. This fundamental assumption, however, does not always hold: while some educational systems provide explicit training for producing continuous handwriting, minimizing the number of pen-ups during the production of a word, others do not. As a consequence, whenever the handwriting presents pen-ups within a word, the recognition performance can drop significantly. In a preliminary study, we presented an algorithm for discriminating among different types of ink appearing in handwriting, namely isolated characters, cursive, dots, horizontal and vertical bars, based on the use of a suitable set of features. In this paper, we have characterized the discriminative power of each considered feature according to different measures and we have proposed a method for combining the different feature...
Graphs are widely used to represent complex and structured information of interest in various fields of science and engineering. When using graph representations, problems of special interest often imply searching, for example for the prototypes representing a dataset of graphs or for the graph that optimizes a set of parameters. In any case, it is necessary that the problem solution be expressed in terms of graphs. Therefore, defining effective methods for automatically generating single graphs, or sets of graphs, representing problem solutions is a key issue. A new evolutionary computation-based approach specifically devised for generating graphs is presented. The method is based on a special data structure, called multilist, which allows the encoding of any type of graph, directed or undirected, with or without attributes. Graph encoding by multilists makes it possible to define effective crossover and mutation operators, overcoming the problems normally encountered when implementing genetic operators on graphs. Further advantages of the proposed approach are that it does not require any problem-specific knowledge and that it is able to search for graphs whose number of nodes is not known a priori. Three sets of experiments were performed to test the proposed approach, and the solutions found were compared with those obtained by other approaches proposed in the literature.
Lecture Notes in Computer Science, 2011
Recently, ensemble techniques have also attracted the attention of Genetic Programming (GP) researchers. The goal is to further improve GP classification performance. Among the ensemble techniques, bagging and boosting have also been taken into account. These techniques improve classification accuracy by combining the responses of different classifiers by means of a majority vote rule. However, it is very hard to ensure that the classifiers in the ensemble are appropriately diverse, so as to avoid correlated errors. Our approach tries to cope with this problem by designing a framework for effectively combining GP-based ensembles by means of a Bayesian Network. The proposed system consists of two modules. The first applies a boosting technique to a GP-based classification algorithm in order to generate an effective ensemble of decision trees. The second uses a Bayesian network to combine the responses provided by this ensemble and to select the most appropriate decision trees. The Bayesian network is learned by means of a specifically devised evolutionary algorithm. Preliminary experimental results confirmed the effectiveness of the proposed approach.
Lecture Notes in Computer Science, 2012
Classifier ensemble techniques are effectively used to combine the responses provided by a set of classifiers. Classifier ensembles improve the performance of single classifier systems, even if a large number of classifiers is often required. This implies large memory requirements and slow classification, making their use critical in some applications. This problem can be reduced by selecting a fraction of the classifiers from the original ensemble. In this work, we present an ensemble-based framework that copes with large datasets while selecting a small number of classifiers to compose the ensemble. The framework is based on two modules: an ensemble-based Genetic Programming (GP) system, which produces a high-performing ensemble of decision tree classifiers, and a Bayesian Network (BN) approach to perform classifier selection. The proposed system exploits the advantages provided by both techniques and makes it possible to strongly reduce the number of classifiers in the ensemble. Experimental results compare the system with well-known techniques in the fields of both GP and BN and show the effectiveness of the devised approach. In addition, a comparison with a Pareto optimal pruning strategy has been performed.
Table 3. Comparison results for the selection strategies. Bold values represent the best statistically significant results, while starred values represent the second best results.
Dataset | ens. | BN-Boost error | BN-Boost #sel. | Pareto optimal (geno) error | Pareto optimal (geno) #sel. | Pareto optimal (pheno) error | Pareto optimal (pheno) #sel.
Adult | 10 | 15.05 | 3.05 | 17.25 | 4.95 | 17.24 | 5.20
Adult | 20 | 13.53 | 3.90 | 17.15 | 9.75 | 16.99 | 13.05
Adult | 50 | 13.53 | 3.90 | 17.15 | 9.75 | 16.99 | 13.05
2014 22nd International Conference on Pattern Recognition, 2014
2012 International Conference on Frontiers in Handwriting Recognition, 2012
Classifier combination methods have proved to be an effective tool for increasing performance in pattern recognition applications. The rationale of this approach follows from the observation that appropriately diverse classifiers make uncorrelated errors. Unfortunately, this theoretical assumption is not easy to satisfy in practical cases, thus reducing the performance obtainable with any combination strategy. In this paper we propose a new weighted majority vote rule that tries to solve this problem by jointly analyzing the responses provided by all the experts, in order to capture their collective behavior when classifying a sample. Our rule associates a weight with each class rather than with each expert and computes such weights by estimating the joint probability distribution of each class with the set of responses provided by all the experts in the combining pool. The probability distribution has been computed by using the naive Bayes probabilistic model. Despite its simplicity, this model has been successfully used in many practical applications, often competing with much more sophisticated techniques. The experimental results, obtained on three standard databases of handwritten digits, confirmed the effectiveness of the proposed method.
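As a rough illustration of the idea of weighting classes rather than experts, the following Python sketch estimates a naive Bayes model over the experts' responses and uses the resulting class posteriors as weights. It is only a minimal sketch under stated assumptions: the function names, the Laplace smoothing parameter `alpha`, and the data layout are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def fit_combiner(responses, labels, classes, alpha=1.0):
    """Count how often each expert gives each response for each true class.

    responses: list of tuples, one response per expert (hypothetical layout).
    labels:    list of true classes, aligned with responses.
    alpha:     Laplace smoothing constant (an assumption of this sketch).
    """
    n_experts = len(responses[0])
    cond = [defaultdict(Counter) for _ in range(n_experts)]
    for resp, c in zip(responses, labels):
        for i, r in enumerate(resp):
            cond[i][c][r] += 1
    return {
        "classes": list(classes),
        "class_counts": Counter(labels),
        "cond": cond,
        "alpha": alpha,
        "n": len(labels),
    }


def class_weights(model, resp):
    """Return one normalized weight per class for a vector of expert responses."""
    classes, alpha = model["classes"], model["alpha"]
    k = len(classes)
    log_w = {}
    for c in classes:
        n_c = model["class_counts"][c]
        # Smoothed prior P(c)
        lw = math.log((n_c + alpha) / (model["n"] + alpha * k))
        # Naive Bayes: experts are treated as conditionally independent given c
        for i, r in enumerate(resp):
            lw += math.log((model["cond"][i][c][r] + alpha) / (n_c + alpha * k))
        log_w[c] = lw
    top = max(log_w.values())
    unnorm = {c: math.exp(v - top) for c, v in log_w.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def combine(model, resp):
    """Pick the class with the largest weight."""
    weights = class_weights(model, resp)
    return max(weights, key=weights.get)
```

Working in log space avoids underflow when the pool contains many experts; the smoothing keeps response and class pairs that never occur in the training data from forcing a weight to zero.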
Lecture Notes in Computer Science, 2006
An evolutionary computation based algorithm for data classification is presented. The proposed algorithm refers to the learning vector quantization paradigm and is able to evolve sets of points in the feature space in order to find the class prototypes. The more ...
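To make the notion of prototypes as points in the feature space concrete, here is a minimal Python sketch of nearest-prototype classification together with a recognition-rate fitness that an evolutionary search could maximize. The function names and the choice of Euclidean distance and accuracy as fitness are assumptions made for illustration; the truncated abstract does not specify them.

```python
import math


def nearest_prototype(x, prototypes):
    """Return the label of the prototype closest to x.

    prototypes: list of (point, label) pairs (hypothetical layout).
    Euclidean distance is an assumption of this sketch.
    """
    return min(prototypes, key=lambda p: math.dist(x, p[0]))[1]


def fitness(prototypes, samples, labels):
    """Fraction of samples whose nearest prototype carries the correct label."""
    hits = sum(nearest_prototype(x, prototypes) == y
               for x, y in zip(samples, labels))
    return hits / len(samples)
```

In an evolutionary setting, each individual would encode one such set of prototypes, and a function like `fitness` would rank individuals for selection.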
Lecture Notes in Computer Science, 2006
In this paper we propose a new genetic programming based approach for prototype generation in Pattern Recognition problems. Prototypes consist of mathematical expressions and are encoded as derivation trees. The devised system is able to cope with classification problems in which the number of prototypes is not known a priori. The approach has been tested on several problems and the results compared with those obtained by other genetic programming based approaches previously proposed.
Lecture Notes in Computer Science, 2005
A new genetic programming based approach to classification problems is proposed. Unlike other approaches, the number of prototypes in the classifier is not fixed a priori, but is automatically found by the system. In fact, in many problems a single class may contain a variable number of subclasses. Hence, a single prototype may be inadequate to represent all the members of the class. The devised approach has been tested on several problems and the results compared with those obtained by a different genetic programming based approach recently proposed in the literature.
2008 19th International Conference on Pattern Recognition, 2008
We present a Genetic Algorithm based feature selection approach according to which feature subsets are represented by individuals of an evolving population. Evolution is controlled by a fitness function taking into account statistical properties of the input data in the subspace represented by each individual, and aims to select the smallest feature subset that optimizes class separability. The originality of our method lies particularly in the definition of the evaluation function.
Proceedings of the ACM Symposium on Applied Computing, 2005
Most of the classical methods for clustering analysis require the user to set the number of clusters. To overcome this problem, this paper presents a grammar-based Genetic Programming approach to automatic data clustering. An innovative clustering process is conceived, strictly linked to a novel cluster representation that provides intelligible information on patterns. The efficacy of the implemented partitioning system is estimated on a medical domain by exploiting expressly defined evaluation indices. Furthermore, a comparison with other clustering tools is performed.
Advances in Soft Computing, 2006
Most of the classical clustering algorithms are strongly dependent on, and sensitive to, parameters such as the number of expected clusters and the resolution level. To overcome this drawback, this paper presents a Genetic Programming framework capable of performing automatic data clustering. Moreover, a novel way of representing clusters, which provides intelligible information on patterns, is introduced together with an innovative clustering process. The effectiveness of the implemented partitioning system is estimated on a medical domain by means of evaluation indices.
The first phase of the image analysis is based on two steps: the former is the application of a segmentation algorithm based on a Markov Random Field (MRF) model, while the latter consists of the extraction of a feature set related to the objects identified in the segmentation process. The MRF approach makes it possible to include a priori knowledge on the segmentation
Pattern Recognition Letters, 2014
In the framework of handwriting recognition, we present a novel GA-based feature selection algorithm in which feature subsets are evaluated by means of a specifically devised separability index. This index measures statistical properties of the feature subset and does not depend on any specific classification scheme. The proposed index represents an extension of the Fisher Linear Discriminant method and uses covariance matrices for estimating how class probability distributions are spread out in the considered N-dimensional feature space. A key property of our approach is that it does not require any a priori knowledge about the number of features to be used in the feature subset. Experiments have been performed by using three standard databases of handwritten digits and a standard database of handwritten letters, while the solutions found have been tested with different classification methods. The results have been compared with those obtained by using the whole feature set and with those obtained by using standard feature selection algorithms. The comparison outcomes confirmed the effectiveness of our approach.
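The flavor of a classifier-independent separability index can be sketched in Python as follows. This is not the index proposed in the paper: it is a deliberately simplified Fisher-style ratio that treats features independently (in effect, diagonal covariance matrices) and sums, over the selected features, the between-class variance divided by the within-class variance. The function name and data layout are hypothetical.

```python
from statistics import mean, pvariance


def separability(samples, labels, subset):
    """Simplified Fisher-style score of a feature subset (higher is better).

    samples: list of feature tuples; labels: aligned class labels;
    subset:  indices of the selected features (e.g. decoded from a GA bitmask).
    Assumption of this sketch: features are scored independently, i.e. the
    class covariance matrices are approximated by their diagonals.
    """
    n = len(samples)
    classes = sorted(set(labels))
    score = 0.0
    for f in subset:
        overall = mean(s[f] for s in samples)
        within = between = 0.0
        for c in classes:
            vals = [s[f] for s, y in zip(samples, labels) if y == c]
            prior = len(vals) / n
            within += prior * pvariance(vals)            # spread inside class c
            between += prior * (mean(vals) - overall) ** 2  # spread of class means
        score += between / (within + 1e-12)  # small constant avoids division by zero
    return score
```

A GA individual encoded as a bit string would be decoded into `subset` and scored with a function of this kind, so that no classifier has to be trained during the search.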
MetroArchaeo is an international conference that has already held three editions (Benevento 2015, Torino 2016, Lecce 2017). It brings together researchers and operators in the enhancement, characterization and preservation of archaeological and cultural heritage, with the main objective of discussing the production, interpretation and reliability of measurements and data.
The conference is conceived to foster exchanges of ideas and information, create collaborative networks, and share updates on innovations in "measurements" suitable for cultural heritage among archaeologists, conservators and scientists.
In summary, METROARCHAEO2018 is designed to take advantage of a multidisciplinary approach to give the cultural heritage community (archaeologists, historians, conservators, engineers, material scientists, and others) a complete picture of measurement uses and data treatments, with the ultimate goal of increasing knowledge on the characterization and safeguarding of archaeological and historic heritage, a topic generally addressed in sectorial conferences.
The fourth conference will be held in Cassino, October 22-24, 2018.