Mihaela Breaban | Universitatea Alexandru Ioan Cuza Iasi
Papers by Mihaela Breaban
2006 Eighth International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2006
There are two major ways to solve constraint satisfaction problems (CSPs): inference approaches and search algorithms [4]. Inference methods derive and record new information in order to make the problem easier to solve. Search algorithms seek a solution in the space of ...
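To make the distinction concrete, here is a minimal Python sketch of both families on a toy binary CSP. AC-3 arc consistency stands in for the inference step. It is a standard textbook algorithm chosen for illustration and is not necessarily the inference method used in the paper. Plain backtracking then serves as the search step. The representation (domains as sets, constraints as a dict mapping each directed arc `(x, y)` to a predicate) is a hypothetical choice.

```python
from collections import deque

# Hypothetical representation: domains maps each variable to a set of values;
# constraints maps each directed arc (x, y) to a predicate allowed(a, b).

def revise(domains, constraints, x, y):
    """Remove values of x that have no supporting value in the domain of y."""
    allowed = constraints[(x, y)]
    removed = False
    for a in list(domains[x]):
        if not any(allowed(a, b) for b in domains[y]):
            domains[x].discard(a)
            removed = True
    return removed

def ac3(domains, constraints):
    """Inference: prune domains to arc consistency; False if a domain empties."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        if revise(domains, constraints, x, y):
            if not domains[x]:
                return False
            # x lost values, so arcs (z, x) must be re-checked.
            queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
    return True

def backtrack(domains, constraints, assignment=None):
    """Search: assign variables depth-first, undoing choices on conflict."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(domains):
        return dict(assignment)
    var = next(v for v in domains if v not in assignment)
    for value in sorted(domains[var]):
        if all(constraints[(var, other)](value, assignment[other])
               for other in assignment if (var, other) in constraints):
            assignment[var] = value
            result = backtrack(domains, constraints, assignment)
            if result is not None:
                return result
            del assignment[var]
    return None

# Toy example: 3-color a triangle A-B-C with "different color" constraints.
def different(a, b):
    return a != b

constraints = {}
for x, y in [("A", "B"), ("B", "C"), ("A", "C")]:
    constraints[(x, y)] = different
    constraints[(y, x)] = different

domains = {"A": {"r"}, "B": {"r", "g"}, "C": {"r", "g", "b"}}
consistent = ac3(domains, constraints)  # inference shrinks every domain to one value
solution = backtrack(domains, constraints)
print(consistent, solution)  # True {'A': 'r', 'B': 'g', 'C': 'b'}
```

In this toy case inference alone fixes every domain, so the search never backtracks. On harder instances inference prunes only part of the space, and search has to do the rest.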
Lecture Notes in Computer Science, 2006
There are two major ways to solve constraint satisfaction problems (CSPs): inference approaches and search algorithms [1]. Inference approaches derive and record new information in order to make the problem easier to solve. Search algorithms seek a solution in the space of ...
The chapter presents some of the techniques based on Evolutionary Computation paradigms for solving constraint satisfaction problems. Two hybrid approaches that use heuristics extracted from an inference algorithm inside evolutionary computation paradigms are detailed. The effect of combining inference with randomized search was studied by exploiting the adaptable inference levels offered by the Mini-Bucket Elimination algorithm. Tests conducted on binary CSPs against a Branch and Bound algorithm show that systematic search benefits more from inference than the randomized search performed by evolutionary computation paradigms. However, on hard CSP instances the Branch and Bound algorithm requires higher levels of inference, which imply a much greater computational cost, in order to compete with evolutionary computation methods.
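For readers unfamiliar with the baseline, here is a minimal Python sketch of depth-first Branch and Bound on a binary CSP. The problem is treated as minimizing the number of violated constraints, which is a Max-CSP view and an assumption about the setup. The bound used for pruning is simply the number of violations among already-assigned variables. The Mini-Bucket Elimination heuristics discussed in the chapter would supply a tighter bound; they are not reproduced here.

```python
def violations(assignment, var, value, constraints):
    """Constraints between var=value and already-assigned variables that fail."""
    return sum(1 for other, v in assignment.items()
               if (var, other) in constraints
               and not constraints[(var, other)](value, v))

def branch_and_bound(variables, domains, constraints):
    """Depth-first Branch and Bound minimizing the number of violated constraints."""
    best = {"cost": float("inf"), "assignment": None}

    def search(i, assignment, cost):
        if cost >= best["cost"]:
            return  # prune: this branch cannot beat the incumbent solution
        if i == len(variables):
            best["cost"], best["assignment"] = cost, dict(assignment)
            return
        var = variables[i]
        for value in sorted(domains[var]):
            extra = violations(assignment, var, value, constraints)
            assignment[var] = value
            search(i + 1, assignment, cost + extra)
            del assignment[var]

    search(0, {}, 0)
    return best["cost"], best["assignment"]

# Toy over-constrained instance: 3 colors for a complete graph on 4 nodes.
def different(a, b):
    return a != b

nodes = ["A", "B", "C", "D"]
constraints = {}
for i, x in enumerate(nodes):
    for y in nodes[i + 1:]:
        constraints[(x, y)] = different
        constraints[(y, x)] = different

cost, assignment = branch_and_bound(nodes, {n: {0, 1, 2} for n in nodes}, constraints)
print(cost)  # 1: four nodes and three colors force at least one conflict
```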
Lecture Notes in Computer Science, 2015
Derived from the well-known Traveling Salesman Problem (TSP), the multiple Traveling Salesman Problem (multiple-TSP) with a single depot is a straightforward generalization: several salesmen located in a given city (the depot) need to visit a set of interconnected cities, such that each city is visited exactly once (by a single salesman) while the total cost of their tours is minimized. Designed for shortest-path problems and with proven efficiency on the TSP, Ant Colony Systems (ACS) are a natural choice for the multiple-TSP as well. Although several variations of ant algorithms for the multiple-TSP are reported in the literature, there is no clear evidence of their comparative performance. The contribution of this paper is twofold: it provides a benchmark for the single-depot multiple-TSP with reported optima, and it performs a thorough experimental evaluation of several ACS variations on this problem.
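The problem statement translates directly into an objective function. The Python sketch below uses a hypothetical list-of-routes encoding: one list of city indices per salesman, with the depot at index 0 left implicit. It checks the "each city exactly once" constraint and sums the closed-tour costs. It does not implement an Ant Colony System.

```python
import math

def tour_cost(route, dist, depot=0):
    """Cost of one closed subtour: depot -> cities in route -> depot."""
    path = [depot, *route, depot]
    return sum(dist[a][b] for a, b in zip(path, path[1:]))

def mtsp_cost(routes, dist, depot=0):
    """Total cost of all subtours, after checking each city is visited exactly once."""
    visited = [city for route in routes for city in route]
    cities = set(range(len(dist))) - {depot}
    if len(visited) != len(set(visited)) or set(visited) != cities:
        raise ValueError("every non-depot city must be visited exactly once")
    return sum(tour_cost(route, dist, depot) for route in routes)

# Depot at the origin, two cities on each side of it, two salesmen.
points = [(0, 0), (1, 0), (2, 0), (-1, 0), (-2, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
print(mtsp_cost([[1, 2], [3, 4]], dist))  # 8.0
```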
2008 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2008
This paper presents a method for enhancing the performance of current clustering algorithms, based on Particle Swarm Optimization techniques. Namely, a pre-processing step aims at bringing closer together objects that are likely to belong to the same cluster, while ...
Annals of West University of Timisoara - Mathematics, 2013
The current paper presents a method to deliver nonlinear projections of a data set that discriminate between existing labeled groups of data items. Inspired by traditional linear Projection Pursuit and Linear Discriminant Analysis, the new method seeks nonlinear combinations of attributes, in the form of polynomials, that maximize Fisher's criterion. The search for the monomials in a polynomial is conducted in a logarithmic space in order to reduce computational complexity. The selection of monomials and the optimization of the weights that yield the nonlinear projection are performed with a multi-modal Genetic Algorithm hybridized with Differential Evolution. By alleviating the drawbacks that stem from the linearity assumptions in traditional Projection Pursuit, the new method could gain wide applicability in both unsupervised and supervised data analysis.
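As a hedged illustration of the two ingredients named above, this Python sketch evaluates a polynomial projection and scores it with a two-class Fisher criterion. Two points are assumptions and are not taken from the paper. First, "logarithmic space" is read here as evaluating each monomial as exp(sum of e_i * log(x_i)), which requires strictly positive attributes. Second, the Genetic Algorithm / Differential Evolution search over monomials and weights is omitted, and the monomials and weights are fixed by hand.

```python
import math

def monomial(x, exponents):
    """prod_i x_i**e_i evaluated in log space as exp(sum_i e_i*log(x_i)); needs x_i > 0."""
    return math.exp(sum(e * math.log(xi) for xi, e in zip(x, exponents)))

def project(x, weights, monomials):
    """Polynomial projection: weighted sum of monomials of the attributes."""
    return sum(w * monomial(x, exps) for w, exps in zip(weights, monomials))

def fisher(z1, z2):
    """Two-class Fisher criterion (m1 - m2)^2 / (s1^2 + s2^2) on projected values."""
    m1, m2 = sum(z1) / len(z1), sum(z2) / len(z2)
    v1 = sum((z - m1) ** 2 for z in z1) / len(z1)
    v2 = sum((z - m2) ** 2 for z in z2) / len(z2)
    return math.inf if v1 + v2 == 0 else (m1 - m2) ** 2 / (v1 + v2)

# Two classes separated by the product x1*x2 (about 1 versus about 4).
class_a = [(1.0, 1.0), (4.0, 0.25), (0.25, 4.0)]
class_b = [(2.0, 2.0), (8.0, 0.5), (0.5, 8.0)]

linear = ([1.0, 1.0], [(1, 0), (0, 1)])  # x1 + x2: a linear projection
product = ([1.0], [(1, 1)])              # x1 * x2: a single monomial

for name, (w, m) in [("linear", linear), ("product", product)]:
    za = [project(x, w, m) for x in class_a]
    zb = [project(x, w, m) for x in class_b]
    print(name, fisher(za, zb))
```

A linear projection is the special case in which every monomial has a single exponent equal to 1. On this toy data the product monomial yields a far larger Fisher score than x1 + x2, whose projected class ranges overlap.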
Proceedings of the 13th annual conference on Genetic and evolutionary computation - GECCO '11, 2011
Clustering is a fundamental and hence widely studied problem in data analysis. In a multi-objective perspective, this paper combines principles from two different clustering paradigms: the connectivity principle from density-based methods is integrated into the partitional clustering approach. The standard k-Means algorithm is hybridized with Particle Swarm Optimization. The new method (PSO-kMeans) benefits from both a local and a global ...
Lecture Notes in Computer Science, 2010
Clustering analysis is an important step towards getting insight into new data. Ensemble procedures have been designed in order to obtain improved partitions of a data set. Previous work in the domain, mostly empirical, shows that accuracy and a limited diversity are manda- ...
Proceedings of the 11th Annual conference on Genetic and evolutionary computation - GECCO '09, 2009
... Crowding Genetic Algorithms. Mihaela Breaban, Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi, Romania (pmihaela@infoiasi.ro); Henri Luchian, Faculty of Computer Science, Alexandru Ioan Cuza University, Iasi, Romania (hluchian@infoiasi.ro) ...
2012 14th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2012
This paper proposes a new method to identify interesting structures in data based on the projection pursuit methodology. Past work reported in the literature uses projection pursuit methods as a means to visualize high-dimensional data, or to identify linear combinations of attributes that reveal grouping tendencies or outliers. Projection pursuit is generally formulated as an optimization problem that seeks projection axes minimizing or maximizing a projection index. With regard to identifying interesting structure, the existing approaches suffer from an obvious limitation: linear models cannot capture more general structures in data, such as circular or curved clusters, or any structure that results from a polynomial or otherwise nonlinear generative model. This paper extends linear projection pursuit to nonlinear projections while preserving the general methodology employed in the search for projections. In addition, an algorithmic framework based on multi-modal genetic algorithms is proposed in order to deal with the large search space and to allow for the use of non-differentiable projection indices. Experiments conducted on synthetic data demonstrate the ability of the new approach to identify clusters of various shapes that are otherwise undetectable with linear projection pursuit or with popular clustering methods such as k-Means.
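The limitation of linear projections can be checked numerically. This Python sketch uses toy data, not the paper's experiments. Two concentric rings overlap under any linear projection onto a unit direction, because both project onto intervals centered at zero. The quadratic projection x1^2 + x2^2 separates them. The `separated` helper uses the known ring membership only to check the result. It is not a projection index, and the genetic-algorithm search is not shown.

```python
import math

def ring(radius, n):
    """n points evenly spaced on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def linear_proj(p, angle):
    """Projection onto the unit direction (cos angle, sin angle)."""
    return p[0] * math.cos(angle) + p[1] * math.sin(angle)

def quadratic_proj(p):
    """Polynomial projection x1^2 + x2^2 (squared distance from the origin)."""
    return p[0] ** 2 + p[1] ** 2

def separated(values_a, values_b):
    """True if the two sets of projected values occupy disjoint intervals."""
    return max(values_a) < min(values_b) or max(values_b) < min(values_a)

inner, outer = ring(1.0, 12), ring(3.0, 12)

# Try 18 directions spread over half a turn; none separates the rings.
any_linear = any(
    separated([linear_proj(p, t) for p in inner], [linear_proj(p, t) for p in outer])
    for t in [k * math.pi / 18 for k in range(18)]
)
quadratic = separated([quadratic_proj(p) for p in inner], [quadratic_proj(p) for p in outer])
print(any_linear, quadratic)  # False True
```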
2009 IEEE Congress on Evolutionary Computation, 2009
This paper is concerned with a problem in information organization and retrieval within Web communities. Most work in this domain focuses on reputation-based systems, which exploit the experience gathered by previous users in order to evaluate resources at the community level. The current research takes a slightly different approach: a personalized evaluation system whose goal is to provide a flexible and easy way to manage resources in a personalized manner. The functionality of such a model comes from local ...
New Achievements in Evolutionary Computation, 2010
Source: New Achievements in Evolutionary Computation, book edited by Peter Korosec, ISBN 978-953-307-053-7, pp. 318, February 2010, INTECH, Croatia.
Pattern Recognition, 2011
Exploratory data analysis methods are essential for getting insight into data. Identifying the most important variables and detecting quasi-homogeneous groups of data are problems of interest in this context. Solving such problems is a difficult task, mainly due to the unsupervised nature of the underlying learning process. Unsupervised feature selection and unsupervised clustering can be successfully approached as optimization problems ...
Journal of Petroleum Science and Engineering, 2013
In the petroleum industry, the compressional acoustic or sonic log (DT) is commonly used as a predictor because it responds to changes in porosity or compaction. These responses are, in turn, used to estimate formation (sonic) porosity, to map abnormal pore-fluid pressure, or to carry out petrophysical studies. Despite its intrinsic capabilities, the sonic log is not routinely recorded during well logging. We propose using Support Vector Regression (SVR), a supervised machine learning algorithm, to synthesize missing compressional acoustic or sonic (DT) logs when only common logs (e.g., natural gamma ray (GR) or deep resistivity (REID)) are available. Our approach involves three steps: (1) supervised training of the model; (2) confirmation and validation of the model by blind-testing the results in wells containing both the predictor (GR, REID) and the target (DT) values used in the supervised training; and (3) application of the model to wells containing the predictor data, to obtain the synthetic (simulated) DT log. SVR offers two advantages over traditional deterministic methods: strong nonlinear approximation capabilities and good generalization. These advantages result from the use of kernel functions and from the structural risk minimization principle behind SVR. Unlike linear regression techniques, SVR does not overpredict mean values and thereby preserves the original data variability. SVR also copes well with the uncertainty associated with the data, the large data volumes, and the diversity of data types. A case study from the Anadarko Basin, Oklahoma, on estimating the presence of abnormally pressurized pore-fluid zones from synthesized DT values, is presented. The results are promising.
The single-depot multiple TSP (SD-MTSP) is a simple extension of the standard TSP in which more than one salesman is allowed to visit the set of interconnected cities, such that each city is visited exactly once (by a single salesman) and the total cost of the traveled subtours is minimized. Although Ant Colony Systems (ACSs) are a natural choice for shortest-path problems such as the TSP, their application to this straightforward extension has not been properly explored. The reasons may lie in the bi-criteria nature of the problem (shortest cost versus balanced subtours) and in the lack of dedicated benchmarks with known optimal solutions. This paper proposes and evaluates, from a bi-criteria perspective, several multi-objective ACSs for SD-MTSP when two objectives need to be optimized simultaneously: minimizing the total cost of the traveled subtours while achieving balanced subtours. Experiments investigate the efficiency of the algorithms in a multi-objective setting.
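The bi-criteria setting can be made concrete by scoring each solution with two objectives and comparing solutions by Pareto dominance. In this Python sketch the balance objective is the cost of the longest subtour. That choice is an assumption for illustration. The paper's balance measure and its ACS variants are not reproduced.

```python
def objectives(subtour_costs):
    """Objective vector to minimize: (total cost, cost of the longest subtour).

    Using the longest subtour as the balance measure is an assumption;
    other balance criteria (e.g. the spread between subtours) are possible.
    """
    return (sum(subtour_costs), max(subtour_costs))

def dominates(f, g):
    """True if f is no worse than g in every objective and strictly better in one."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def pareto_front(vectors):
    """Objective vectors not dominated by any other vector in the list."""
    return [f for f in vectors if not any(dominates(g, f) for g in vectors)]

# Three hypothetical two-salesman solutions, given by their subtour costs.
candidates = [objectives(c) for c in ([10, 10], [4, 14], [5, 16])]
print(candidates)                # [(20, 10), (18, 14), (21, 16)]
print(pareto_front(candidates))  # [(20, 10), (18, 14)]
```

Neither (20, 10) nor (18, 14) dominates the other: one is cheaper overall, the other better balanced. A multi-objective ACS would therefore report both.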
Clustering is a fundamental and hence widely studied problem in data analysis. In a multi-objective perspective, this paper combines principles from two different clustering paradigms: the connectivity principle from density-based methods is integrated into the partitional clustering approach. The standard k-Means algorithm is hybridized with Particle Swarm Optimization. The new method (PSO-kMeans) benefits from both a local and a global view of the data and alleviates some drawbacks of the k-Means algorithm. As a result, it is able to find types of clusters that are otherwise difficult to obtain (elongated shapes, dissimilar volumes). Our experimental results show that PSO-kMeans improves on the performance of standard k-Means in all test cases and, in the worst case, performs at least comparably to state-of-the-art methods. PSO-kMeans is also robust to outliers. This comes at a cost: a preprocessing step that finds the nearest neighbors of each data item is required, which increases the linear complexity of k-Means to quadratic complexity.
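The quadratic cost mentioned above comes from the neighbor search. The Python sketch below shows a brute-force version of that step. The abstract does not describe how PSO-kMeans then uses the neighbor lists (the connectivity principle and the particle swarm), so that part is not shown.

```python
import heapq
import math

def nearest_neighbors(points, n_neighbors):
    """Indices of the n_neighbors nearest other points, for every point.

    Brute force: all n*(n-1) pairwise distances are evaluated, so this step
    is quadratic in n, whereas one k-Means iteration is linear in n.
    """
    result = []
    for i, p in enumerate(points):
        candidates = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in heapq.nsmallest(n_neighbors, candidates)])
    return result

points = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(nearest_neighbors(points, 1))  # [[1], [0], [3], [2]]
```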
The work in unsupervised learning centered on clustering has been extended with new paradigms to address the demands raised by real-world problems. In this regard, unsupervised feature selection has been proposed to remove noisy attributes that could mislead the clustering procedures. Additionally, semi-supervision has been integrated within existing paradigms because some background information usually exists in the form of a small number of similarity/dissimilarity constraints. In this context, the current paper investigates a method that performs feature selection and clustering simultaneously. The benefits of a semi-supervised approach that makes use of limited external information are highlighted against an unsupervised approach. The method uses an ensemble of near-optimal feature subsets delivered by a multi-modal genetic algorithm in order to quantify the relative importance of each feature to clustering.
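One simple way to turn an ensemble of feature subsets into per-feature scores is to count how often each feature is selected. The Python sketch below does exactly that. It is a hypothetical illustration: the abstract does not specify how the paper weights the subsets (for example, by their fitness), and the multi-modal genetic algorithm that produces them is not shown.

```python
from collections import Counter

def feature_importance(subsets, n_features):
    """Fraction of the near-optimal subsets that contain each feature."""
    counts = Counter(f for subset in subsets for f in set(subset))
    return [counts[f] / len(subsets) for f in range(n_features)]

# Four hypothetical near-optimal subsets over features 0..3.
subsets = [{0, 2}, {0, 1, 2}, {0, 2, 3}, {0}]
print(feature_importance(subsets, 4))  # [1.0, 0.25, 0.75, 0.25]
```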