Estimation of Distribution Algorithms Research Papers

Multi-swarm systems base their search on multiple sub-swarms instead of one standard swarm. The use of diverse sub-swarms increases performance when optimizing multi-modal functions. However, new design decisions arise when implementing multi-swarm systems, such as how to select the initial positions and velocities, and how to coordinate the different sub-swarms. Starting from the relatively simple multi-swarm system of locust swarms, ideas from differential evolution and estimation of distribution algorithms are used to address the new design considerations that are specific to multi-swarm systems. Experiments show that the new hybrid system can perform better than each of its individual components.

This paper describes applied research work that looks at different ways to manage resources effectively. In particular, it describes how revenue management techniques can be used to balance demand against capacity, and describes a system that uses different OR and AI techniques to intelligently price diverse products and services. This system can produce pricing policies for a wide range of products and services regardless of the demand model used. The system incorporates a model specification layer, which provides flexibility in defining the demand model for different products. It also incorporates an optimisation layer, which takes the specified model as input and produces the pricing and production guidelines for the product. The system can be used either as a stand-alone system or as a generic modelling and optimisation component within a larger revenue management system.

In this study, 2-year and 100-year annual maximum daily precipitation were mapped for rainfall-runoff studies and flood hazard estimation. Daily precipitation measurements at 23 climate stations from 1961 to 2000 in the upper Hron basin in central Slovakia were used. The choice of data preprocessing and interpolation methods was guided by their practical applicability and acceptance in the engineering hydrologic community. The main objective was to discuss the quality and properties of maps of design precipitation with a given return period with respect to the expectations of the end user. Four approaches to the preprocessing of annual maximum 24-hour precipitation data were used, and three interpolation methods were employed. The first approach is the direct mapping of at-site estimates of distribution function quantiles; the second is the direct mapping of local estimates of the three parameters of the GEV distribution. In the third, the daily precipitation totals were interpolated onto a regular grid network, and the time series of maximum daily precipitation totals at each grid point of the selected region were then statistically analysed. In the fourth, the spatial distribution of the design precipitation was modeled by quantiles predicted by regional precipitation frequency analysis using the Hosking and Wallis procedure. The three interpolation methods used were inverse distance weighting, nearest neighbor, and kriging. Visual inspection and jackknife cross-validation were used to compare the combinations of approaches.
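Of the three interpolation methods compared, inverse distance weighting is the simplest to state: each grid point takes a weighted average of station values, with weights decaying as an inverse power of distance. A minimal sketch, with made-up gauge coordinates and values and the common power parameter p = 2 as assumptions:

```python
import math

def idw(stations, target, p=2.0):
    """Inverse distance weighting: estimate the value at `target`
    from (x, y, value) triples in `stations`."""
    num, den = 0.0, 0.0
    for x, y, v in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:          # target coincides with a station
            return v
        w = d ** -p           # weight decays with distance
        num += w * v
        den += w
    return num / den

# Example: three hypothetical rain gauges, 100-year daily maxima in mm
gauges = [(0.0, 0.0, 80.0), (10.0, 0.0, 120.0), (0.0, 10.0, 100.0)]
print(round(idw(gauges, (1.0, 1.0)), 1))  # -> 81.4, dominated by the nearest gauge
```

The estimate always stays within the range of the station values, which is one reason IDW cannot extrapolate local maxima the way kriging with a trend can.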

Evolutionary search algorithms have become an essential asset in the algorithmic toolbox for solving high-dimensional optimization problems across a broad range of bioinformatics applications. Genetic algorithms, the most well-known and representative evolutionary search technique, have been the subject of the major part of such applications. Estimation of distribution algorithms (EDAs) offer a novel evolutionary paradigm that constitutes a natural and attractive alternative to genetic algorithms. They make use of a probabilistic model, learnt from the promising solutions, to guide the search process. In this paper, we set out a basic taxonomy of EDA techniques, underlining the nature and complexity of the probabilistic model of each EDA variant. We review a set of innovative works that make use of EDA techniques to solve challenging bioinformatics problems, emphasizing the EDA paradigm's potential for further research in this domain.
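The EDA loop this taxonomy classifies — select promising solutions, fit a probabilistic model to them, sample the next population from the model — can be sketched with its simplest member, a univariate-marginal EDA (UMDA-style). The OneMax objective and all parameter values below are illustrative, not taken from the paper:

```python
import random

def umda_onemax(n_bits=30, pop=100, elite=30, gens=60, seed=1):
    """Univariate marginal EDA: learn per-bit probabilities from the
    elite, sample a new population from them, repeat."""
    rng = random.Random(seed)
    p = [0.5] * n_bits                      # initial marginal model
    best = None
    for _ in range(gens):
        population = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        population.sort(key=sum, reverse=True)     # fitness = number of ones
        best = max(best or population[0], population[0], key=sum)
        selected = population[:elite]
        # re-estimate each bit's marginal from the selected individuals
        p = [sum(ind[i] for ind in selected) / elite for i in range(n_bits)]
        p = [min(max(pi, 0.02), 0.98) for pi in p]  # keep model from fixating
    return sum(best)

print(umda_onemax())  # should be at or near the optimum of 30
```

The clamping of the marginals is one simple guard against premature convergence; richer EDA variants in the taxonomy replace the independent marginals with bivariate or multivariate models.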

Most existing Data Mining algorithms have been manually produced, that is, developed by a human programmer. A prominent Artificial Intelligence research area is automatic programming: the generation of a computer program by another computer program. Clustering is an important data mining task with many useful real-world applications. In particular, the class of clustering algorithms based on the idea of data density to identify clusters has many advantages, such as the ability to identify arbitrarily-shaped clusters. We propose the use of Estimation of Distribution Algorithms for the artificial generation of density-based clustering algorithms. In order to guarantee the generation of valid algorithms, a directed acyclic graph (DAG) was defined in which each node represents a procedure (building block) and each edge represents a possible execution sequence between two nodes. The building-blocks DAG specifies the alphabet of the EDA, that is, every algorithm that can be generated. Preliminary experimental results compare the clustering algorithms artificially generated by AutoClustering to DBSCAN, a well-known manually-designed algorithm.
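The constraint mechanism described above can be sketched as a random walk over a building-block DAG, where any start-to-end path is by construction a valid algorithm. The node names below are hypothetical placeholders, not the paper's actual AutoClustering building blocks:

```python
import random

# Hypothetical building-block DAG: each node is a procedure, each edge a
# legal execution order (the real AutoClustering blocks are not listed here).
DAG = {
    "start":            ["estimate_density"],
    "estimate_density": ["find_core_points", "merge_regions"],
    "find_core_points": ["expand_clusters"],
    "expand_clusters":  ["merge_regions", "end"],
    "merge_regions":    ["label_noise", "end"],
    "label_noise":      ["end"],
}

def sample_algorithm(rng):
    """Random walk from 'start' to 'end': every path is a valid algorithm."""
    node, path = "start", []
    while node != "end":
        node = rng.choice(DAG[node])
        if node != "end":
            path.append(node)
    return path

print(sample_algorithm(random.Random(0)))  # one valid execution sequence
```

In the EDA proper, the uniform `rng.choice` would be replaced by edge probabilities learned from the best-performing generated algorithms.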

This chapter addresses the blocking flowshop scheduling problem with the aim of minimizing the makespan. An Estimation of Distribution Algorithm, followed by a local search procedure applied after the step of creating a new individual, was developed to solve this problem. Our comparisons were performed against representative approaches proposed in the literature on the blocking flowshop scheduling problem. The results obtained show that the proposed algorithm is able to improve 109 out of 120 best known solutions of Taillard's instances. Moreover, our algorithm outperforms all competing approaches in terms of solution quality and computational time.

This paper benchmarks the particle swarm optimizer with adaptive bounds algorithm (PSO Bounds) on the noise-free BBOB 2009 testbed. The algorithm is further augmented with a simple re-initialization mechanism that is invoked if the bounds tend to overlap.

A procedure to estimate distributed daily evapotranspiration (ET) using remotely sensed data is presented. Landsat-TM data for a part of the Western Yamuna Canal command (Haryana) has been used for model application. The model utilizes the surface reflectance in visible, infrared and thermal bands to generate surface albedo, surface temperature and leaf area index and thus surface energy fluxes to

The generation and maintenance of distinct solutions in Multiobjective Evolutionary Algorithms (MOEAs), especially in dynamic environments in which the criteria for evaluating solutions may vary over time, is an open problem, and there are few studies on the influence of the different ways of generating diversity on the quality of the set of optimal solutions. The inclusion of diversity generators in MOEAs can increase the cost of the evolutionary process and impair performance. Hence the need to seek ways of mitigating the negative impact of rising levels of dispersion within the population of candidate solutions on the way to the surface on which the optimal points lie, known as the Pareto Front (PF). In biological systems, immigration schemes increase the possible combinations of genetic exchanges, promoting diversity of evolutionary paths. Inspired by natural models of immigration, this research investigates the inclusion of atypical solutions (immigrants) in populations of candidate solutions as a way to generate diversity in MOEAs applied to dynamic multiobjective optimization. This dissertation also proposes and formalizes Non-Dominance Landscapes (NDLs) to guide the insertion of the generated immigrants into the population. The NDLs provide MOEAs with the probabilities of the immigrants being non-dominated in the population, from the estimation of probability density functions and of multivariate order statistics in the objective space. After characterizing the influence of diversity on the approximation dynamics of the PF in MOEAs, the NDLs were incorporated into the immigrant generators. The experimental validation of the NDL-based Diversity Generator (NDL-DG) demonstrates the potential of the proposed approach to increase the average quality of the evolved non-dominated solution sets.
Analysis of the results of incorporating the NDL-DG into the NSGA2 algorithm shows that solutions of higher average quality are obtained, with statistical significance, in 79% of the studied dynamic optimization scenarios in terms of the offline Hypervolume indicator, when compared with populations evolved without the use of NDLs. We then identified the optimization scenarios in which the NDL-DG appears most promising. Finally, we indicate research directions to extend the range of application of NDLs to other open problems in evolutionary multiobjective optimization.

Estimation of Distribution Algorithms (EDAs) are a class of probabilistic model-building evolutionary algorithms, characterized by learning and sampling the probability distribution of the selected individuals. This paper proposes a modified EDA (mEDA) for digital filter design. mEDA uses a novel sampling method, called centro-individual sampling, and a fuzzy C-means clustering technique to improve its performance. Extensive experiments conducted on a set of benchmark functions show that mEDA outperforms seven algorithms reported in the literature in terms of the quality of solutions. Four types of digital infinite impulse response (IIR) filters are designed using mEDA, and the results show that mEDA can obtain better filter performance than four state-of-the-art methods.

In this paper we develop a Self-guided Genetic Algorithm (Self-guided GA), which belongs to the category of Estimation of Distribution Algorithms (EDAs). Most EDAs explicitly use the probabilistic model to sample new solutions without using traditional genetic operators. EDAs make good use of the global statistical information collected from previous searches, but they do not efficiently use the location information of individual solutions. It has recently been recognized that global statistical information and location information should complement each other during the evolutionary process. In view of this, we design the Self-guided GA around a novel strategy for combining these two kinds of information. The Self-guided GA does not sample new solutions from the probabilistic model. Instead, it estimates the quality of a candidate offspring based on the probabilistic model used in its crossover and mutation operations. In this way, the mutation and crossover operations are able to generate fitter solutions, thus improving the performance of the algorithm. We tested the proposed algorithm by applying it to the NP-complete flowshop scheduling problem with the objective of minimizing the makespan. The experimental results show that the Self-guided GA is very promising. We also demonstrate that the Self-guided GA can be easily extended to treat other intractable combinatorial problems.
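The central idea — scoring candidate offspring under the learned model rather than sampling from it — can be illustrated with a univariate model over bit strings. This is a simplification: the paper targets permutation problems, and its exact operators are not reproduced here.

```python
import math

def log_prob(ind, p):
    """Log-likelihood of a binary individual under a univariate model."""
    return sum(math.log(p[i] if b else 1.0 - p[i]) for i, b in enumerate(ind))

def guided_mutation(parent, p):
    """Enumerate one-bit mutants and keep the one the probabilistic model,
    built from earlier promising solutions, rates most likely."""
    mutants = []
    for i in range(len(parent)):
        child = parent[:]
        child[i] = 1 - child[i]
        mutants.append(child)
    return max(mutants, key=lambda c: log_prob(c, p))

model = [0.95, 0.8, 0.2, 0.9]    # marginals learned from good solutions
parent = [0, 1, 1, 1]
print(guided_mutation(parent, model))   # -> [1, 1, 1, 1]: flips the bit the model favors
```

The model thus biases which mutation is kept, without ever being sampled directly — the mechanism the Self-guided GA exploits inside crossover and mutation.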

This paper presents an extension to our work on estimating the probability distribution by using a Markov Random Field (MRF) model in an Estimation of Distribution Algorithm (EDA) [1]. We propose a method that directly samples an MRF model to generate the new population. We also present a new EDA, called Distribution Estimation Using MRF with direct sampling (DEUMd), that uses this method and iteratively refines the probability distribution to generate better solutions. Our experiments show that the direct sampling of an MRF model as estimation of distribution provides a significant advantage over other techniques on problems where a univariate EDA is typically used.

Gene Regulatory Networks (GRNs) describe the interactions between different genes. One of the most important tasks in biology is to find the right regulations in a GRN given observed data. The problem is that the data are often noisy and scarce, so we have to use models that are robust to noise and scalable to hundreds of genes.
Recently, Recursive Neural Networks (RNNs) have been presented as a viable model for GRNs, which is robust to noise and can be scaled to larger networks. In this paper, to optimize the parameters of the RNN, we implement classic Population Based Incremental Learning (PBIL), which in certain scenarios has outperformed the classic GA and other evolutionary techniques such as Particle Swarm Optimization (PSO). We test this implementation on a small and a large artificial network. We further study the optimal tuning parameters and discuss the advantages of the method.
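PBIL itself is compact enough to sketch in full: a single probability vector is nudged toward the best sample of each generation. Below it optimizes a OneMax-style objective rather than RNN parameters, and all parameter values are illustrative:

```python
import random

def pbil(fitness, n, pop=50, lr=0.1, gens=100, seed=7):
    """Population Based Incremental Learning: keep a probability vector,
    sample a population from it, and shift it toward the best sample."""
    rng = random.Random(seed)
    p = [0.5] * n
    best, best_fit = None, float("-inf")
    for _ in range(gens):
        samples = [[int(rng.random() < pi) for pi in p] for _ in range(pop)]
        leader = max(samples, key=fitness)
        if fitness(leader) > best_fit:
            best, best_fit = leader, fitness(leader)
        # move each probability a step of size lr toward the leader's bit
        p = [pi * (1 - lr) + b * lr for pi, b in zip(p, leader)]
    return best, best_fit

best, fit = pbil(sum, n=20)
print(fit)   # close to the optimum of 20
```

For real-valued RNN weights, the binary probability vector would be replaced by, e.g., per-parameter Gaussians, but the update rule keeps the same nudge-toward-the-leader shape.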

Regularized logistic regression is a useful classification method for problems with few samples and a huge number of variables. This regression needs to determine the regularization term, which amounts to searching for the optimal penalty parameter and the norm of the regression coefficient vector. This paper presents a new regularized logistic regression method based on the evolution of the regression coefficients using estimation of distribution algorithms. The main novelty is that it avoids the determination of the regularization term. The chosen simulation method of new coefficients at each step of the evolutionary process guarantees their shrinkage as an intrinsic regularization. Experimental results comparing the behavior of the proposed method with Lasso and ridge logistic regression in three cancer classification problems with microarray data are shown.

Anastomosing rivers form a subset of the anabranching family of river types and pose considerable challenges to modelling their streamflow because of complex flow patterns across greatly varying floodplain widths. Estimates of distributed flow data are required for catchment management purposes and ecological studies of these rivers but are hindered by a paucity of measured discharge data. A grid-based, semi-distributed, conceptual model structure is applied to a 330 km reach of the arid-zone Diamantina River of central Australia. Model complexity is constrained by data availability, with only a single gauging station, located at the downstream end of the reach, providing discharge data for model calibration. The model uses a simple conceptual bucket structure and accounts for exceptionally high transmission losses as well as flow patterns and wave speeds that vary with discharge within the reach. The intricate flow patterns across floodplain widths of up to 50 km are simulated using a grid-based structure that required the following features: (i) cell connections that are explicitly defined using a code that allows for multi-directional flow from a cell; and (ii) each cell having a binary flow pattern, with the second connection pattern being triggered when the surface storage of the cell exceeds a calibrated level for a given land type. Satellite images were used to define the flow paths, and hence cell connection patterns, utilised by floods of various sizes. The model was able to provide acceptable simulation of large floods but with decreasing performance in the simulation of small to medium sized floods. Simulation suggested that incorrectly defined flow paths for the smaller floods were a major factor in this decreased performance. The capability of the model would be improved by further detailed mapping, using satellite imagery, of spatial patterns of inundation as discharge varies.

Psychological testing aims at making inferences about individual differences or the estimation of distributions of psychological constructs in groups of interest. However, a test instrument's relationship to the construct, the actual variable of interest, may change across subpopulations, or the instrument's measurement accuracy is not the same across subpopulations. This paper introduces an extension of the mixture distribution general diagnostic model (GDM) that allows studying the population dependency of multidimensional latent trait models across observed and latent populations. Note that so-called diagnostic models do not aim at diagnosing individual test takers in the sense of a clinical diagnosis, or an extended case-based examination using multiple test instruments. The term cognitive diagnosis was coined following the development of models that attempt to identify (diagnose?) more than a single skill dimension. The GDM is a general modeling framework for confirmatory multidimensional item response models and includes well-known models such as item response theory (IRT), latent class analysis (LCA), and located latent class models as special cases. The hierarchical extensions of the GDM presented in this paper enable one to check the impact of clustered data, such as data from students with different native language background taking an English language test, on the structural parameter estimates of the GDM. Moreover, the hierarchical version of the GDM allows the examination of differences in skill distributions across these clusters.

The research literature on metaheuristics and evolutionary computation has proposed a large number of algorithms for the solution of challenging real-world optimization problems. It is often not possible to study the performance of these algorithms theoretically unless significant assumptions are made about either the algorithm itself or the problems to which it is applied, or both. As a consequence, metaheuristics are typically evaluated empirically using a set of test problems. Unfortunately, relatively little attention has been given to the development of methodologies and tools for the large-scale empirical evaluation and/or comparison of metaheuristics.

Rough Sets Theory has opened new trends for the development of Incomplete Information Theory. Within it, the notion of a reduct is very significant, but obtaining a reduct of a decision system is a computationally expensive process, although very important in data analysis and knowledge discovery. Because of this, different variants for calculating reducts have had to be developed. The present work looks into the utility that Rough Sets Models and Information Theory offer for feature selection, and a new method is presented with the purpose of calculating a good reduct. This new method consists of a greedy algorithm that uses heuristics to work out a good reduct in acceptable time. In this paper we also propose another method to find good reducts, which combines elements of Genetic Algorithms with Estimation of Distribution Algorithms. The new methods are compared with others implemented within Pattern Recognition and Ant Colony Optimization Algorithms, and the results of the statistical tests are shown.
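A greedy reduct search of the kind described can be sketched with the standard rough-set dependency degree (gamma) as the heuristic: keep adding the attribute that most improves the dependency until it matches that of the full attribute set. The toy decision table is invented, and this is a generic greedy scheme, not the paper's exact method:

```python
def partition(rows, attrs):
    """Indiscernibility classes induced by a set of attribute indices."""
    classes = {}
    for i, r in enumerate(rows):
        classes.setdefault(tuple(r[a] for a in attrs), set()).add(i)
    return list(classes.values())

def gamma(rows, decisions, attrs):
    """Rough-set dependency degree: fraction of rows lying in classes
    that are pure with respect to the decision attribute."""
    pure = 0
    for cls in partition(rows, attrs):
        if len({decisions[i] for i in cls}) == 1:
            pure += len(cls)
    return pure / len(rows)

def greedy_reduct(rows, decisions):
    """Greedily add the attribute that most improves gamma until it
    matches the dependency of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = gamma(rows, decisions, all_attrs)
    reduct = []
    while gamma(rows, decisions, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: gamma(rows, decisions, reduct + [a]))
        reduct.append(best)
    return reduct

rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
dec  = [0, 0, 1, 1]
print(greedy_reduct(rows, dec))   # -> [0]: attribute 0 alone determines the decision
```

Greedy selection is not guaranteed to find a minimal reduct, which is precisely what motivates the GA/EDA hybrid proposed in the paper.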

We present the idea of applying mixtures of Erlang distributions to the construction of the recombination mechanism in estimation of distribution algorithms. We analyze the main properties of Erlang mixtures and define a new Erlang Mixture Estimation of Distribution Algorithm (ErM-EDA). We compare the efficiency of ErM-EDA with that of an evolution strategy in the case of large populations. Experimental results are presented following simple theoretical studies.

In order to improve an active-type dosemeter for protection dosimetry against high-energy neutrons, the characteristic response of an electronic personal dosemeter consisting of a Si semiconductor detector covered with a polyethylene radiator has been checked in monoenergetic neutron fields of 14.8 and 65 MeV. Pulse-height distributions have been measured for several different radiator thicknesses. An analytical estimation of the distributions of energy dissipated in the depletion layer of the Si detector has been carried out under simplified conditions. Good agreement was confirmed between the two distributions, both in shape and in their variation with radiator thickness and neutron energy. The energy dependence of the dosemeter response has also been discussed. It is suggested that the energy dependence could be compensated by introducing a two-window technique with different discrimination levels on the pulse height.

Standard methods for estimating disparity measures from grouped data are based either on determining bounds that enclose the true value of the disparity measure in question (non-parametric approach) or on assumptions about the distribution underlying the data, whose parameters must be estimated (parametric approach). Depending on the assumed distribution, the parameter estimation can be numerically demanding, and it is not guaranteed in every case that this distribution fits the data well. Determining the bounds, on the other hand, is only sensible when they lie close enough together (which is usually the case only when a larger number of classes is available). This contribution presents the estimation of disparity measures by determining entropy-maximal density functions, in which the entropy of the estimated density function is maximized within each class. The simulation study conducted confirms...

Decision making in the presence of multiple and conflicting objectives requires preferences from the decision maker. The decision maker's preferences give rise to a domination structure. Until now, most research has been focussed on the standard domination structure based on the Pareto-domination principle. However, various real-world applications, such as medical image registration, financial applications, and multicriteria n-person games, or even the preference models of decision makers, frequently give rise to a so-called variable domination structure, in which the domination itself changes from point to point. Although variable domination has been studied in the classical community since the early seventies, we could not find a single study in the evolutionary domain, even though, as the results of this paper show, multi-objective evolutionary algorithms can deal with the vagaries of a variable domination structure. The contributions of this paper are manifold. Firstly, the algorithms are shown to be able to find a well-diverse set of optimal solutions satisfying a variable domination structure. This is shown by simulation results on a number of test problems. Secondly, it answers a hitherto open question in the classical community of developing a numerical method for finding a well-diverse set of such solutions. Thirdly, theoretical results are derived which facilitate the use of an evolutionary multi-objective algorithm; these results are of importance in their own right. The results of this paper adequately show the niche of multi-objective evolutionary algorithms in variable preference modeling.
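The contrast between the fixed Pareto ordering and a variable domination structure fits in a few lines. The point-dependent weight function below is a deliberately simple hypothetical stand-in for a domination cone, used only to show that the comparison rule can change from point to point:

```python
def pareto_dominates(a, b):
    """Standard (fixed) Pareto domination for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def variable_dominates(a, b, cone_at):
    """Variable domination: the ordering depends on the point b.
    `cone_at(b)` returns per-objective weights defining a halfspace
    at b (a toy model of a point-dependent domination cone)."""
    w = cone_at(b)
    return sum(wi * (ai - bi) for wi, ai, bi in zip(w, a, b)) < 0

# Under Pareto domination (2,3) and (3,2) are incomparable...
print(pareto_dominates((2, 3), (3, 2)), pareto_dominates((3, 2), (2, 3)))
# ...but a point-dependent rule that weights objective 1 more near (3,2)
# can make (2,3) dominate it.
cone = lambda x: (2.0, 1.0) if x[0] >= 3 else (1.0, 1.0)
print(variable_dominates((2, 3), (3, 2), cone))  # -> True
```

An MOEA for variable domination replaces the fixed dominance test in its environmental selection with a check like `variable_dominates`, which is why existing algorithm skeletons carry over largely unchanged.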

Electrical distribution utilities have been dealing with the problem of estimating distribution network load diagrams, either for operation studies or in forecasting models for planning purposes. Load curve assessment is essential for efficient management of electric distribution systems. However, the only information available for most loads (namely LV loads) relates to monthly energy consumption. The general procedure uses measurements at consumers to construct inference engines that predict load curves from commercial information. This paper presents a new approach to this problem, based on Kohonen maps and Artificial Neural Networks (ANN), to estimate load diagrams for Portuguese distribution utilities. A method for estimating error bars is also proposed, in order to provide higher-order information about the performance of the load curve estimation process. The performance attained is discussed, as well as the method used to obtain confidence intervals for the main predicted diagrams.

Logistic regression is a simple and efficient supervised learning algorithm for estimating the probability of an outcome or class variable. In spite of its simplicity, logistic regression has shown very good performance in a range of... more

Logistic regression is a simple and efficient supervised learning algorithm for estimating the probability of an outcome or class variable. In spite of its simplicity, logistic regression has shown very good performance in a range of fields and is widely accepted because its results are easy to interpret. Fitting the logistic regression model usually involves using the principle of maximum likelihood. The Newton-Raphson algorithm is the most common numerical approach for obtaining the coefficients maximizing the likelihood of the data. This work presents a novel approach for fitting the logistic regression model based on estimation of distribution algorithms (EDAs), a tool for evolutionary computation. EDAs are suitable not only for maximizing the likelihood, but also for maximizing the area under the receiver operating characteristic curve (AUC). Thus, we tackle the logistic regression problem from a double perspective: likelihood-based to calibrate the model and AUC-based to discriminate between the different classes. Under these two objectives of calibration and discrimination, the Pareto front can be obtained in our EDA framework. These fronts are compared with those yielded by a multiobjective EDA recently introduced in the literature.
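A minimal sketch of the likelihood-based side of this double perspective: a univariate Gaussian EDA searching coefficient space in place of Newton-Raphson. The toy data, the two-parameter model, and all EDA settings below are assumptions for illustration, not the paper's setup:

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of a logistic model, intercept in beta[0]."""
    ll = 0.0
    for xi, yi in zip(X, y):
        z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
        p = min(max(sigmoid(z), 1e-12), 1.0 - 1e-12)   # clamp away from 0/1
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

def eda_fit_logistic(X, y, dim, pop=80, keep=20, gens=40, seed=0):
    """Univariate Gaussian EDA over the coefficient vector, maximizing
    the log-likelihood: sample candidates, keep the best, refit the
    per-coefficient mean and standard deviation, repeat."""
    rng = random.Random(seed)
    mu, sigma = [0.0] * dim, [2.0] * dim
    for _ in range(gens):
        cand = [[rng.gauss(m, max(s, 0.05)) for m, s in zip(mu, sigma)]
                for _ in range(pop)]
        cand.sort(key=lambda b: log_likelihood(b, X, y), reverse=True)
        elite = cand[:keep]
        mu = [sum(b[i] for b in elite) / keep for i in range(dim)]
        sigma = [math.sqrt(sum((b[i] - mu[i]) ** 2 for b in elite) / keep)
                 for i in range(dim)]
    return mu
```

Swapping the sort key from `log_likelihood` to an AUC estimate gives the discrimination-based objective; the EDA itself is indifferent to which score it optimizes, which is what makes the double perspective natural.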

One of the key points in Estimation of Distribution Algorithms (EDAs) is the learning of the probabilistic graphical model used to guide the search: the richer the model the more complex the learning task. Dependency network-based EDAs... more

One of the key points in Estimation of Distribution Algorithms (EDAs) is the learning of the probabilistic graphical model used to guide the search: the richer the model, the more complex the learning task. Dependency network-based EDAs have recently been introduced. Unlike Bayesian networks, dependency networks allow the presence of directed cycles in their structure. In a previous work the authors proposed EDNA, an EDA in which a multivariate dependency network is used but whose structure learning is approximated by considering only bivariate statistics. EDNA was compared with other models from the literature of the same computational complexity (e.g., univariate and bivariate models). In this work we propose a modified version of EDNA in which not only the structural learning phase but also the simulation and parameter learning tasks are limited to bivariate statistics. We extend the comparison to multivariate models based on Bayesian networks (EBNA and hBOA). Our experiments show that the modified EDNA is more accurate than the original one, with accuracy comparable to EBNA and hBOA, but with the advantage of being faster, especially in the more complex cases.

Industrial deployment of academic real-time techniques still struggles to gain momentum due to the industry's unfamiliarity with schedulability analysis, as well as the lack of appropriate commercial tools. Moreover, it is... more

Industrial deployment of academic real-time techniques still struggles to gain momentum due to the industry's unfamiliarity with schedulability analysis, as well as the lack of appropriate commercial tools. Moreover, it is imperative that academia realises the extent of pessimism in the proposed techniques, which often makes them less attractive to systems developers.

In this work we propose an estimation of distribution algorithm (EDA) as a new tool aiming at minimizing the total flowtime in permutation flowshop scheduling problems. A variable neighbourhood search is added to the algorithm as an... more

In this work we propose an estimation of distribution algorithm (EDA) as a new tool aiming at minimizing the total flowtime in permutation flowshop scheduling problems. A variable neighbourhood search is added to the algorithm as an improvement procedure after creating a new offspring. The experiments show that our approach outperforms all existing techniques employed for the problem and can provide new upper bounds.
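For reference, the total-flowtime objective that such an EDA minimizes can be evaluated with a standard recurrence over machines; the helper below is a textbook sketch, not the authors' implementation:

```python
def total_flowtime(perm, proc):
    """Total flowtime of a job permutation in a permutation flowshop.

    `proc[j][m]` is the processing time of job j on machine m; jobs
    visit machines 0..M-1 in order.  The total flowtime is the sum,
    over jobs, of their completion times on the last machine.
    """
    n_machines = len(proc[0])
    done = [0] * n_machines      # completion time of the previous job per machine
    flowtime = 0
    for j in perm:
        t = 0
        for m in range(n_machines):
            # A job starts on machine m once it has left machine m-1
            # and machine m has finished the previous job.
            t = max(t, done[m]) + proc[j][m]
            done[m] = t
        flowtime += t            # completion time on the last machine
    return flowtime
```

A variable neighbourhood search, as used in the paper, would repeatedly perturb `perm` (e.g., by insertions and swaps) and accept moves that lower this value.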

Nummenmaa, A., Auranen, T., Hämäläinen, M. S., Jääskeläinen, I. P., Sams, M., Vehtari, A., and Lampinen, J. (2007). Automatic relevance determination based hierarchical Bayesian MEG inversion in practice. NeuroImage, 37 (3): 876-889.

It is an evolutionary approach, used when the network size grows and the search space increases. When the destination is outside the zone, the EDA is applied to find the route with minimum cost and time. The implementation of the proposed... more

It is an evolutionary approach, used when the network size grows and the search space increases. When the destination is outside the zone, the EDA is applied to find the route with minimum cost and time. The implementation of the proposed method is compared with the genetic ZRP (GZRP), and the results demonstrate better performance for the proposed method. Since the method provides a set of paths to the destination, it balances the load across the network. As both EDA and GA use a random search method to reach the optimal point, the search cost is reduced significantly, especially when the amount of data is large.

This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse’s assignment. The main framework of the algorithm is an... more

This paper proposes a new memetic evolutionary algorithm to achieve explicit learning in rule-based nurse rostering, which involves applying a set of heuristic rules for each nurse's assignment. The main framework of the algorithm is an estimation of distribution algorithm, in which an ant-miner methodology improves the individual solutions produced in each generation. Unlike our previous work (where learning is implicit), the learning in the memetic estimation of distribution algorithm is explicit, that is, we are able to identify building blocks directly. The overall approach learns by building a probabilistic model, that is, an estimation of the probability distribution of individual nurse–rule pairs that are used to construct schedules. The local search processor (i.e., the ant-miner) reinforces nurse–rule pairs that receive higher rewards. A challenging real-world nurse rostering problem is used as the test problem. Computational results show that the proposed approach outperforms most existing approaches. It is suggested that the learning methodologies suggested in this paper may be applied to other scheduling problems where schedules are built systematically according to specific rules.

The importance of microorganisms and biotechnology in space exploration and future planet colonization has been discussed in the literature. Meteorites are interesting samples for studying microbe–mineral interactions relevant to space... more

The importance of microorganisms and biotechnology in space exploration and future planet colonization has been discussed in the literature. Meteorites are interesting samples for studying microbe–mineral interactions relevant to space exploration. The chemolithotrophic bacterium Acidithiobacillus ferrooxidans has been used as a model to understand iron and sulfur oxidation. In this work, capillary electrophoresis with capacitively coupled contactless conductometric and UV detection was used to monitor bacterial growth in a meteorite simulant by measuring the conversion of Fe2+ into Fe3+. The effect of Co2+ and Ni2+ (metals also found in meteorites) on bacterial growth was also evaluated. The presented method allowed the analysis of all metals in a single run (less than 8 min). The background electrolyte was composed of 10 mmol L-1 HIBA/His. For comparison purposes, the samples were also analyzed by UV-Vis spectrophotometry. The conversion of Fe2+ into Fe3+ by A. ferrooxidans was observed for up to 36 hours, with growth rate constants of 0.19 h-1 and 0.21 h-1 in T&K and meteorite simulant media, respectively. The developed method presents favorable prospects for monitoring the growth of other chemolithotrophic microorganisms, for example, in biotechnology applications.

Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. This paper presents semiparametric estimators of distributional impacts of interventions (treatment) when selection into the... more

Identification and Estimation of Distributional Impacts of Interventions Using Changes in Inequality Measures. This paper presents semiparametric estimators of distributional impacts of interventions (treatment) when selection into the program is based on observable characteristics. Distributional impacts of a treatment are calculated as differences in inequality measures of the potential outcomes of receiving and not receiving the treatment. These differences are called "Inequality Treatment Effects" (ITE). The estimation procedure involves a first nonparametric step in which the probability of receiving treatment given covariates, the propensity score, is estimated. In the second step, weighted sample versions of inequality measures are computed using weights based on the estimated propensity score. Root-N consistency, asymptotic normality, semiparametric efficiency, and validity of inference based on the bootstrap are shown for the proposed semiparametric estimators. In addition to being easily implementable and computationally simple, results from a Monte Carlo exercise reveal that their good relative performance in small samples is robust to changes in the distribution of latent selection variables. Finally, as an illustration of the method, we apply the estimators to a real data set collected for the evaluation of a job training program, using several popular inequality measures to capture distributional impacts of the program.
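The second step described above reduces to computing weighted versions of standard inequality measures. The sketch below shows a weighted Gini coefficient and the resulting treatment/control difference, with the inverse-propensity weights assumed to come from a (hypothetical) first-stage propensity-score estimate:

```python
def weighted_gini(y, w):
    """Weighted Gini coefficient: mean absolute difference over all
    pairs, each observation counted by its weight, normalized by
    twice the weighted mean."""
    tot = sum(w)
    mu = sum(yi * wi for yi, wi in zip(y, w)) / tot
    num = sum(wi * wj * abs(yi - yj)
              for yi, wi in zip(y, w)
              for yj, wj in zip(y, w))
    return num / (2.0 * tot * tot * mu)

def inequality_treatment_effect(y, d, pscore):
    """ITE sketch: weighted Gini of treated outcomes minus weighted
    Gini of control outcomes, with inverse-propensity weights 1/e(x)
    and 1/(1 - e(x)).  `pscore` stands in for the first-step estimate."""
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    w1 = [1.0 / e for e, di in zip(pscore, d) if di == 1]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    w0 = [1.0 / (1.0 - e) for e, di in zip(pscore, d) if di == 0]
    return weighted_gini(y1, w1) - weighted_gini(y0, w0)
```

Any other inequality measure (Theil, coefficient of variation) slots into the same second step by replacing `weighted_gini`.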

Estimation of distribution algorithms (EDAs) are a wide-ranging family of evolutionary algorithms whose common feature is the way they evolve by learning a probability distribution from the best individuals in a population and sampling it... more

Estimation of distribution algorithms (EDAs) are a wide-ranging family of evolutionary algorithms whose common feature is the way they evolve by learning a probability distribution from the best individuals in a population and sampling it to generate the next one. Although they have been widely applied to solve combinatorial optimization problems, there are also extensions that work with continuous variables. In this paper [this paper is an extended version of delaOssa et al. Initial approaches to the application of islands-based parallel EDAs in continuous domains, in:
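The island idea referenced above can be sketched as several independent univariate Gaussian EDAs that periodically exchange their best points around a ring; the topology, migration interval, and all other settings below are illustrative assumptions:

```python
import math
import random

def island_eda(f, dim, n_islands=4, pop=30, keep=10, gens=40,
               migrate_every=5, seed=0):
    """Island-model continuous EDA sketch (minimizing f).

    Each island evolves its own per-variable Gaussian model; every
    `migrate_every` generations the islands pass their incumbent
    best point to the next island in a ring.
    """
    rng = random.Random(seed)
    models = [([rng.uniform(-5.0, 5.0) for _ in range(dim)], [3.0] * dim)
              for _ in range(n_islands)]
    bests = [None] * n_islands
    for g in range(gens):
        for k in range(n_islands):
            mu, sigma = models[k]
            cand = [[rng.gauss(m, max(s, 1e-3)) for m, s in zip(mu, sigma)]
                    for _ in range(pop)]
            if bests[k] is not None:
                cand.append(bests[k])            # elitism / incoming migrant
            cand.sort(key=f)
            bests[k] = cand[0]
            elite = cand[:keep]
            mu = [sum(x[i] for x in elite) / keep for i in range(dim)]
            sigma = [math.sqrt(sum((x[i] - mu[i]) ** 2 for x in elite) / keep)
                     for i in range(dim)]
            models[k] = (mu, sigma)
        if (g + 1) % migrate_every == 0:         # ring migration
            bests = bests[-1:] + bests[:-1]
    return min(bests, key=f)

def sphere(x):
    return sum(v * v for v in x)
```

The islands here run sequentially for simplicity; a parallel version distributes the inner loop and communicates only at migration points, which is the appeal of the island model.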

This paper presents a methodology for using heuristic search methods to optimise cancer chemotherapy. Specifically, two evolutionary algorithms-Population Based Incremental Learning (PBIL), which is an Estimation of Distribution Algorithm... more

This paper presents a methodology for using heuristic search methods to optimise cancer chemotherapy. Specifically, two evolutionary algorithms-Population Based Incremental Learning (PBIL), which is an Estimation of Distribution Algorithm (EDA), and Genetic Algorithms (GAs) have been applied to the problem of finding effective chemotherapeutic treatments. To our knowledge, EDAs have been applied to fewer real world problems compared to GAs, and the aim of the present paper is to expand the application domain of this technique. We compare and analyse the performance of both algorithms and draw a conclusion as to which approach to cancer chemotherapy optimisation is more efficient and helpful in the decision-making activity led by the oncologists.
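PBIL, the EDA used here, replaces the population with a single probability vector that is nudged toward each generation's best sample. The sketch below uses a generic bit-string fitness, not the paper's chemotherapy treatment model:

```python
import random

def pbil(fitness, n_bits, lr=0.1, pop=50, gens=80, seed=3):
    """Population-Based Incremental Learning sketch (maximizing).

    Sample candidates from the probability vector, then move the
    vector toward the generation's best candidate with learning
    rate `lr`; parameters are illustrative.
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best = None
    for _ in range(gens):
        cand = [[1 if rng.random() < pi else 0 for pi in p]
                for _ in range(pop)]
        winner = max(cand, key=fitness)
        if best is None or fitness(winner) > fitness(best):
            best = winner
        p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, winner)]
    return best
```

Contrast with a GA: there is no crossover or explicit population carried between generations, only the model vector `p`.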

The paper proposes a new evolutionary algorithm for composite laminate optimization, named Double-Distribution Optimization Algorithm (DDOA). The DDOA belongs to the family of Estimation of Distribution Algorithms (EDA) that build a... more

The paper proposes a new evolutionary algorithm for composite laminate optimization, named Double-Distribution Optimization Algorithm (DDOA). The DDOA belongs to the family of Estimation of Distribution Algorithms (EDA) that build a statistical model of promising regions of the design space based on sets of good points, and use it to guide the search. A generic framework for introducing variable dependencies by making use of the physics of the problem is presented. The algorithm uses two distributions simultaneously: a simple distribution for the design variables, complemented by the distribution of auxiliary variables. The combination of the two generates complex distributions at a low computational cost. The paper demonstrates DDOA's efficiency for two laminate optimization problems for which the design variables are the fiber angles and the auxiliary variables are the lamination parameters. The results show that its reliability in finding the optima is greater than that of a simple EDA and of a standard GA, and that its superiority increases with the problem dimension.

This research extends conventional Estimation of Distribution Algorithms (EDA) to Genetic Programming (GP) domain. We propose a framework to estimate the distribution of solutions in tree form. The core of this framework is a grammar... more

This research extends conventional Estimation of Distribution Algorithms (EDA) to the Genetic Programming (GP) domain. We propose a framework to estimate the distribution of solutions in tree form. The core of this framework is a grammar model. In this research, we show, both theoretically and experimentally, that a grammar model has many of the properties we need for estimation of distribution

In this paper, we propose an estimation of distribution algorithm based on an inexpensive Gaussian mixture model with online learning, which will be employed in dynamic optimization. Here, the mixture model stores a vector of sufficient... more

In this paper, we propose an estimation of distribution algorithm based on an inexpensive Gaussian mixture model with online learning, which will be employed in dynamic optimization. Here, the mixture model stores a vector of sufficient statistics of the best solutions, which is subsequently used to obtain the parameters of the Gaussian components. This approach is able to incorporate into the current mixture model potentially relevant information of the previous and current iterations. The online nature of the proposal is desirable in the context of dynamic optimization, where prompt reaction to new scenarios should be promoted. To analyze the performance of our proposal, a set of dynamic optimization problems in continuous domains was considered with distinct levels of complexity, and the obtained results were compared to the results produced by other existing algorithms in the dynamic optimization literature.
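The vector of sufficient statistics mentioned above can be illustrated for a single one-dimensional component: keeping the count, the sum, and the sum of squares is enough to recover the Gaussian parameters incrementally. This is a sketch of the idea, not the paper's mixture code:

```python
class OnlineGaussian:
    """Running sufficient statistics for one Gaussian component.

    Incorporating a new solution is O(1), so the model can absorb
    information from the current and previous iterations without
    refitting from scratch -- the property that makes the approach
    attractive for dynamic optimization.
    """

    def __init__(self):
        self.n = 0
        self.s = 0.0      # running sum of x
        self.ss = 0.0     # running sum of x^2

    def update(self, x):
        self.n += 1
        self.s += x
        self.ss += x * x

    def mean(self):
        return self.s / self.n

    def variance(self):
        m = self.mean()
        return self.ss / self.n - m * m
```

A mixture extends this by keeping one such statistics vector per component plus a responsibility-based weight; sampling the EDA's next population then draws from the recovered Gaussians.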

Advanced land surface models (LSMs) offer detailed estimates of distributed hydrological fluxes and storages. These estimates are extremely valuable for studies of climate and water resources, but they are difficult to verify as field... more

Advanced land surface models (LSMs) offer detailed estimates of distributed hydrological fluxes and storages. These estimates are extremely valuable for studies of climate and water resources, but they are difficult to verify as field measurements of soil moisture, evapotranspiration, and surface and subsurface runoff are sparse in most regions. In contrast, river discharge is a hydrologic flux that is recorded regularly and with good accuracy for many of the world's major rivers. These measurements of discharge spatially integrate all upstream hydrological processes. As such, they can be used to evaluate distributed LSMs, but only if the simulated runoff is properly routed through the river basins. In this study, a rapid, computationally efficient source-to-sink (STS) routing scheme is presented that generates estimates of river discharge at gauge locations based on gridded runoff output. We applied the scheme as a postprocessor to archived output of the Global Land Data Assimilation System (GLDAS). GLDAS integrates satellite and ground-based data within multiple offline LSMs to produce fields of land surface states and fluxes. The application of the STS routing scheme allows for evaluation of GLDAS products in regions that lack distributed in situ hydrological measurements. We found that the four LSMs included in GLDAS yield very different estimates of river discharge and that there are distinct geographic patterns in the accuracy of each model as evaluated against gauged discharge. The choice of atmospheric forcing data set also had a significant influence on the accuracy of simulated discharge.
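Schematically, a source-to-sink scheme maps each upstream cell's runoff directly to the gauge with a fixed travel time, with no intermediate channel storage. The helper below illustrates that reading; the cell identifiers, travel times, and units are hypothetical, not GLDAS specifics:

```python
def sts_discharge(runoff, upstream_cells, travel_time, t):
    """Source-to-sink discharge estimate at one gauge and time step t.

    `runoff[c]` is cell c's runoff time series and `travel_time[c]`
    its fixed lag (in time steps) to the gauge; discharge is the sum
    of the appropriately lagged upstream contributions.
    """
    q = 0.0
    for c in upstream_cells:
        lag = travel_time[c]
        if t - lag >= 0:          # contribution has had time to arrive
            q += runoff[c][t - lag]
    return q
```

Because each cell maps straight to the sink, the scheme is cheap enough to run as a post-processor over archived gridded runoff, which is how it is applied to the GLDAS output here.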