Marcus Gallagher | The University of Queensland, Australia
Papers by Marcus Gallagher
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis ...
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, Jul 8, 2009
This paper presents some new analytical results on the continuous Univariate Marginal Distribution Algorithm (UMDA_c), which is a well-known Estimation of Distribution Algorithm based on Gaussian distributions. As an extension of existing theoretical work built on the assumption of infinite populations, the convergence behavior of UMDA_c with finite populations is formally analyzed. We show both analytically and experimentally that, on flat landscapes, the Gaussian model in UMDA_c tends to collapse with high probability, an important fact that has not previously been well understood.
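The collapse on a flat landscape can be illustrated with a small simulation. The sketch below is not the paper's analysis; it is a minimal one-dimensional Gaussian EDA written for this illustration, and the function names, population sizes, and seed are all assumptions. On a flat landscape the selected points are effectively a random subset, and the maximum-likelihood variance of n samples has expected value (n - 1)/n times the true variance, so the model's spread shrinks generation after generation.

```python
import random


def flat_fitness(x):
    # Flat landscape: every point is equally good, so selection carries no signal.
    return 0.0


def umda_c_1d(generations=200, pop_size=50, n_selected=10, seed=0):
    """Illustrative 1-D Gaussian EDA with truncation selection (hypothetical helper).

    Returns the model's standard deviation after each generation.
    """
    rng = random.Random(seed)
    mean, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(generations):
        population = [rng.gauss(mean, sigma) for _ in range(pop_size)]
        # The sort is stable, so with all fitness values tied the first
        # n_selected points are simply the first ones sampled.
        population.sort(key=flat_fitness, reverse=True)
        selected = population[:n_selected]
        mean = sum(selected) / n_selected
        # Maximum-likelihood variance estimate (divides by n, not n - 1).
        var = sum((x - mean) ** 2 for x in selected) / n_selected
        sigma = var ** 0.5
        history.append(sigma)
    return history
```

Calling `umda_c_1d()` and inspecting the first and last entries of the returned list shows how far the standard deviation has fallen from its initial value of 1.0.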
Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, 2009
Success rate is a commonly adopted performance criterion for evaluating Evolutionary Algorithms due to their inherent randomness. However, the classical large-sample binomial test based on normal distributions is only valid with a relatively large number of trials, which ...
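To see why the number of trials matters, the sketch below compares an exact binomial tail probability with its normal approximation for a single observed success rate. It is only an illustration of the general issue, not the procedure from the paper (the abstract is truncated here); the function names and the one-sided test against a hypothesized rate `p0` are assumptions.

```python
import math


def exact_upper_tail(successes, trials, p0):
    """P(X >= successes) for X ~ Binomial(trials, p0), computed exactly."""
    return sum(
        math.comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


def normal_upper_tail(successes, trials, p0):
    """Large-sample approximation of the same tail (no continuity correction)."""
    p_hat = successes / trials
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / trials)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2))
```

With 9 successes in 10 trials and `p0 = 0.5`, the exact tail is 11/1024, while the normal approximation gives a noticeably smaller value, so the two tests can disagree at common significance levels when the number of trials is small.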
Proceedings of the 15th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, Volume Part II, 2011
Time series discord has proven to be a useful concept for time-series anomaly identification. To search for discords, various algorithms have been developed. Most of these algorithms rely on pre-building an index (such as a trie) for subsequences. Users of these algorithms are typically required to choose optimal values for word-length and/or alphabet-size parameters of the index, which are not intuitive. In this paper, we propose an algorithm to directly search for the top-K discords, without the requirement of building an index or tuning external parameters. The algorithm exploits quasi-periodicity present in many time series. For quasi-periodic time series, the algorithm gains significant speedup by reducing the number of calls to the distance function.
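For reference, a discord is commonly defined as the subsequence whose distance to its nearest non-overlapping neighbour is largest. The sketch below is a brute-force baseline for the top-1 discord under that definition, using Euclidean distance; it is not the direct search algorithm proposed in the paper, and the function names are assumptions. It makes a quadratic number of distance calls, which is the cost that faster methods aim to reduce.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, m):
    """Return (start_index, nn_distance) of the top-1 discord of length m."""
    n = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            # Skip overlapping subsequences (trivial matches).
            if abs(i - j) >= m:
                nearest = min(nearest, euclidean(series[i:i + m], series[j:j + m]))
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist
```

On a periodic signal with a single injected spike, the returned start index should fall on a window that covers the spike, since every other window has a near-exact copy one period away.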
2005 IEEE Congress on Evolutionary Computation, Vols 1-3, Proceedings, Oct 2, 2005
Choosing the best parameter setting is a well-known, important, and challenging task in Evolutionary Algorithms (EAs). As one of the earliest parameter tuning techniques, the Meta-EA approach regards each parameter as a variable and the performance of the algorithm as the fitness value, and searches this landscape using various genetic operators. However, there are some inherent issues in this method. For example, some algorithm parameters are generally not searchable because it is difficult to define any sensible distance metric on them. In this paper, a novel approach is proposed by combining the Meta-EA approach with a method called Racing, which is based on the statistical analysis of algorithm performance with different parameter settings. A series of experiments is conducted to show the reliability and efficiency of this hybrid approach in tuning Genetic Algorithms (GAs) on two benchmark problems.
2007 IEEE Congress on Evolutionary Computation, Vols 1-10, Proceedings, Sep 1, 2007
Estimation of Distribution Algorithms and the Cross-Entropy method use probabilistic modelling and inference to generate candidate solutions in optimization problems. The model fitting task in this class of algorithms has largely been carried out to date based on maximum likelihood. An alternative approach that is prevalent in statistics and machine learning is to use Bayesian inference. In this paper, we provide a framework for the application of Bayesian inference techniques in probabilistic model-based optimization. Based on this framework, a simple continuous Bayesian Estimation of Distribution Algorithm is described. We evaluate and compare this algorithm experimentally with its maximum likelihood equivalent, UMDA_c^G.
Lecture Notes in Computer Science, 2004
In empirical studies of Evolutionary Algorithms, it is usually desirable to evaluate and compare algorithms using as many different parameter settings and test problems as possible, in order to have a clear and detailed picture of their performance. Unfortunately, the total number of experiments required may be very large, which often makes such research work computationally prohibitive. In this paper, the application of a statistical method called racing is proposed as a general-purpose tool to reduce the computational requirements of large-scale experimental studies in evolutionary algorithms. Experimental results are presented that show that racing typically requires only a small fraction of the cost of an exhaustive experimental study.
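The idea behind racing can be sketched as follows: all candidates are evaluated round by round, and a candidate is dropped as soon as the evidence shows it is worse than the current best, so no further evaluations are spent on it. The sketch below uses Hoeffding bounds as the elimination test, which is one classical choice and not necessarily the statistical test used in the paper. The function names, the assumption that results lie in a range of size `value_range`, and the default parameters are all illustrative.

```python
import math
import random


def make_noisy_evaluator(true_means, noise=0.1, seed=1):
    """Hypothetical stand-in for one run of an algorithm under setting i."""
    rng = random.Random(seed)
    return lambda i: true_means[i] + rng.uniform(-noise, noise)


def hoeffding_race(evaluate, n_candidates, max_rounds=100, delta=0.05, value_range=1.0):
    """Race candidates (maximisation); return (survivor_indices, evaluations_used)."""
    sums = [0.0] * n_candidates
    alive = set(range(n_candidates))
    evaluations = 0
    for n in range(1, max_rounds + 1):
        # Every surviving candidate receives one more evaluation per round.
        for i in sorted(alive):
            sums[i] += evaluate(i)
            evaluations += 1
        # Hoeffding half-width for the mean of n samples within value_range.
        eps = value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        best_lower = max(sums[i] / n - eps for i in alive)
        # Drop candidates whose upper bound falls below the best lower bound.
        alive = {i for i in alive if sums[i] / n + eps >= best_lower}
        if len(alive) == 1:
            break
    return sorted(alive), evaluations
```

An exhaustive study would spend `max_rounds` evaluations on every candidate; comparing that figure with the second return value shows how much work the race avoided.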
2006 IEEE International Conference on Evolutionary Computation, 2006
This paper presents some initial attempts to mathematically model the dynamics of a continuous Estimation of Distribution Algorithm (EDA) based on Gaussian distributions. Case studies are conducted on both unimodal and multimodal problems to highlight the effectiveness of the proposed technique and explore some fundamental issues of the EDA. With some general assumptions, we can show that, for one-dimensional unimodal problems and with the (µ, λ) scheme:
2005 IEEE Congress on Evolutionary Computation, 2005
A comprehensive set of experiments was conducted with a continuous EDA on 25 test problems provided in the real-parameter optimization special session. It is expected that the results presented here could be used to gain some deeper understanding of the performance of the EDA as well as facilitate the comparison across different algorithms.
In this paper, we address some issues related to evaluating and testing evolutionary algorithms. A landscape generator based on Gaussian functions is proposed for generating a variety of continuous landscapes as fitness functions. Through some initial experiments, we illustrate the usefulness of this landscape generator in testing evolutionary algorithms.
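One way such a generator might be built is sketched below. The abstract does not say how the Gaussian functions are combined, so this sketch assumes the fitness at a point is the maximum over a set of isotropic Gaussian bumps with positive heights; the function name and the `(center, width, height)` tuple layout are assumptions made for this illustration.

```python
import math


def make_gaussian_landscape(components):
    """Build a fitness function from (center, width, height) Gaussian bumps.

    Assumes positive heights; the fitness is the maximum over all bumps.
    """
    def fitness(x):
        best = 0.0
        for center, width, height in components:
            sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
            best = max(best, height * math.exp(-sq_dist / (2.0 * width ** 2)))
        return best
    return fitness
```

Under this construction each component contributes one local optimum, and the number, location, width, and height of the optima can be varied directly, which makes it convenient for producing test problems with known structure.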
Proceedings of the 2005 conference on Genetic and evolutionary computation - GECCO '05, 2005
The development of Estimation of Distribution Algorithms (EDAs) has largely been driven by using more and more complex statistical models to approximate the structure of the search space. However, there are still problems that are difficult for EDAs even with models capable of capturing high-order dependencies. In this paper, we show that diversity maintenance plays an important role in the performance of EDAs. A continuous EDA based on the Cholesky decomposition is tested on some well-known difficult benchmark problems to demonstrate how different diversity maintenance approaches could be applied to substantially improve its performance.
This chapter presents a novel framework for tuning the parameters of Evolutionary Algorithms. A hybrid technique combining Meta-EAs and statistical Racing approaches is developed, which is not only capable of effectively exploring the search space of numerical parameters but also suitable for tuning symbolic parameters where it is generally difficult to define any sensible distance metric.
Diagnosis Related Group (DRG) upcoding is an anomaly in healthcare data that costs hundreds of millions of dollars in many developed countries. DRG upcoding is typically detected through resource-intensive auditing. As supervised modeling of DRG upcoding is severely constrained by scope and timeliness of past audit data, we propose in this paper an unsupervised algorithm to filter data for potential identification of DRG upcoding. The algorithm has been applied to a hip replacement/revision dataset and a heart-attack dataset. The results are consistent with the assumptions held by domain experts.
Journal of Computer Science and Technology, 2013
Time-series discord is widely used in data mining applications to characterize anomalous subsequences in time series. Compared to some other discord search algorithms, the direct search algorithm based on the recurrence plot shows the advantage of being fast and parameter-free. The direct search algorithm, however, relies on quasi-periodicity in input time series, an assumption that limits the algorithm's applicability. In this paper, we eliminate the periodicity assumption from the direct search algorithm by proposing a reference function for subsequences and a new sampling strategy based on the reference function. These measures result in a new algorithm with improved efficiency and robustness, as evidenced by our empirical evaluation.
IEEE Transactions on Evolutionary Computation, 2000
The research literature on metaheuristic and evolutionary computation has proposed a large number of algorithms for the solution of challenging real-world optimization problems. It is often not possible to study theoretically the performance of these algorithms unless significant assumptions are made on either the algorithm itself or the problems to which it is applied, or both. As a consequence, metaheuristics are typically evaluated empirically using a set of test problems. Unfortunately, relatively little attention has been given to the development of methodologies and tools for the large-scale empirical evaluation and/or comparison of metaheuristics.
Evolutionary Computation, 2005
Evolutionary algorithms perform optimization using a population of sample solution points. An interesting development has been to view population-based optimization as the process of evolving an explicit, probabilistic model of the search space. This paper investigates a formal basis for continuous, population-based optimization in terms of a stochastic gradient descent on the Kullback-Leibler divergence between the model probability density and the objective function, represented as an unknown density of assumed form. This leads to an update rule that is related and compared with previous theoretical work, a continuous version of the population-based incremental learning algorithm, and the generalized mean shift clustering framework. Experimental results are presented that demonstrate the dynamics of the new algorithm on a set of simple test problems.
A major challenge in the development of peptide-based vaccines is finding the right immunogenic element, with efficient and long-lasting immunisation effects, from large potential targets encoded by pathogen genomes. Computer models are convenient tools for scanning pathogen genomes to pre-select candidate immunogenic peptides for experimental validation. Current methods predict many false positives resulting from a low prevalence of true positives.