Lifelong Bayesian Optimization
Related papers
Bayesian Optimization for Selecting Efficient Machine Learning Models
ArXiv, 2020
The performance of many machine learning models depends on their hyper-parameter settings. Bayesian Optimization, which identifies optimal hyper-parameters through an iterative sequential process, has become a successful tool for hyper-parameter optimization of machine learning algorithms. However, most Bayesian Optimization algorithms are designed to select models for effectiveness only and ignore the important issue of model training efficiency. Since both model effectiveness and training time matter in real-world applications, models selected for effectiveness alone may not meet the strict training-time requirements needed for deployment in a production environment. In this work, we present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. We propose an objective that captures the tradeoff between these two metrics and demonstrate how we can jointly optimize them in a principled Bayes...
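The paper's exact trade-off objective is not reproduced in the excerpt above; the sketch below shows one plausible scalarization, assuming a hypothetical train_and_evaluate helper that returns validation accuracy together with wall-clock training time, with the weight on training time chosen arbitrarily.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_and_evaluate(n_estimators, max_depth):
    """Hypothetical helper: fit a model and report accuracy plus training time."""
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
    start = time.perf_counter()
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth,
                                   random_state=0).fit(X_tr, y_tr)
    train_time = time.perf_counter() - start
    return model.score(X_va, y_va), train_time


def joint_objective(params, time_weight=0.1):
    """Scalarized objective trading accuracy against (log) training time.

    The weight and the log penalty are assumptions; the paper's actual
    trade-off formulation may differ.
    """
    acc, t = train_and_evaluate(int(params[0]), int(params[1]))
    return acc - time_weight * np.log1p(t)


# A BO loop would maximize joint_objective over (n_estimators, max_depth).
print(joint_objective([100, 5]))
```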
Hybrid Batch Bayesian Optimization
2012
Bayesian Optimization (BO) aims at optimizing an unknown function that is costly to evaluate. We focus on applications where concurrent function evaluations are possible. In such cases, BO could choose to either sequentially evaluate the function (sequential mode) or evaluate the function with multiple inputs at once (batch mode). The sequential mode generally leads to better optimization performance as each function evaluation is selected with more information, whereas the batch mode is more time efficient (smaller number of iterations). Our goal is to combine the strengths of both settings. We systematically analyze BO using a Gaussian Process as the posterior estimator and provide a hybrid algorithm that dynamically switches between the sequential and batch modes with variable batch sizes. We theoretically justify our algorithm and present experimental results on eight benchmark BO problems. The results show that our method achieves substantial speedup (up to 78%) compared to the sequential mode, without suffering any significant performance loss.
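The paper's actual switching criterion is not given in the excerpt; the sketch below illustrates the general idea with a simplified heuristic: evaluate one point at a time, but submit a batch whenever the top acquisition scores over a candidate pool are nearly tied, so the extra batch points cost little information. The function, threshold, and names are illustrative assumptions.

```python
import numpy as np


def choose_batch(acq_values, candidates, max_batch=4, rel_gap=0.1):
    """Simplified hybrid rule (an assumption, not the paper's exact criterion):
    take a batch when the top acquisition scores are nearly tied, otherwise
    fall back to a single sequential evaluation."""
    order = np.argsort(acq_values)[::-1]           # best candidates first
    best = acq_values[order[0]]
    batch = [order[0]]
    for idx in order[1:max_batch]:
        # keep adding points whose acquisition value is within rel_gap of the best
        if best - acq_values[idx] <= rel_gap * max(abs(best), 1e-12):
            batch.append(idx)
        else:
            break
    return candidates[batch]                       # batch of size 1 == sequential mode


rng = np.random.default_rng(0)
cands = rng.uniform(-1, 1, size=(50, 2))
acq = rng.random(50)
print(choose_batch(acq, cands))
```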
HEBO: An Empirical Study of Assumptions in Bayesian Optimisation
Journal of Artificial Intelligence Research
In this work we rigorously analyse assumptions inherent to black-box optimisation hyper-parameter tuning tasks. Our results on the Bayesmark benchmark indicate that heteroscedasticity and non-stationarity pose significant challenges for black-box optimisers. Based on these findings, we propose a Heteroscedastic and Evolutionary Bayesian Optimisation solver (HEBO). HEBO performs non-linear input and output warping, admits exact marginal log-likelihood optimisation and is robust to the values of learned parameters. We demonstrate HEBO’s empirical efficacy on the NeurIPS 2020 Black-Box Optimisation challenge, where HEBO placed first. Upon further analysis, we observe that HEBO significantly outperforms existing black-box optimisers on 108 machine learning hyperparameter tuning tasks comprising the Bayesmark benchmark. Our findings indicate that the majority of hyper-parameter tuning tasks exhibit heteroscedasticity and non-stationarity, multiobjective acquisition ensembles with Pareto ...
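As a rough illustration of output warping, the sketch below applies a power transform to the observations before fitting a Gaussian process, using scikit-learn's PowerTransformer as a stand-in; HEBO's own warping layers and evolutionary multi-objective acquisition machinery differ in the details.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 2))
# Heteroscedastic, heavy-tailed observations: raw values are awkward for a plain GP.
y = np.exp(3.0 * X[:, 0]) + rng.normal(0.0, 0.1 + X[:, 0], size=40)

# Output warping: map observations to a roughly Gaussian scale before fitting.
warp = PowerTransformer(method="yeo-johnson")
y_warped = warp.fit_transform(y.reshape(-1, 1)).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True)
gp.fit(X, y_warped)

# Predictions live in the warped space; invert the transform to report raw values.
X_new = rng.uniform(0.0, 1.0, size=(5, 2))
mu_warped = gp.predict(X_new)
print(warp.inverse_transform(mu_warped.reshape(-1, 1)).ravel())
```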
Practical Bayesian Optimization of Machine Learning Algorithms
The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
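A compact sketch of the expected improvement acquisition, extended in the cost-aware spirit described above by discounting with a predicted evaluation duration. Modeling log running time with a second GP and dividing EI by the predicted time is one common reading of "EI per second", not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expected_improvement(gp, X_cand, y_best, xi=0.01):
    """Standard EI for minimization under a GP surrogate."""
    mu, std = gp.predict(X_cand, return_std=True)
    std = np.maximum(std, 1e-9)
    z = (y_best - mu - xi) / std
    return (y_best - mu - xi) * norm.cdf(z) + std * norm.pdf(z)


rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(15, 1))
y = np.sin(3 * X).ravel() + 0.1 * rng.normal(size=15)        # objective observations
log_t = 0.5 * X.ravel() + 0.05 * rng.normal(size=15)         # log evaluation times

gp_obj = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True).fit(X, y)
gp_cost = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                   normalize_y=True).fit(X, log_t)

X_cand = rng.uniform(-2, 2, size=(200, 1))
ei = expected_improvement(gp_obj, X_cand, y_best=y.min())
ei_per_sec = ei / np.exp(gp_cost.predict(X_cand))            # discount by predicted duration
print(X_cand[np.argmax(ei_per_sec)])
```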
AutoML with Bayesian Optimizations for Big Data Management
Information
The field of automated machine learning (AutoML) has gained significant attention in recent years due to its ability to automate the process of building and optimizing machine learning models. However, the increasing amount of big data being generated has presented new challenges for AutoML systems in terms of big data management. In this paper, we introduce Fabolas and learning curve extrapolation as two methods for accelerating hyperparameter optimization. We also present four methods for accelerating training: Bag of Little Bootstraps, k-means clustering for Support Vector Machines, subsample-size selection for gradient descent, and subsampling for logistic regression. Additionally, we discuss the use of Markov Chain Monte Carlo (MCMC) methods and other stochastic optimization techniques to improve the efficiency of AutoML systems in managing big data. These methods enhance various facets of the training process, making it feasible to combine them in diverse ways to ga...
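To make the subsampling theme concrete, the sketch below treats the training-set fraction as a cheap fidelity knob when scoring a hyperparameter, which is the basic idea behind methods such as Fabolas; the data set, model, and helper are illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, n_features=30, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)


def cheap_evaluation(C, subsample_fraction):
    """Evaluate a hyperparameter on a random subsample of the training data.

    A multi-fidelity BO method would model validation error jointly over
    (C, subsample_fraction) and extrapolate to the full data set.
    """
    n = max(50, int(subsample_fraction * len(X_tr)))
    idx = rng.choice(len(X_tr), size=n, replace=False)
    model = LogisticRegression(C=C, max_iter=1000).fit(X_tr[idx], y_tr[idx])
    return 1.0 - model.score(X_va, y_va)           # validation error

# Cheap low-fidelity estimate vs. a more expensive, more faithful one.
print(cheap_evaluation(C=1.0, subsample_fraction=0.05))
print(cheap_evaluation(C=1.0, subsample_fraction=0.5))
```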
Adaptive Bayesian optimization for dynamic problems
2018
This thesis studies the problem of tracking the extremum of an objective function that is latent, noisy and expensive to evaluate. This problem is notable because many large-scale learning systems with complex models operating on non-stationary data have meta-problems whose solutions require the tracking of an evolving extremum. We start by describing dynamic optimization problems and model them using spatiotemporal Gaussian process priors. We construct an intelligent search mechanism that uses the learnt model to guide the search, dynamically modifying the feasible search region in order to track the evolving extremum. We also show that this mechanism induces a natural approximation scheme for cases where the number of samples makes exact inference too expensive. We test the resulting method on synthetic and real-world problems. In the next part of the thesis, we demonstrate the utility of the method on pertinent real-world meta-problems occurring i...
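A minimal sketch of the spatiotemporal Gaussian process idea, assuming each observation is tagged with its time step and modeled with separate spatial and temporal length scales; the kernels, the drifting toy objective, and the UCB-style pick are simplifications, and the thesis's adaptive feasible-region mechanism is omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Augment each query x with the time step t at which it was evaluated.
rng = np.random.default_rng(0)
t = np.arange(30, dtype=float)
x = rng.uniform(-1, 1, size=30)
Xt = np.column_stack([x, t])

# The optimum drifts over time; observations far in the past become stale.
y = np.sin(3 * (x - 0.05 * t)) + 0.05 * rng.normal(size=30)

# Product-style prior via anisotropic length scales: nearby times are
# correlated more strongly than distant ones.
kernel = RBF(length_scale=[0.3, 5.0])
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, normalize_y=True).fit(Xt, y)

# Predict the landscape at the next time step to guide the next query.
x_grid = np.linspace(-1, 1, 200)
mu, std = gp.predict(np.column_stack([x_grid, np.full(200, t[-1] + 1)]), return_std=True)
print(x_grid[np.argmax(mu + std)])   # simple UCB-style pick for the next evaluation
```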
Tutorial on Bayesian Optimization
Preprints 2023, 2023030292, 2023
Machine learning forks into three main branches, supervised learning, unsupervised learning, and reinforcement learning, of which reinforcement learning holds much potential for artificial intelligence (AI) applications because it solves real problems through a progressive process in which candidate solutions are improved and fine-tuned continuously. This progressive approach, which reflects an ability to adapt, suits the real world, where most events occur and change continuously and unexpectedly. Moreover, data is becoming too large for supervised and unsupervised learning to draw valuable knowledge from it at one time. Bayesian optimization (BO) models an optimization problem in a probabilistic form called the surrogate model and then directly maximizes an acquisition function created from that surrogate model, thereby implicitly and indirectly maximizing the target function to find the solution of the optimization problem. A popular surrogate model is the Gaussian process regression model. The process of maximizing the acquisition function rests on repeatedly updating the posterior probability of the surrogate model, which improves after every iteration. Taking advantage of an acquisition (utility) function is also common in decision theory, but the idea behind BO is that it solves problems progressively and adaptively, updating the surrogate model from a small piece of data at a time, in the spirit of reinforcement learning. In this sense BO is a reinforcement learning algorithm with many potential applications, and it is surveyed in this research with attention to its mathematical ideas. Moreover, the solution of the optimization problem is important not only to applied mathematics but also to AI.
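To make the surrogate-plus-acquisition loop concrete, here is a minimal generic sketch using a Gaussian process surrogate and an upper-confidence-bound acquisition maximized over random candidates; it illustrates the general recipe rather than any particular paper's algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def target(x):
    """Black-box objective to maximize (unknown to the optimizer in practice)."""
    return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)


rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(3, 1))          # small initial design
y = target(X).ravel()

for _ in range(15):
    # 1. Update the surrogate model (posterior) with all data so far.
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True).fit(X, y)
    # 2. Maximize the acquisition function (UCB here) over random candidates.
    cand = rng.uniform(0, 1, size=(500, 1))
    mu, std = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(mu + 2.0 * std)].reshape(1, -1)
    # 3. Evaluate the true objective at the chosen point and grow the data set.
    X = np.vstack([X, x_next])
    y = np.append(y, target(x_next).ravel())

print("best x:", X[np.argmax(y)], "best value:", y.max())
```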
Amortized Bayesian Optimization over Discrete Spaces
2020
Bayesian optimization is a principled approach for globally optimizing expensive, black-box functions by using a surrogate model of the objective. However, each step of Bayesian optimization involves solving an inner optimization problem, in which we maximize an acquisition function derived from the surrogate model to decide where to query next. This inner problem can be challenging to solve, particularly in discrete spaces, such as protein sequences or molecular graphs, where gradient-based optimization cannot be used. Our key insight is that we can train a parameterized policy to generate candidates that maximize the acquisition function. This is faster than standard parameter-free search methods, since we can amortize the cost of learning the policy across rounds of Bayesian optimization. We therefore call this Amortized Bayesian Optimization. On several challenging discrete design problems, we show this method generally outperforms other methods at optimizing the inner acquisitio...
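A toy sketch of the amortization idea on a discrete sequence space: a factorized categorical policy proposes candidates, and a simple REINFORCE-style update nudges it toward high acquisition values so that proposal quality can carry over across BO rounds. The acquisition function, sequence space, and learning rate here are stand-ins, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ_LEN, VOCAB = 8, 4                       # e.g. short sequences over 4 symbols


def acquisition(seq):
    """Stand-in acquisition score; in real use this comes from the surrogate model."""
    return float(np.sum(seq == np.arange(SEQ_LEN) % VOCAB))


# Policy: independent categorical distribution per position, parameterized by logits.
logits = np.zeros((SEQ_LEN, VOCAB))

for _ in range(200):
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Sample a batch of candidate sequences from the current policy.
    seqs = np.array([[rng.choice(VOCAB, p=probs[i]) for i in range(SEQ_LEN)]
                     for _ in range(16)])
    scores = np.array([acquisition(s) for s in seqs])
    advantages = scores - scores.mean()
    # REINFORCE-style update: push probability mass toward high-scoring sequences.
    for s, adv in zip(seqs, advantages):
        one_hot = np.eye(VOCAB)[s]
        logits += 0.05 * adv * (one_hot - probs)

best = np.argmax(probs, axis=1)
print("policy's favourite sequence:", best, "score:", acquisition(best))
```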
High-Dimensional Bayesian Optimization via Tree-Structured Additive Models
Proceedings of the ... AAAI Conference on Artificial Intelligence, 2020
Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function. Our goal is to lower the computational resources required and facilitate faster model learning by reducing the model complexity while retaining the sample-efficiency of existing methods. Specifically, we constrain the underlying dependency graphs to tree structures in order to facilitate both the structure learning and optimization of the acquisition function. For the former, we propose a hybrid graph learning algorithm based on Gibbs sampling and mutation. In addition, we propose a novel zooming-based algorithm that permits generalized additive models to be employed more efficiently in the case of continuous domains. We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.
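A minimal sketch of the additive-model idea: the high-dimensional function is modeled as a sum of low-dimensional GPs over overlapping groups of variables, so each component (and its acquisition) lives in a small subspace. The variable groups are hand-fixed here and fitted by naive backfitting; learning the tree-structured dependency graph and the zooming procedure, which are the paper's contributions, are omitted.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hand-fixed additive decomposition over overlapping variable groups; in the paper
# these groups would form a tree-structured dependency graph learned from data.
GROUPS = [[0, 1], [1, 2], [3, 4]]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(60, 5))
y = (np.sin(X[:, 0] * X[:, 1]) + (X[:, 1] - X[:, 2]) ** 2
     + np.cos(3 * X[:, 3]) * X[:, 4] + 0.05 * rng.normal(size=60))

# One low-dimensional GP per group; their sum models the full 5-D function.
# Fitting each component to the residual of the others is a simple
# backfitting-style approximation, not the paper's inference scheme.
gps = [GaussianProcessRegressor(kernel=RBF(), alpha=1e-6, normalize_y=True)
       for _ in GROUPS]
for _ in range(3):                              # a few backfitting sweeps
    for k, group in enumerate(GROUPS):
        others = sum(gps[j].predict(X[:, GROUPS[j]]) for j in range(len(GROUPS))
                     if j != k and hasattr(gps[j], "X_train_"))
        gps[k].fit(X[:, group], y - others)

X_test = rng.uniform(-1, 1, size=(5, 5))
pred = sum(gp.predict(X_test[:, g]) for gp, g in zip(gps, GROUPS))
print(pred)
```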