Scalable Hyperparameter Optimization with Lazy Gaussian Processes

Practical Bayesian Optimization of Machine Learning Algorithms

The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
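The following is a minimal sketch, not the paper's implementation, of the GP-based Bayesian optimization loop this abstract describes: fit a GP surrogate to the evaluations so far and pick the next point by maximizing an Expected Improvement acquisition. The squared-exponential kernel with fixed hyperparameters and the toy objective are assumptions made purely for illustration; the abstract's points about kernel choice and hyperparameter treatment are deliberately not reproduced here.

```python
# Minimal sketch (not the paper's implementation) of GP-based Bayesian
# optimization with an Expected Improvement acquisition, using a fixed
# squared-exponential kernel and a toy objective.
import numpy as np
from scipy.stats import norm

def se_kernel(A, B, lengthscale=0.2, variance=1.0):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X, y, Xs, noise=1e-6):
    # Standard GP regression posterior mean and standard deviation at Xs.
    K = se_kernel(X, X) + noise * np.eye(len(X))
    Ks = se_kernel(X, Xs)
    Kss = se_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v**2, axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization: E[max(best - f, 0)] under the GP posterior.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def objective(x):                       # stand-in for an expensive training run
    return np.sin(3 * x[0]) + x[0] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(3, 1))     # small initial design
y = np.array([objective(x) for x in X])
for _ in range(20):
    cand = rng.uniform(-1, 1, size=(500, 1))   # random candidate set
    mu, sigma = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))
print("best found:", X[np.argmin(y)], y.min())
```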

Multi-Task Gaussian Process Upper Confidence Bound for Hyperparameter Tuning

In many scientific and engineering applications, Bayesian optimization (BO) is a powerful tool for hyperparameter tuning of machine learning models, materials design and discovery, and similar tasks. BO guides the choice of experiments sequentially to find a good combination of design points in as few experiments as possible. It can be formulated as the problem of optimizing a “black-box” function. Unlike single-task Bayesian optimization, multi-task Bayesian optimization is a general method for efficiently optimizing multiple different but correlated “black-box” functions. Previous multi-task Bayesian optimization algorithms query a point to be evaluated for all tasks in each round of search, which is inefficient. When tasks are correlated, it is not necessary to evaluate all tasks at a given query point. The objective of this work is therefore to develop an algorithm for multi-task Bayesian optimization with automatic task selection so that o...
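As a hedged illustration of the ideas named in this abstract (not the paper's algorithm), the sketch below models correlated tasks with an intrinsic coregionalization kernel, K((x, t), (x', t')) = B[t, t'] * k(x, x'), and scores individual (point, task) pairs with a GP-UCB rule so that only one task is evaluated per round. The coregionalization matrix B, the kernel, and the UCB constant are all assumptions for the example.

```python
# Sketch (assumptions, not the paper's method): a multi-task GP via an
# intrinsic coregionalization model plus a GP-UCB rule over (point, task) pairs.
import numpy as np

def se_kernel(A, B, ls=0.3):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls**2)

def mtgp_posterior(X, tasks, y, Xs, tasks_s, B, noise=1e-4):
    # Joint covariance over (input, task) pairs through the task matrix B.
    K = se_kernel(X, X) * B[np.ix_(tasks, tasks)] + noise * np.eye(len(X))
    Ks = se_kernel(X, Xs) * B[np.ix_(tasks, tasks_s)]
    kss = np.diag(B)[tasks_s]                     # prior variance, k(x, x) = 1
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mu = Ks.T @ alpha
    var = np.maximum(kss - np.sum(v**2, axis=0), 1e-12)
    return mu, np.sqrt(var)

# Two correlated tasks (correlation 0.8); a few observations, mostly on task 0.
B = np.array([[1.0, 0.8], [0.8, 1.0]])
X = np.array([[0.1], [0.5], [0.9], [0.5]])
tasks = np.array([0, 0, 0, 1])
y = np.array([0.2, 0.9, 0.3, 0.8])

# Score candidate (point, task) pairs with UCB = mu + beta * sigma; pick one,
# so only a single task is queried in this round.
cand_x = np.linspace(0, 1, 50).reshape(-1, 1)
cand = np.vstack([cand_x, cand_x])
cand_t = np.concatenate([np.zeros(50, int), np.ones(50, int)])
mu, sigma = mtgp_posterior(X, tasks, y, cand, cand_t, B)
best = np.argmax(mu + 2.0 * sigma)
print("next query: x =", cand[best], "task =", cand_t[best])
```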

Bayesian Optimization for Selecting Efficient Machine Learning Models

ArXiv, 2020

The performance of many machine learning models depends on their hyper-parameter settings. Bayesian Optimization has become a successful tool for hyper-parameter optimization of machine learning algorithms; it aims to identify optimal hyper-parameters through an iterative sequential process. However, most Bayesian Optimization algorithms are designed to select models for effectiveness only and ignore the important issue of model training efficiency. Given that both model effectiveness and training time matter for real-world applications, models selected for effectiveness alone may not meet the strict training time requirements needed for deployment in a production environment. In this work, we present a unified Bayesian Optimization framework for jointly optimizing models for both prediction effectiveness and training efficiency. We propose an objective that captures the tradeoff between these two metrics and demonstrate how we can jointly optimize them in a principled Bayes...
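The paper's actual tradeoff formulation is not reproduced here; as an illustrative sketch only, the function below folds validation error and measured training time into a single scalar that any Bayesian optimization loop could minimize. The weight `lam`, the log-time penalty, and the sklearn model are assumptions made for the example.

```python
# Illustrative sketch only: one simple way to combine prediction effectiveness
# and training time into a single objective for Bayesian optimization.
import time
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def joint_objective(hparams, lam=0.1):
    """Return a scalar to minimize: validation error plus a training-time penalty."""
    start = time.perf_counter()
    model = LogisticRegression(C=hparams["C"], max_iter=int(hparams["max_iter"]))
    model.fit(X_tr, y_tr)
    train_time = time.perf_counter() - start
    error = 1.0 - model.score(X_val, y_val)
    return error + lam * np.log1p(train_time)

print(joint_objective({"C": 1.0, "max_iter": 200}))
```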

Efficient Marginal Likelihood Computation for Gaussian Process Regression

2011

In a Bayesian learning setting, the posterior distribution of a predictive model arises from a trade-off between its prior distribution and the conditional likelihood of observed data. Such distribution functions usually rely on additional hyperparameters which need to be tuned in order to achieve optimum predictive performance; this operation can be efficiently performed in an Empirical Bayes fashion by maximizing the posterior marginal likelihood of the observed data. Since the score function of this optimization problem is in general characterized by the presence of local optima, it is necessary to resort to global optimization strategies, which require a large number of function evaluations. Given that the evaluation is usually computationally intensive and badly scaled with respect to the dataset size, the maximum number of observations that can be treated simultaneously is quite limited. In this paper, we consider the case of hyperparameter tuning in Gaussian process regression. A straightforward implementation of the posterior log-likelihood for this model requires O(N^3) operations for every iteration of the optimization procedure, where N is the number of examples in the input dataset. We derive a novel set of identities that allow, after an initial overhead of O(N^3), the evaluation of the score function, as well as the Jacobian and Hessian matrices, in O(N) operations. We prove how the proposed identities, that follow from the eigendecomposition of the kernel matrix, yield a reduction of several orders of magnitude in the computation time for the hyperparameter optimization problem. Notably, the proposed solution provides computational advantages even with respect to state of the art approximations that rely on sparse kernel matrices.
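The paper's full set of identities (including the Jacobian and Hessian) is not reproduced here, but the core idea can be sketched for the simplest case: after a single O(N^3) eigendecomposition of a fixed base kernel matrix, the log marginal likelihood for any signal-variance and noise-variance setting can be re-evaluated in O(N). The kernel, data, and restriction to amplitude/noise hyperparameters are assumptions for this example.

```python
# Sketch of the core idea only (amplitude and noise hyperparameters of a fixed
# base kernel): one O(N^3) eigendecomposition, then O(N) per evaluation.
import numpy as np

def se_kernel(X, ls=1.0):
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

K0 = se_kernel(X)                       # fixed base kernel
lam, Q = np.linalg.eigh(K0)             # one-off O(N^3) cost
y_proj = Q.T @ y                        # projections of y, also one-off

def log_marginal_likelihood(sf2, sn2):
    """O(N) evaluation for K = sf2 * K0 + sn2 * I using the cached eigenpairs."""
    d = sf2 * lam + sn2                 # eigenvalues of the full covariance
    quad = np.sum(y_proj**2 / d)        # y^T K^{-1} y
    logdet = np.sum(np.log(d))          # log |K|
    return -0.5 * (quad + logdet + len(d) * np.log(2 * np.pi))

# Each call below is O(N); a naive implementation would be O(N^3) per call.
for sn2 in (1e-2, 1e-1, 1.0):
    print(sn2, log_marginal_likelihood(sf2=1.0, sn2=sn2))
```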

Automatic tuning of hyperparameters using Bayesian optimization

Evolving Systems, 2020

Deep learning is a field of artificial intelligence that works well in computer vision, natural language processing and audio recognition. Deep neural network architectures use many layers to learn features on their own. Hyperparameter tuning plays a major role for every dataset and has a major effect on the performance of the trained model. Due to the large dimensionality of the data, it is impractical to tune the parameters by human expertise. In this paper, we use the CIFAR-10 dataset and apply Bayesian hyperparameter optimization to enhance the performance of the model. Bayesian optimization can be used for hyperparameter tuning of any noisy black-box function. In this work, Bayesian optimization obtains optimized values for all hyperparameters, which saves time and improves performance. The results also show that validation error was reduced by 6.2% on the graphics processing unit compared with the CPU. Achieving global optimization in the trained model also helps transfer learning across domains.

Hyper-Parameter Initialization for Squared Exponential Kernel-based Gaussian Process Regression

Hyper-parameter optimization is an essential task in the use of machine learning techniques. Such optimization typically starts from an initial guess for the hyper-parameter values, followed by optimization (or minimization) of some cost function via gradient-based methods. The initial values are crucial, since gradient-based optimization can easily become trapped in local minima of the cost function. Therefore, hyper-parameters are usually initialized several times and the optimization repeated to obtain the best solution. Repeated optimization can be computationally expensive for techniques like Gaussian Process (GP) regression, which has O(n^3) complexity, and the lack of a formal strategy for initializing hyper-parameter values is an additional challenge. In general, reinitialization of hyper-parameter values in the context of many machine learning techniques, including GP, has been done at random over the yea...
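The sketch below shows the conventional random-restart baseline this abstract refers to, not the paper's proposed initialization strategy: the negative log marginal likelihood of a squared-exponential-kernel GP is minimized from several random starting points and the best local optimum is kept. The data, bounds, and number of restarts are assumptions for the example.

```python
# Random-restart baseline (not the paper's initialization strategy): repeat
# gradient-based NLML minimization for an SE-kernel GP from random starts.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)

def neg_log_marginal_likelihood(log_params):
    ls, sf2, sn2 = np.exp(log_params)          # lengthscale, signal var, noise var
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = sf2 * np.exp(-0.5 * d2 / ls**2) + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) + 0.5 * len(y) * np.log(2 * np.pi)

best = None
for _ in range(10):                            # repeated random initialization
    x0 = rng.normal(size=3)                    # log(lengthscale, sf2, sn2)
    res = minimize(neg_log_marginal_likelihood, x0,
                   method="L-BFGS-B", bounds=[(-3, 3)] * 3)
    if best is None or res.fun < best.fun:
        best = res
print("best hyperparameters:", np.exp(best.x), "NLML:", best.fun)
```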

Accelerated Bayesian Optimization for Deep Learning

Bayesian optimization for deep learning has a long execution time because it involves many calculations and parameters. To address this problem, this study aims to accelerate execution by focusing on the output of the activation function, which is strongly related to accuracy. We developed a technique that accelerates execution by stopping the training of models whose first- and second-layer activation outputs become zero. Two experiments were conducted to confirm the effectiveness of the proposed method. First, we implemented the proposed technique and compared its execution time with that of standard Bayesian optimization, successfully accelerating Bayesian optimization for deep learning. Second, we applied the proposed method to credit card transaction data. These experiments confirmed that the goal of the study was achieved. In particular, we conclude that the proposed method can accelerate execution when deep learning is applied to extremely large amounts of data.

Using Gaussian Processes to Optimize Expensive Functions

Lecture Notes in Computer Science, 2008

The task of finding the optimum of some function f(x) is commonly accomplished by generating and testing sample solutions iteratively, choosing each new sample x heuristically on the basis of results to date. We use Gaussian processes to represent predictions and uncertainty about the true function, and describe how to use these predictions to choose where to take each new sample in an optimal way. By doing this we were able to solve a difficult optimization problem (finding weights in a neural network controller to simultaneously balance two vertical poles) using an order of magnitude fewer samples than reported elsewhere.

Improved Gaussian Process Acquisition for Targeted Bayesian Optimization

International Journal of Modeling and Optimization

A black-box optimization problem is considered in which the function to be optimized can only be expressed in terms of a complicated stochastic algorithm that takes a long time to evaluate. The value returned is required to be sufficiently near a target value, and the data used has a significant noise component. Bayesian Optimization with an underlying Gaussian Process is used as the optimization method, and its effectiveness is measured in terms of the number of function evaluations required to attain the target. To improve results, a simple modification of the Gaussian Process ‘Lower Confidence Bound’ (LCB) acquisition function is proposed: the expression used for the confidence bound is squared in order to better comply with the target requirement. With this modification, results are obtained that are much improved compared with random selection methods and with other commonly used acquisition functions.
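The paper's exact squared expression is not reproduced here; as one plausible, hedged reading, the sketch below uses a lower confidence bound on the squared deviation (f(x) - T)^2, which is zero whenever the target T lies inside the mu +/- kappa*sigma interval. The toy posterior values and the constant kappa are assumptions.

```python
# Hedged sketch of a squared, target-oriented LCB-style acquisition; the
# paper's exact formula may differ.
import numpy as np

def targeted_lcb(mu, sigma, target, kappa=2.0):
    """Lower bound on (f - target)^2 over the interval mu +/- kappa*sigma."""
    gap = np.abs(mu - target) - kappa * sigma
    return np.maximum(gap, 0.0) ** 2

# Toy posterior over 5 candidate points; pick the one whose bound is smallest.
mu = np.array([0.2, 0.9, 1.4, 2.1, 3.0])
sigma = np.array([0.5, 0.1, 0.3, 0.8, 0.2])
target = 1.5
print("next evaluation index:", np.argmin(targeted_lcb(mu, sigma, target)))
```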

Efficient Optimization for Sparse Gaussian Process Regression

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015

We propose an efficient optimization algorithm for selecting a subset of training data to induce sparsity for Gaussian process regression. The algorithm estimates an inducing set and the hyperparameters using a single objective, either the marginal likelihood or a variational free energy. The space and time complexity are linear in the training set size, and the algorithm can be applied to large regression problems on discrete or continuous domains. Empirical evaluation shows state-of-the-art performance in discrete cases and competitive results in the continuous case.
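The paper's discrete selection algorithm is not reproduced here; as a sketch of the kind of single objective it refers to, the code below evaluates Titsias' variational free energy for candidate inducing sets. It uses dense matrices for clarity rather than the linear-complexity form, and the kernel, noise level, and inducing-set size are assumptions for the example.

```python
# Sketch (not the paper's algorithm): scoring candidate inducing sets with
# Titsias' variational free energy bound for sparse GP regression.
import numpy as np

def se_kernel(A, B, ls=1.0, sf2=1.0):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return sf2 * np.exp(-0.5 * d2 / ls**2)

def variational_free_energy(X, y, Z, sn2=0.1, jitter=1e-8):
    """Titsias (2009) bound for inducing inputs Z; higher is better."""
    Knn_diag = np.full(len(X), se_kernel(X[:1], X[:1])[0, 0])  # stationary kernel
    Kmm = se_kernel(Z, Z) + jitter * np.eye(len(Z))
    Knm = se_kernel(X, Z)
    Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)          # Nystrom approximation
    C = Qnn + sn2 * np.eye(len(X))
    L = np.linalg.cholesky(C)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    log_marg = -0.5 * (y @ alpha + 2 * np.sum(np.log(np.diag(L)))
                       + len(y) * np.log(2 * np.pi))
    trace_term = -0.5 / sn2 * np.sum(Knn_diag - np.diag(Qnn))
    return log_marg + trace_term

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Compare a few random candidate inducing sets of size 10.
scores = []
for _ in range(5):
    Z = X[rng.choice(len(X), size=10, replace=False)]
    scores.append(variational_free_energy(X, y, Z))
print("free energies of candidate inducing sets:", np.round(scores, 2))
```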