Bayesian Optimization in Machine Learning

Last Updated : 23 Jul, 2025

Bayesian Optimization is an optimization technique that uses the principles of Bayesian inference to find the minimum (or maximum) of an objective function efficiently. Unlike methods such as grid search or random search, which can require many function evaluations, Bayesian Optimization is particularly effective when the objective is expensive to evaluate, noisy, or a black box.

This article covers the core concepts, working mechanisms, advantages, and applications of Bayesian Optimization, and explains why it has become a go-to tool for optimizing complex functions.

What is Bayesian Optimization?

Bayesian Optimization is a strategy for optimizing expensive-to-evaluate functions. It operates by building a probabilistic model of the objective function and using this model to select the most promising points to evaluate next. This approach is particularly useful in scenarios where the objective function is unknown, noisy, or costly to evaluate, as it aims to minimize the number of evaluations required to find the optimal solution.

The optimization process involves two main components:

  1. **Surrogate Model:** A probabilistic model (often a Gaussian Process) that approximates the objective function.
  2. **Acquisition Function:** A utility function that guides the selection of the next point to evaluate based on the surrogate model.

How Does Bayesian Optimization Work?

Bayesian optimization effectively combines statistical modeling and decision-making strategies to optimize complex, costly functions. Here’s a more detailed explanation of the process, including key formulas:

1. **Initialization**

The process begins by sampling the objective function f at a few initial points. These points can be selected randomly or through systematic methods such as Latin Hypercube Sampling, which helps ensure diverse and comprehensive coverage of the input space.
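As an illustration of the initialization step, here is a minimal NumPy sketch of Latin Hypercube Sampling. The function name `latin_hypercube`, the seed, and the bounds are assumptions made for this example and are not taken from any particular library; it only shows the stratification idea (one sample per equal-width interval in every dimension).

```python
import numpy as np

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points with exactly one point per stratum in each dimension."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)          # shape (n_dims, 2)
    n_dims = bounds.shape[0]
    # One uniform draw inside each of the n_samples equal-width strata of [0, 1).
    strata = np.arange(n_samples)[:, None]
    unit = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently so the coordinates are not aligned.
    for d in range(n_dims):
        unit[:, d] = rng.permutation(unit[:, d])
    low, high = bounds[:, 0], bounds[:, 1]
    return low + unit * (high - low)

# Five initial points in a hypothetical 2-D search space [0, 5] x [0, 5].
initial_points = latin_hypercube(5, [(0.0, 5.0), (0.0, 5.0)])
print(initial_points)
```

Compared with purely random sampling, this guarantees that no region of any single input dimension is left without an initial sample.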

2. **Building the Surrogate Model**

A Gaussian Process (GP) is typically used as the surrogate model. The GP is favored for its ability to provide both a mean prediction and a measure of uncertainty (variance) at any point in the input space. The GP is defined by a mean function m(x) and a covariance function k(x, x'), and it models the function as:

f(x) \sim \mathcal{GP}(m(x), k(x, x'))

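Where m(x) is the mean function (often taken to be zero) and k(x, x') is the covariance function, commonly the squared exponential (RBF) kernel with length scale l: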

k(x, x') = \exp\left(-\frac{1}{2l^2} \| x - x' \|^2\right)
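To make the surrogate model concrete, the following is a rough NumPy sketch of the kernel above together with the standard GP posterior mean and standard deviation. It assumes a zero mean function, unit prior variance, and a small `jitter` term for numerical stability; the function names and the toy data are illustrative assumptions, not part of the original article.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

def gp_posterior(X_train, y_train, X_query, length_scale=1.0, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at X_query."""
    K = rbf_kernel(X_train, X_train, length_scale) + jitter * np.eye(len(X_train))
    K_star = rbf_kernel(X_train, X_query, length_scale)
    mean = K_star.T @ np.linalg.solve(K, y_train)
    # Prior variance is k(x, x) = 1; subtract the part explained by the data.
    variance = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star), axis=0)
    return mean, np.sqrt(np.clip(variance, 0.0, None))

# Toy data: three observations of sin(x), queried at one observed point,
# one point between observations, and one point far from all data.
X_train = np.array([[0.0], [1.0], [3.0]])
y_train = np.sin(X_train).ravel()
mean, std = gp_posterior(X_train, y_train, np.array([[1.0], [2.0], [10.0]]))
print(mean, std)
```

The predicted uncertainty is close to zero at an observed point and grows back toward the prior far from the data, which is the property the acquisition function relies on.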

3. **Acquisition Function Maximization**

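The next sampling point is chosen by maximizing an acquisition function that trades off exploration against exploitation. Common acquisition functions include Expected Improvement (EI) and Upper Confidence Bound (UCB), both written here for a maximization problem: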

EI(x) = \mathbb{E}\left[\max(f(x) - f(x^+), 0)\right]

Where f(x^+) is the current best observed value of f. EI measures the expected increase in the objective function relative to the best current observation.

UCB(x) = \mu(x) + \kappa \sigma(x)

Where \mu(x) and \sigma(x) are the mean and standard deviation of the GP’s predictions at point x, and \kappa is a parameter that balances exploration and exploitation.
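When the surrogate's prediction at x is Gaussian with mean \mu(x) and standard deviation \sigma(x), the expectation in EI has a well-known closed form: EI(x) = (\mu(x) - f(x^+)) \Phi(z) + \sigma(x) \phi(z), with z = (\mu(x) - f(x^+)) / \sigma(x). The sketch below implements that form and UCB for a maximization problem, matching the formulas above. The function names and sample inputs are assumptions chosen for illustration.

```python
import math
import numpy as np

def _norm_pdf(z):
    """Standard normal density."""
    return np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)

def _norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    erf = np.vectorize(math.erf, otypes=[float])
    return 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_value):
    """Closed-form EI for maximization under a Gaussian prediction."""
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    improvement = mu - best_value
    safe_sigma = np.where(sigma > 0, sigma, 1.0)   # avoid division by zero
    z = improvement / safe_sigma
    ei = improvement * _norm_cdf(z) + sigma * _norm_pdf(z)
    # With no uncertainty, EI reduces to the plain improvement (if any).
    return np.where(sigma > 0, ei, np.maximum(improvement, 0.0))

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """UCB(x) = mu(x) + kappa * sigma(x)."""
    return np.asarray(mu, dtype=float) + kappa * np.asarray(sigma, dtype=float)

mu = np.array([0.0, 1.0, 1.0])
sigma = np.array([1.0, 0.0, 0.5])
ei = expected_improvement(mu, sigma, best_value=1.0)
ucb = upper_confidence_bound(mu, sigma, kappa=2.0)
print(ei, ucb)
```

Note that a point whose mean is below the current best can still have positive EI if its uncertainty is large, which is how EI encourages exploration.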

4. **Evaluating the Objective Function**

The point x selected by maximizing the acquisition function is then evaluated to obtain f(x). This new data point is added to the dataset, which is used to update the GP model.

5. **Iteration**

Steps 2 through 4 (fitting the surrogate model, maximizing the acquisition function, and evaluating the selected point) are repeated. With each iteration the surrogate model gains data and typically becomes more accurate in promising regions, so the search progressively homes in on the optimum.

6. **Termination**

The optimization process continues until a predefined stopping criterion is met, such as reaching a maximum number of function evaluations or achieving a convergence threshold where the improvements become minimal.
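One possible way to express such a stopping rule in code is sketched below. The helper name `should_stop` and the default values for `max_evals`, `tol`, and `patience` are hypothetical choices for illustration; in practice they depend on the evaluation budget and the scale of the objective.

```python
def should_stop(best_so_far, max_evals=50, tol=1e-4, patience=5):
    """Decide whether to stop, given the running best value after each evaluation.

    Stops when the evaluation budget is used up, or when the best value has
    changed by less than `tol` over the last `patience` evaluations.
    """
    if len(best_so_far) >= max_evals:
        return True
    if len(best_so_far) > patience:
        recent_gain = abs(best_so_far[-1] - best_so_far[-1 - patience])
        return recent_gain < tol
    return False

# The running best has been flat for many evaluations, so the rule fires.
print(should_stop([3.0, 1.2, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]))
```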

This structured approach allows Bayesian optimization to efficiently navigate complex landscapes, minimizing the number of evaluations needed to locate the optimum by intelligently balancing exploration of unknown regions and exploitation of promising areas.
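The six steps can be put together in a compact, self-contained sketch. This is a simplified illustration under several assumptions, not a production implementation: a hypothetical 1-D objective `toy_objective` that is maximized, a zero-mean GP with a fixed length scale of 1, EI as the acquisition function, and maximization of EI over a fixed grid of candidates rather than with a continuous optimizer.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared exponential kernel for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_star = rbf_kernel(x_train, x_query)
    mean = K_star.T @ np.linalg.solve(K, y_train)
    variance = 1.0 - np.sum(K_star * np.linalg.solve(K, K_star), axis=0)
    return mean, np.sqrt(np.clip(variance, 1e-12, None))

def expected_improvement(mean, std, best_value):
    """Closed-form EI for maximization."""
    z = (mean - best_value) / std
    erf = np.vectorize(math.erf, otypes=[float])
    cdf = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best_value) * cdf + std * pdf

def toy_objective(x):
    # Hypothetical stand-in for an expensive function; its maximum is at x = 2.
    return -(x - 2.0) ** 2

candidates = np.linspace(0.0, 5.0, 201)      # grid over the search space

# Step 1: initialization with two hand-picked points.
x_obs = np.array([0.5, 4.5])
y_obs = toy_objective(x_obs)

# Steps 5 and 6: iterate until a fixed budget of 10 further evaluations is spent.
for _ in range(10):
    mean, std = gp_posterior(x_obs, y_obs, candidates)        # step 2
    ei = expected_improvement(mean, std, y_obs.max())         # step 3
    x_next = candidates[np.argmax(ei)]
    x_obs = np.append(x_obs, x_next)                          # step 4
    y_obs = np.append(y_obs, toy_objective(x_next))

best_x, best_y = x_obs[np.argmax(y_obs)], y_obs.max()
print(f"best x = {best_x:.3f}, best f(x) = {best_y:.4f}")
```

A grid search over candidates is only practical in one or two dimensions; libraries typically maximize the acquisition function with random restarts of a gradient-based optimizer instead.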

Key Concepts in Bayesian Optimization

  1. **Gaussian Process (GP):** A Gaussian Process is a non-parametric model that defines a distribution over functions. In Bayesian Optimization, GPs are often used as the surrogate model because they provide not only an estimate of the objective function but also a measure of uncertainty.
  2. **Acquisition Functions:**
    • **Expected Improvement (EI):** A popular acquisition function that selects points where the expected improvement over the current best solution is maximized.
    • **Probability of Improvement (PI):** Chooses points with the highest probability of improving the current best solution.
    • **Upper Confidence Bound (UCB):** Balances exploration and exploitation by selecting points based on a confidence interval around the GP prediction.
  3. **Exploration vs. Exploitation:** Exploration involves searching in areas of the search space with high uncertainty, while exploitation focuses on areas where the surrogate model predicts good outcomes. The acquisition function manages this trade-off to efficiently find the optimum.
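Probability of Improvement is the only acquisition function in the list above without a formula. Under a Gaussian prediction it is commonly written, for maximization, as PI(x) = \Phi((\mu(x) - f(x^+) - \xi) / \sigma(x)). The sketch below assumes that form; the function name and the optional margin `xi` (which asks for an improvement of at least `xi`) are illustrative assumptions.

```python
import math

def probability_of_improvement(mu, sigma, best_value, xi=0.0):
    """Probability that a Gaussian prediction exceeds best_value by at least xi."""
    if sigma <= 0.0:
        # No uncertainty: improvement is either certain or impossible.
        return 1.0 if mu - best_value - xi > 0.0 else 0.0
    z = (mu - best_value - xi) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# A prediction equal to the current best has a 50% chance of improving on it.
print(probability_of_improvement(mu=1.0, sigma=1.0, best_value=1.0))
```

Because PI ignores how large the improvement is, it tends to favor exploitation; a positive `xi` is one common way to push it toward exploration.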

Advantages of Bayesian Optimization

Applications of Bayesian Optimization

Limitations of Bayesian Optimization

Implementing Bayesian Optimization in Python

In this section, we implement Bayesian Optimization using the scikit-optimize library in Python.

You can install scikit-optimize using pip if you haven't already:

pip install scikit-optimize

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Real, Integer
from skopt.plots import plot_convergence

# Define the objective function to minimize
def objective_function(x):
    return (x[0] - 2) ** 2 + (x[1] - 3) ** 2

# Define the search space
space = [Real(0.0, 5.0, name='x1'),  # Continuous space for x1
         Real(0.0, 5.0, name='x2')]  # Continuous space for x2

# Perform Bayesian Optimization
result = gp_minimize(objective_function,  # The function to minimize
                     space,               # The search space
                     n_calls=20,          # The number of evaluations
                     random_state=42)     # Random state for reproducibility

# Print the best parameters and the corresponding minimum value
print("Best parameters: x1 = {:.4f}, x2 = {:.4f}".format(result.x[0], result.x[1]))
print("Minimum value: {:.4f}".format(result.fun))

# Plot convergence
plot_convergence(result)
```

**Output:**

Best parameters: x1 = 2.0003, x2 = 3.0003
Minimum value: 0.0000

[Figure: convergence plot produced by plot_convergence(result)]

The plot and the output together indicate that the Bayesian Optimization process was successful in finding the minimum of the objective function, and it converged efficiently after about 12 evaluations. The final solution is very close to the true minimum of the function, as indicated by the near-zero minimum value.

Conclusion

Bayesian Optimization is an efficient approach to optimizing complex functions, particularly when evaluations are expensive, noisy, or time-consuming. Its ability to balance exploration and exploitation through a probabilistic surrogate model makes it a versatile tool across many domains, from machine learning to scientific research. By understanding and implementing Bayesian Optimization, practitioners can often reach near-optimal solutions with relatively few evaluations, saving both time and resources.