Gaussian Processes in Machine Learning

Last Updated : 15 Jun, 2026

Gaussian Processes (GPs) are probabilistic machine learning models used for regression and classification tasks. Instead of predicting a single value, they provide both predictions and a measure of uncertainty, making them useful for problems where confidence in predictions is important.

Key Concepts of Gaussian Processes

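A Gaussian Process is fully specified by a mean function m(x) and a covariance function (kernel) k(x,x'):

f(x) \sim GP(m(x), k(x,x'))

Where:

- m(x) is the mean function, the expected value of f at input x
- k(x,x') is the covariance function (kernel), the covariance between f(x) and f(x')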

1. Kernels

Kernels, also called covariance functions or similarity functions, measure the similarity between input points and help Gaussian Processes learn patterns from data.

**Common Kernels:**

Radial Basis Function (RBF) kernel, where l is the length scale:

k_{RBF}(x,x') = \exp(-\frac{||x-x'||^2}{2 l^2})
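As a minimal sketch, the RBF kernel in its standard squared-distance form, exp(-||x-x'||^2 / (2 l^2)), can be computed for two sets of points with NumPy. The function name `rbf_kernel_matrix`, the default length scale, and the demo inputs are illustrative choices, not part of any library.

```python
import numpy as np

def rbf_kernel_matrix(A, B, length_scale=1.0):
    """RBF kernel matrix between the rows of A (n, d) and the rows of B (m, d)."""
    A = np.atleast_2d(A)
    B = np.atleast_2d(B)
    # Squared Euclidean distance between every pair of rows
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * length_scale**2))

# Hypothetical 1-D inputs: nearby points get similarity close to 1, distant ones close to 0
X_demo = np.array([[0.0], [1.0], [3.0]])
K_demo = rbf_kernel_matrix(X_demo, X_demo)
print(np.round(K_demo, 3))
```

The diagonal is always 1 because every point has zero distance to itself, and the matrix is symmetric. A larger length scale makes similarity decay more slowly with distance.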

2. Prior Distribution

The prior distribution represents the initial assumptions about a function before any data is observed. It serves as the starting point of a Gaussian Process and is defined using the mean function and kernel (covariance function).

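**Formula:**

f(X) \sim N(m(X), K(X,X))

**Where:**

- X is any finite set of input points
- m(X) is the vector of mean function values at those points
- K(X,X) is the covariance matrix with entries k(x_i, x_j)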
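To make the prior concrete, the sketch below draws a few functions from a GP prior on a grid of inputs. It assumes a zero mean function and an RBF kernel with length scale 1.0; the grid, the jitter value, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 5.0, 50)

# Assumed prior: zero mean function and RBF covariance with length scale 1.0
mean = np.zeros_like(x_grid)
sq_dists = (x_grid[:, None] - x_grid[None, :]) ** 2
K = np.exp(-sq_dists / 2.0)

# Small jitter on the diagonal keeps the Cholesky factorization numerically stable
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x_grid)))

# If z ~ N(0, I), then mean + L z ~ N(mean, K); each row below is one sampled function
z = rng.standard_normal((len(x_grid), 3))
prior_samples = (mean[:, None] + L @ z).T
print(prior_samples.shape)  # (3, 50)
```

Each sample is a smooth curve because the RBF kernel assigns high covariance to nearby inputs. No data is involved yet, so the samples reflect only the assumptions encoded in the mean function and kernel.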

3. Posterior Distribution

The posterior distribution represents the updated belief about a function after observing data. It is obtained by combining the prior distribution with the observed data using Bayes' theorem.

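**Formula:**

p(f_*|X, y, X_*) = N(\mu_*, \Sigma_*)

**Where:**

- X and y are the training inputs and observed targets
- X_* are the test inputs and f_* are the function values at those inputs
- \mu_* is the posterior mean and \Sigma_* is the posterior covariance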
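For a zero-mean prior with Gaussian observation noise of variance \sigma_n^2, the posterior mean and covariance have the standard closed form \mu_* = K_*^T (K + \sigma_n^2 I)^{-1} y and \Sigma_* = K_{**} - K_*^T (K + \sigma_n^2 I)^{-1} K_*, where K = K(X,X), K_* = K(X,X_*), and K_{**} = K(X_*,X_*). The sketch below evaluates these expressions with NumPy. The toy data, the noise variance, and the helper `rbf` are assumptions made for illustration.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    # RBF kernel for 1-D inputs given as arrays of shape (n,) and (m,)
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2.0 * length_scale**2))

# Toy training data (made up for illustration)
X_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(X_train)
X_star = np.array([1.0, 1.5, 10.0])  # test inputs: on data, between data, far away
noise_var = 1e-6                      # assumed observation noise variance

K = rbf(X_train, X_train) + noise_var * np.eye(len(X_train))
K_s = rbf(X_train, X_star)            # shape (n_train, n_test)
K_ss = rbf(X_star, X_star)

# Solve linear systems through a Cholesky factor instead of forming an explicit inverse
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
v = np.linalg.solve(L, K_s)

mu_star = K_s.T @ alpha               # posterior mean
Sigma_star = K_ss - v.T @ v           # posterior covariance
std_star = np.sqrt(np.clip(np.diag(Sigma_star), 0.0, None))

print(mu_star)
print(std_star)
```

At a training input the posterior standard deviation is close to zero, while far from the data it returns to the prior value of 1 and the mean returns to the prior mean of 0.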

4. Combining Kernels

Combining kernels allows Gaussian Processes to capture multiple patterns and relationships in the data. By adding or multiplying different kernels, the model becomes more flexible and can represent complex data structures more effectively.

**Kernel Addition:**

k_{combined}(x,x') = k_1(x,x') + k_2(x,x')
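Multiplication works the same way, with k_1(x,x') \cdot k_2(x,x') in place of the sum. In scikit-learn, kernel objects support `+` and `*` directly, so both combinations can be sketched as follows. The choice of an RBF and a periodic `ExpSineSquared` kernel, and their hyperparameter values, are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared

k_smooth = RBF(length_scale=2.0)                                # slow, smooth trend
k_periodic = ExpSineSquared(length_scale=1.0, periodicity=1.0)  # repeating pattern

k_sum = k_smooth + k_periodic    # trend plus an independent periodic component
k_prod = k_smooth * k_periodic   # periodic pattern whose shape drifts over distance

# Kernel objects are callable and return the kernel matrix for the given inputs
X_demo = np.array([[0.0], [0.5], [1.0]])
K_sum = k_sum(X_demo)
K_prod = k_prod(X_demo)

print(k_sum)
print(k_prod)
```

The combined matrices equal the element-wise sum and product of the individual kernel matrices, which is exactly what the formulas state.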

5. Gaussian Process in Classification and Regression

Gaussian Processes can be applied to both regression and classification problems. In regression, they predict continuous values, while in classification, they predict discrete class labels.

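**Regression:** The GP models the continuous target directly. With Gaussian observation noise, the posterior has a closed form, so each prediction comes with a mean and a variance.

**Classification:** A latent GP is passed through a link function (for example, the logistic function) to produce class probabilities. The posterior is no longer Gaussian, so it is approximated; scikit-learn's GaussianProcessClassifier uses the Laplace approximation.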
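The implementation later in this article covers regression only, so here is a small classification sketch using scikit-learn's `GaussianProcessClassifier`. The synthetic one-dimensional dataset, the kernel choice, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Synthetic binary problem: the label is 1 when the single feature is positive
rng = np.random.default_rng(0)
X_cls = rng.uniform(-3.0, 3.0, size=(60, 1))
y_cls = (X_cls[:, 0] > 0).astype(int)

gpc = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
gpc.fit(X_cls, y_cls)

X_new = np.array([[-2.0], [0.0], [2.0]])
proba = gpc.predict_proba(X_new)   # one row per input, one column per class
labels = gpc.predict(X_new)

print(proba)
print(labels)
```

The class probabilities play the role that the predictive variance plays in regression: inputs near the decision boundary get probabilities closer to 0.5.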

Implementation of Gaussian Processes

Step 1: Import Required Libraries

```python
from sklearn.datasets import fetch_california_housing
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
```

Step 2: Load the Dataset

```python
# Downloads the dataset on first use and caches it locally
data = fetch_california_housing()
X, y = data.data, data.target
```

Step 3: Select a Subset of Data

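```python
# Exact GP training scales cubically with the number of samples,
# so only the first 2000 rows are used
subset_size = 2000
X_subset = X[:subset_size]
y_subset = y[:subset_size]
```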

Step 4: Split Data into Training and Testing Sets

```python
# Hold out 30% of the subset for testing
X_train, X_test, y_train, y_test = train_test_split(
    X_subset, y_subset, test_size=0.3, random_state=42)
```

Step 5: Define the Kernel Function

```python
# Constant kernel (signal variance, bounded to [1e-3, 1e3]) times an RBF kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(length_scale=1.0)
```

Step 6: Create the Gaussian Process Regressor

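```python
# Restart hyperparameter optimization 10 times from random initial values
# to reduce the risk of stopping in a poor local optimum
gp = GaussianProcessRegressor(
    kernel=kernel, n_restarts_optimizer=10, random_state=42)
```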

Step 7: Train the Model

```python
# Fits the kernel hyperparameters by maximizing the log marginal likelihood
gp.fit(X_train, y_train)
```
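After fitting, it can be useful to check what the optimizer actually learned. The sketch below shows the relevant attributes on a small synthetic dataset so that it runs quickly on its own; the toy data, the `alpha` value, and the names `X_toy`, `y_toy`, and `gp_toy` are illustrative and not part of the housing example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Small synthetic regression problem (assumed for illustration)
rng = np.random.default_rng(0)
X_toy = rng.uniform(0.0, 5.0, size=(40, 1))
y_toy = np.sin(X_toy[:, 0])

toy_kernel = C(1.0, (1e-3, 1e3)) * RBF(length_scale=1.0)
gp_toy = GaussianProcessRegressor(
    kernel=toy_kernel, alpha=1e-4, n_restarts_optimizer=2, random_state=0)
gp_toy.fit(X_toy, y_toy)

print(gp_toy.kernel)    # kernel as passed in, with the initial hyperparameters
print(gp_toy.kernel_)   # fitted kernel with the optimized hyperparameters
print(gp_toy.log_marginal_likelihood_value_)  # objective value at the optimum
```

The trailing underscore marks attributes that exist only after `fit`. Comparing `kernel` with `kernel_` shows how far the length scale and signal variance moved from their starting values.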

Step 8: Make Predictions

```python
# Posterior mean prediction for each test point
y_pred = gp.predict(X_test)
```
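`predict` returns only the posterior mean by default. Passing `return_std=True` also returns the predictive standard deviation, which is the uncertainty estimate that distinguishes GPs from point predictors. The sketch below uses a small synthetic dataset so that it is self-contained; the data, the `alpha` value, and the names `gp_toy` and `X_query` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Noisy samples of a sine curve on [0, 5] (assumed for illustration)
rng = np.random.default_rng(42)
X_toy = rng.uniform(0.0, 5.0, size=(30, 1))
y_toy = np.sin(X_toy[:, 0]) + 0.1 * rng.standard_normal(30)

gp_toy = GaussianProcessRegressor(
    kernel=C(1.0, (1e-3, 1e3)) * RBF(length_scale=1.0),
    alpha=1e-2,  # value added to the kernel diagonal, acting as noise variance
    random_state=42,
)
gp_toy.fit(X_toy, y_toy)

# One query inside the training range and one far outside it
X_query = np.array([[2.5], [20.0]])
mean_pred, std_pred = gp_toy.predict(X_query, return_std=True)

# Approximate 95% interval under the Gaussian predictive distribution
lower = mean_pred - 1.96 * std_pred
upper = mean_pred + 1.96 * std_pred

print(mean_pred, std_pred)
```

The standard deviation is small where training data is dense and grows for the query far outside the training range, which signals that the prediction there should not be trusted.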

Step 9: Evaluate Model Performance

```python
# Lower MSE means predictions are closer to the true targets
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
```

**Output:**

Mean Squared Error: 1.5693
