Gaussian Processes in Machine Learning

Last Updated : 15 Jun, 2026

Gaussian Processes (GPs) are probabilistic machine learning models used for regression and classification tasks. Instead of predicting a single value, they provide both predictions and a measure of uncertainty, making them useful for problems where confidence in predictions is important.

Non-parametric learning approach.
Provides predictions with uncertainty estimates.
Effective for modeling complex and nonlinear relationships.
Works well with small and medium-sized datasets.

Key Concepts of Gaussian Processes

Gaussian Processes (GPs) are defined using a mean function m(x) and a covariance function (kernel) k(x,x′):

f(x) \sim \mathcal{GP}(m(x), k(x,x'))

Where:

m(x) : Mean function
k(x,x′) : Covariance function (Kernel)

1. Kernels

Kernels, also called covariance functions or similarity functions, measure the similarity between input points and help Gaussian Processes learn patterns from data.

Capture relationships between data points.
Model both linear and non-linear patterns.
Help in making predictions for new data.
Incorporate prior assumptions about the data.

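**Common Kernels:**

**Linear Kernel:** Captures linear relationships.
**RBF (Gaussian) Kernel:** Measures similarity based on the squared distance between points and is widely used for smooth functions. The length scale l controls how quickly similarity decays with distance.

k_{RBF}(x,x') = \exp\left(-\frac{\|x-x'\|^2}{2 l^2}\right)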
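To make the RBF formula concrete, here is a minimal NumPy sketch of it, checked against scikit-learn's `RBF` kernel. The helper name `rbf_kernel` and the array `X_demo` are illustrative assumptions, not part of the original walkthrough.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF


def rbf_kernel(A, B, length_scale=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 * l^2)) for every pair of rows in A and B."""
    # Squared Euclidean distance between each row of A and each row of B
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * length_scale**2))


# Hypothetical 1-D inputs used only for this illustration
X_demo = np.array([[0.0], [1.0], [3.0]])

K_manual = rbf_kernel(X_demo, X_demo, length_scale=1.0)
K_sklearn = RBF(length_scale=1.0)(X_demo)

print(np.allclose(K_manual, K_sklearn))
```

Points that are close together get a similarity near 1, and the value decays toward 0 as the squared distance grows relative to the length scale.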

2. Prior Distribution

The prior distribution represents the initial assumptions about a function before any data is observed. It serves as the starting point of a Gaussian Process and is defined using the mean function and kernel (covariance function).

Represents beliefs before observing data.
Usually follows a Gaussian (normal) distribution.
Defined by the mean and covariance functions.
Acts as the foundation for learning from data.
Is updated to the posterior distribution once data is observed.

**Formula:**

f(x) \sim \mathcal{N}(m(x), k(x,x'))

**Where:**

m(x) : Mean function
k(x,x') : Covariance (kernel) function
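One way to picture the prior is to draw a few random functions from it on a finite grid of inputs. The sketch below assumes a zero mean function and an RBF kernel with length scale 1. The grid `X_grid`, the jitter value and the number of samples are illustrative choices, not from the original.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Evaluate the prior on a finite grid of inputs
X_grid = np.linspace(0.0, 5.0, 50).reshape(-1, 1)

mean = np.zeros(len(X_grid))        # m(x) = 0 (assumed for this sketch)
K = RBF(length_scale=1.0)(X_grid)   # k(x, x') for every pair of grid points
K += 1e-8 * np.eye(len(X_grid))     # small jitter for numerical stability

# Each row is one function drawn from the prior, evaluated on the grid
prior_samples = rng.multivariate_normal(mean, K, size=3)
print(prior_samples.shape)  # (3, 50)
```

Because no data has been observed yet, the sampled functions differ freely from one another. Only the smoothness implied by the kernel is shared.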

3. Posterior Distribution

The posterior distribution represents the updated belief about a function after observing data. It is obtained by combining the prior distribution with the observed data using Bayes' theorem.

Updates beliefs after observing data.
Combines prior knowledge with observed evidence.
Provides predictions along with uncertainty estimates.
Becomes more certain near observed points as more data becomes available.
Remains Gaussian in Gaussian Processes.

**Formula:**

p(f_* \mid X, y, X_*) = \mathcal{N}(\mu_*, \Sigma_*)

**Where:**

X : Training input data
y : Training output (target) data
X_* : Test or new input data
\mu_* : Posterior mean
\Sigma_* : Posterior covariance
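For regression with a zero prior mean and Gaussian observation noise of variance \sigma_n^2, the posterior has the standard closed form \mu_* = K_*^T (K + \sigma_n^2 I)^{-1} y and \Sigma_* = K_{**} - K_*^T (K + \sigma_n^2 I)^{-1} K_*, where K = k(X, X), K_* = k(X, X_*) and K_{**} = k(X_*, X_*). The symbols K, K_*, K_{**} and \sigma_n are notation introduced here, and the zero mean is an assumption. The following is a minimal sketch of that computation on toy data, not a replacement for a library implementation.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from sklearn.gaussian_process.kernels import RBF


def gp_posterior(X_train, y_train, X_test, kernel, noise_var=1e-6):
    """Posterior mean and covariance of a zero-mean GP at X_test."""
    K = kernel(X_train) + noise_var * np.eye(len(X_train))  # K + sigma_n^2 I
    K_s = kernel(X_train, X_test)                            # K_*
    K_ss = kernel(X_test)                                    # K_**

    # A Cholesky factorisation avoids forming an explicit matrix inverse
    factor = cho_factor(K, lower=True)
    alpha = cho_solve(factor, y_train)   # (K + sigma_n^2 I)^{-1} y
    v = cho_solve(factor, K_s)           # (K + sigma_n^2 I)^{-1} K_*

    mu_star = K_s.T @ alpha
    sigma_star = K_ss - K_s.T @ v
    return mu_star, sigma_star


# Hypothetical toy data: noise-free samples of sin(x)
X_obs = np.array([[0.0], [1.0], [2.0], [3.0]])
y_obs = np.sin(X_obs).ravel()
X_new = np.array([[1.0], [1.5], [6.0]])

mu_star, sigma_star = gp_posterior(X_obs, y_obs, X_new, RBF(length_scale=1.0))
std_star = np.sqrt(np.clip(np.diag(sigma_star), 0.0, None))

print("Posterior mean:", mu_star)
print("Posterior std: ", std_star)
```

The standard deviation is close to zero at x = 1.0, which coincides with a training point, and grows as the query moves away from the observed data.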

4. Combining Kernels

Combining kernels allows Gaussian Processes to capture multiple patterns and relationships in the data. By adding or multiplying different kernels, the model becomes more flexible and can represent complex data structures more effectively.

Improves the flexibility and expressiveness of the model.
Captures different patterns using multiple kernels.
Helps model complex relationships in data.
Kernels can be combined through addition or multiplication.

**Kernel Addition:**

k_{combined}(x,x') = k_1(x,x') + k_2(x,x')

**Kernel Multiplication:**

k_{combined}(x,x') = k_1(x,x') \cdot k_2(x,x')
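In scikit-learn, kernel objects can be combined with the ordinary `+` and `*` operators. The sketch below uses `DotProduct` as the linear kernel and `ExpSineSquared` as a periodic kernel. The specific kernels, their parameter values and the array `X_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, DotProduct, ExpSineSquared

# Hypothetical 1-D inputs used only for this illustration
X_demo = np.linspace(0.0, 4.0, 5).reshape(-1, 1)

k_smooth = RBF(length_scale=1.0)
k_linear = DotProduct(sigma_0=1.0)
k_periodic = ExpSineSquared(length_scale=1.0, periodicity=2.0)

# Addition: a linear trend plus a smooth non-linear component
k_sum = k_linear + k_smooth

# Multiplication: a periodic pattern whose shape may drift with distance
k_product = k_periodic * k_smooth

K_sum = k_sum(X_demo)
K_product = k_product(X_demo)

# The combined kernel matrices are the element-wise sum and product
print(np.allclose(K_sum, k_linear(X_demo) + k_smooth(X_demo)))
print(np.allclose(K_product, k_periodic(X_demo) * k_smooth(X_demo)))
```

A combined kernel can be passed to a Gaussian Process model like any single kernel, and the hyperparameters of every component are tuned together.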

5. Gaussian Process in Classification and Regression

Gaussian Processes can be applied to both regression and classification problems. In regression, they predict continuous values, while in classification, they predict discrete class labels.

**Regression:**

Predicts continuous outcomes.
Provides a predictive distribution for new inputs.

**Classification:**

Predicts discrete class labels.
Uses a link function (e.g., logistic/sigmoid) to map the latent function values to class probabilities.
Often requires approximation methods because the likelihood is non-Gaussian, so the posterior has no closed form.
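As a small illustration of the classification case, the sketch below fits scikit-learn's `GaussianProcessClassifier`, which uses a logistic link with a Laplace approximation, on a made-up one-dimensional problem. The toy data and the names `X_cls`, `y_cls` and `X_query` are assumptions for this example.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Toy 1-D problem (made up for illustration): class 1 when x > 0
rng = np.random.default_rng(0)
X_cls = rng.uniform(-3.0, 3.0, size=(40, 1))
y_cls = (X_cls.ravel() > 0).astype(int)

clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0)
clf.fit(X_cls, y_cls)

X_query = np.array([[-2.0], [0.0], [2.0]])
proba = clf.predict_proba(X_query)  # one row per query, columns ordered as clf.classes_
labels = clf.predict(X_query)

print(proba)
print(labels)
```

The probabilities are typically close to 0 or 1 far from the class boundary and nearer 0.5 around x = 0, where the model is least certain.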

Implementation of Gaussian Processes

Step 1: Import Required Libraries

**fetch_california_housing:** Loads the California Housing dataset.
**GaussianProcessRegressor:** Implements Gaussian Process Regression.
**RBF and ConstantKernel:** Used to define the kernel function.
**train_test_split:** Splits the dataset into training and testing sets.
**mean_squared_error:** Evaluates model performance.

    from sklearn.datasets import fetch_california_housing
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_squared_error

Step 2: Load the Dataset

X contains the input features, such as median income, house age and population.
y contains the target variable: the median house value of each district, in units of $100,000.

    data = fetch_california_housing()
    X, y = data.data, data.target

Step 3: Select a Subset of Data

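Only the first 2000 samples are selected.
Exact GP training scales cubically in time and quadratically in memory with the number of training points, so a subset keeps training fast and avoids memory issues.

    subset_size = 2000
    X_subset = X[:subset_size]
    y_subset = y[:subset_size]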
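Taking the first rows assumes that the order of the dataset carries no structure. A random subset avoids relying on that assumption. The sketch below shows one way to draw it, using stand-in arrays with the same shape as the California Housing data (20640 rows, 8 features) so that it runs without downloading anything. The stand-in data and the name `idx` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in arrays with the same shape as the California Housing data
X = rng.normal(size=(20640, 8))
y = rng.normal(size=20640)

subset_size = 2000
# Sample row indices without replacement so no row appears twice
idx = rng.choice(len(X), size=subset_size, replace=False)
X_subset, y_subset = X[idx], y[idx]

print(X_subset.shape, y_subset.shape)  # (2000, 8) (2000,)
```

With the real dataset, the same two indexing lines would apply to the `X` and `y` arrays loaded in Step 2.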

Step 4: Split Data into Training and Testing Sets

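The subset is split into 70% training data and 30% testing data, with random_state=42 for a reproducible split.

    X_train, X_test, y_train, y_test = train_test_split(
        X_subset, y_subset, test_size=0.3, random_state=42
    )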

Step 5: Define the Kernel Function

**Constant Kernel (C):** Controls the overall variance of the model.
**RBF Kernel:** Captures smooth and non-linear relationships in the data.

    kernel = C(1.0, (1e-3, 1e3)) * RBF(length_scale=1.0)
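The kernel above shares a single length scale across all input features and has no learned noise term. One variant that could be tried is an anisotropic RBF, with one length scale per feature, plus a `WhiteKernel` for observation noise. This is an assumption for illustration, not part of the original walkthrough. The name `kernel_ard` and the bounds are illustrative choices.

```python
import numpy as np
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel as C

n_features = 8  # the California Housing dataset has 8 input features

# One length scale per feature (anisotropic RBF) plus a learned noise level
kernel_ard = (
    C(1.0, (1e-3, 1e3))
    * RBF(length_scale=np.ones(n_features), length_scale_bounds=(1e-2, 1e3))
    + WhiteKernel(noise_level=1.0, noise_level_bounds=(1e-5, 1e2))
)

# theta holds the log-transformed hyperparameters that the optimizer tunes
print(kernel_ard)
print("Number of tunable hyperparameters:", kernel_ard.theta.shape[0])
```

More hyperparameters give the model more flexibility but also make optimization slower. Standardizing the features first (for example with `StandardScaler`) is a common companion step when features are on very different scales.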

Step 6: Create the Gaussian Process Regressor

**kernel=kernel:** Uses the defined kernel function.
**n_restarts_optimizer=10:** After the first optimization run from the initial kernel parameters, restarts the optimizer 10 more times from randomly sampled starting values to reduce the risk of ending in a poor local optimum.
**random_state=42:** Ensures reproducible results.

    gp = GaussianProcessRegressor(
        kernel=kernel, n_restarts_optimizer=10, random_state=42
    )

Step 7: Train the Model

Kernel hyperparameters are optimized by maximizing the log marginal likelihood of the training data.
Relationships between input features and house prices are learned.

    gp.fit(X_train, y_train)
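After fitting, it can be useful to inspect what the optimizer actually found. The sketch below does this on a small synthetic dataset, so it runs quickly and needs no download. The data, the `alpha` value and the name `gp_toy` are assumptions for illustration. `kernel_` and `log_marginal_likelihood_value_` are attributes that scikit-learn sets during `fit`.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Synthetic data (made up for illustration): noisy samples of sin(x)
rng = np.random.default_rng(42)
X_toy = rng.uniform(0.0, 10.0, size=(30, 1))
y_toy = np.sin(X_toy).ravel() + 0.1 * rng.standard_normal(30)

kernel = C(1.0, (1e-3, 1e3)) * RBF(length_scale=1.0)
gp_toy = GaussianProcessRegressor(
    kernel=kernel,
    alpha=0.1**2,              # noise variance added to the kernel diagonal
    n_restarts_optimizer=2,
    random_state=42,
)
gp_toy.fit(X_toy, y_toy)

print("Initial kernel:", gp_toy.kernel)    # the kernel as passed in, left unchanged
print("Fitted kernel: ", gp_toy.kernel_)   # a copy with optimized hyperparameters
print("Log marginal likelihood:", gp_toy.log_marginal_likelihood_value_)
```

The same two attributes are available on the `gp` object from this step once `gp.fit(X_train, y_train)` has finished.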

Step 8: Make Predictions

The trained model predicts house prices for the test dataset.
The predicted values are stored in y_pred.

    y_pred = gp.predict(X_test)
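Since uncertainty estimates are a main reason to use a GP, it is worth noting that `predict` can also return the predictive standard deviation via `return_std=True`. The sketch below demonstrates this on synthetic data. The data, the names `gp_toy` and `X_query`, and the 1.96 multiplier for an approximate 95% interval are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Synthetic data (made up for illustration): samples of sin(x) on [0, 5]
rng = np.random.default_rng(0)
X_toy = np.sort(rng.uniform(0.0, 5.0, size=(25, 1)), axis=0)
y_toy = np.sin(X_toy).ravel()

gp_toy = GaussianProcessRegressor(
    kernel=C(1.0) * RBF(length_scale=1.0), alpha=1e-6, random_state=0
)
gp_toy.fit(X_toy, y_toy)

# One query inside the training range and one far outside it
X_query = np.array([[2.5], [20.0]])
y_mean, y_std = gp_toy.predict(X_query, return_std=True)

# Approximate 95% interval under the Gaussian predictive distribution
lower = y_mean - 1.96 * y_std
upper = y_mean + 1.96 * y_std

for x, m, lo, hi in zip(X_query.ravel(), y_mean, lower, upper):
    print(f"x={x:5.1f}  mean={m:7.3f}  interval=[{lo:7.3f}, {hi:7.3f}]")
```

The interval is narrow inside the training range and much wider for the query far from any observed data. In the walkthrough, `gp.predict(X_test, return_std=True)` would return the same pair of arrays for the housing test set.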

Step 9: Evaluate Model Performance

    mse = mean_squared_error(y_test, y_pred)
    print(f'Mean Squared Error: {mse}')

**Output:**

Mean Squared Error: 1.5693
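The MSE is expressed in squared target units, which makes it hard to interpret directly. Taking the square root gives the root mean squared error (RMSE) in the same units as the target. For this dataset those units are $100,000, per the scikit-learn dataset description. The short calculation below converts the reported value. The variable names are illustrative.

```python
import math

mse_reported = 1.5693              # value printed in the output above
rmse = math.sqrt(mse_reported)     # back to the units of the target
rmse_dollars = rmse * 100_000      # target is in units of $100,000

print(f"RMSE: {rmse:.4f} (about ${rmse_dollars:,.0f})")
```

An RMSE of roughly 1.25 corresponds to a typical prediction error of about $125,000.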
