Mini-Batch Gradient Descent with Python

Last Updated : 12 May, 2026

Gradient Descent is an optimization algorithm used to find optimal model parameters by minimizing the loss through iterative updates in the direction of the steepest descent.

Figure: Convergence in batch (BGD), stochastic (SGD) and mini-batch (MBGD) gradient descent

**Working of Mini-Batch Gradient Descent**

Mini-batch gradient descent updates model parameters using small data subsets, balancing the speed of SGD and the stability of batch gradient descent for efficient and stable training.

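**Algorithm:**

Let $\theta$ denote the model parameters, $\eta$ the learning rate, and max_iters the number of passes (epochs) over the training data.

For itr = 1, 2, 3, …, max_iters:

For each mini-batch $(X_{\text{mini}},\ y_{\text{mini}})$:

**1. Forward Pass on the mini-batch:**

Make predictions on the mini-batch:

$$\hat{y} = f(X_{\text{mini}},\ \theta)$$

Compute the prediction error $J(\theta)$ with the current values of the parameters:

$$J(\theta) = L(\hat{y},\ y_{\text{mini}})$$

**2. Backward Pass:**

Compute the gradient of the loss with respect to the parameters:

$$\nabla_{\theta} J(\theta) = \frac{\partial J(\theta)}{\partial \theta}$$

**3. Update parameters:**

Apply the gradient descent rule:

$$\theta = \theta - \eta \nabla_{\theta} J(\theta)$$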
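
The three steps above can be traced by hand on a single mini-batch. The following is a minimal sketch, assuming a linear model f(X, θ) = Xθ and a mean-squared-error loss; the model, the loss, the toy numbers and the learning rate are illustrative assumptions, not part of the algorithm itself.

```python
import numpy as np

# Toy mini-batch: 4 examples, a bias column of ones plus one feature (illustrative values).
X_mini = np.array([[1.0, 0.0],
                   [1.0, 1.0],
                   [1.0, 2.0],
                   [1.0, 3.0]])
y_mini = np.array([[1.0], [3.0], [5.0], [7.0]])  # generated by y = 1 + 2x

theta = np.zeros((2, 1))  # initial parameters
eta = 0.1                 # learning rate (assumed value)

# 1. Forward pass: predictions and loss J(theta) on the mini-batch.
y_hat = X_mini @ theta
loss_before = np.mean((y_hat - y_mini) ** 2) / 2

# 2. Backward pass: gradient of J with respect to theta.
grad = X_mini.T @ (y_hat - y_mini) / len(y_mini)

# 3. Update: step against the gradient.
theta = theta - eta * grad

loss_after = np.mean((X_mini @ theta - y_mini) ** 2) / 2
print(loss_before, loss_after)  # the loss drops after one update
```

A single update already moves θ from [0, 0] toward the generating values [1, 2]; repeating the step over many mini-batches is what the training loop does.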

**Python Implementation**

Here we will use Mini-Batch Gradient Descent for Linear Regression.

1. Importing Libraries

We begin by importing the NumPy and Matplotlib (pyplot) libraries.

```python
import numpy as np
import matplotlib.pyplot as plt
```

2. Generating Synthetic 2D Data

Here, we generate 8000 two-dimensional data points sampled from a multivariate normal distribution:

```python
mean = np.array([5.0, 6.0])
cov = np.array([[1.0, 0.95], [0.95, 1.2]])
data = np.random.multivariate_normal(mean, cov, 8000)
```
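
Because `np.random.multivariate_normal` draws from NumPy's global random state, each run produces a different dataset. The following is a minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator is acceptable here; the seed value 42 and the name `rng` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for this sketch

mean = np.array([5.0, 6.0])
cov = np.array([[1.0, 0.95], [0.95, 1.2]])
data = rng.multivariate_normal(mean, cov, 8000)

# With 8000 samples the empirical statistics should sit close to the targets.
print(data.shape)                  # (8000, 2)
print(data.mean(axis=0))           # close to [5.0, 6.0]
print(np.cov(data, rowvar=False))  # close to cov
```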

3. Visualizing Generated Data

```python
plt.scatter(data[:500, 0], data[:500, 1], marker='.')
plt.title("Scatter Plot of First 500 Samples")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()
```

**Output:** a scatter plot of the first 500 samples (Feature 1 against Feature 2).

4. Splitting Data

We split the data into training and testing sets:

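```python
# Prepend a column of ones so that theta[0] acts as the bias term
data = np.hstack((np.ones((data.shape[0], 1)), data))

# Use the first 90% of the rows for training and the rest for testing
split_factor = 0.90
split = int(split_factor * data.shape[0])

# Features: bias column and Feature 1; target: Feature 2 (last column)
X_train = data[:split, :-1]
y_train = data[:split, -1].reshape((-1, 1))
X_test = data[split:, :-1]
y_test = data[split:, -1].reshape((-1, 1))
```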
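
The slice-based split above keeps rows in their original order, which is harmless here because the synthetic points are independent draws. For data stored in some meaningful order, shuffling first is the usual precaution. The following is a minimal sketch, assuming the last column is the target; `shuffled_split` and its `seed` parameter are hypothetical names introduced for illustration.

```python
import numpy as np

def shuffled_split(data, split_factor=0.90, seed=0):
    """Shuffle rows, then split into train/test features and targets.

    Hypothetical helper: the last column is treated as the target,
    all other columns as features.
    """
    rng = np.random.default_rng(seed)
    shuffled = data[rng.permutation(data.shape[0])]
    split = int(split_factor * shuffled.shape[0])
    X_train, y_train = shuffled[:split, :-1], shuffled[:split, -1:]
    X_test, y_test = shuffled[split:, :-1], shuffled[split:, -1:]
    return X_train, y_train, X_test, y_test

# Toy usage: 10 rows of [bias, feature, target].
toy = np.hstack((np.ones((10, 1)), np.arange(20.0).reshape(10, 2)))
X_train, y_train, X_test, y_test = shuffled_split(toy)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```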

5. Displaying Datasets

```python
print("Number of examples in training set = %d" % X_train.shape[0])
print("Number of examples in testing set = %d" % X_test.shape[0])
```

**Output:** with a 90/10 split of the 8000 points, the training set contains 7200 examples and the testing set contains 800.

6. Defining Core Functions of Linear Regression

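```python
def hypothesis(X, theta):
    # Linear model prediction: X @ theta
    return np.dot(X, theta)

def gradient(X, y, theta):
    # Gradient of the summed squared-error cost with respect to theta
    h = hypothesis(X, theta)
    grad = np.dot(X.T, (h - y))
    return grad

def cost(X, y, theta):
    # Half the sum of squared errors over the given examples
    h = hypothesis(X, theta)
    J = np.dot((h - y).T, (h - y)) / 2
    return J[0]
```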
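
A quick way to gain confidence that `gradient()` really is the derivative of `cost()` is to compare it with a finite-difference approximation. The following is a minimal sketch of such a check; `numerical_gradient`, the step size `eps` and the random toy data are assumptions introduced here, and the three core functions are restated (with `cost` returning a plain float) so the snippet runs on its own.

```python
import numpy as np

def hypothesis(X, theta):
    return np.dot(X, theta)

def gradient(X, y, theta):
    return np.dot(X.T, hypothesis(X, theta) - y)

def cost(X, y, theta):
    residual = hypothesis(X, theta) - y
    return (np.dot(residual.T, residual) / 2).item()

def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite-difference approximation of d cost / d theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j, 0] = eps
        grad[j, 0] = (cost(X, y, theta + step) - cost(X, y, theta - step)) / (2 * eps)
    return grad

# Small random problem: bias column plus one feature.
rng = np.random.default_rng(0)
X = np.hstack((np.ones((5, 1)), rng.normal(size=(5, 1))))
y = rng.normal(size=(5, 1))
theta = rng.normal(size=(2, 1))

max_diff = np.max(np.abs(gradient(X, y, theta) - numerical_gradient(X, y, theta)))
print(max_diff)  # should be tiny if the analytic gradient is correct
```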

7. Creating Mini-Batches for Training

This function shuffles the dataset and divides it into random mini-batches used during training. Any rows left over after the full-size batches form one final, smaller batch:

```python
def create_mini_batches(X, y, batch_size):
    mini_batches = []
    # Shuffle features and targets together so each row stays paired
    data = np.hstack((X, y))
    np.random.shuffle(data)
    n_minibatches = data.shape[0] // batch_size

    # Full-size mini-batches
    for i in range(n_minibatches):
        mini_batch = data[i * batch_size:(i + 1) * batch_size, :]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))

    # Leftover rows, if the dataset size is not a multiple of batch_size
    if data.shape[0] % batch_size != 0:
        mini_batch = data[n_minibatches * batch_size:]
        X_mini = mini_batch[:, :-1]
        Y_mini = mini_batch[:, -1].reshape((-1, 1))
        mini_batches.append((X_mini, Y_mini))
    return mini_batches
```
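
The same batching can also be done by shuffling row indices instead of stacking and shuffling the data itself. The following is a minimal sketch of that alternative, not the function used in this article; `iterate_mini_batches` and its `rng` argument are hypothetical names. The toy data is built so that coverage and feature/target pairing are easy to verify.

```python
import numpy as np

def iterate_mini_batches(X, y, batch_size, rng):
    """Yield shuffled (X_mini, y_mini) pairs that cover every row exactly once."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(20.0).reshape(10, 2)  # row i is [2i, 2i + 1]
y = np.arange(10.0).reshape(10, 1)  # target for row i is i

batches = list(iterate_mini_batches(X, y, batch_size=4, rng=rng))
sizes = [len(y_mini) for _, y_mini in batches]
print(sizes)  # [4, 4, 2]
```

With 10 rows and a batch size of 4, the last batch holds the 2 leftover rows, and no row appears twice.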

8. Mini-Batch Gradient Descent Function

This function performs mini-batch gradient descent to train the linear regression model:

```python
def gradientDescent(X, y, learning_rate=0.001, batch_size=32):
    theta = np.zeros((X.shape[1], 1))
    error_list = []
    max_iters = 3  # number of passes (epochs) over the training data

    for itr in range(max_iters):
        # Reshuffle and re-split the data at the start of every epoch
        mini_batches = create_mini_batches(X, y, batch_size)
        for X_mini, y_mini in mini_batches:
            # One parameter update per mini-batch
            theta = theta - learning_rate * gradient(X_mini, y_mini, theta)
            error_list.append(cost(X_mini, y_mini, theta))

    return theta, error_list
```
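
Note that `gradient()` sums the errors over the mini-batch, so the effective step size grows with `batch_size`. A common variant divides by the number of examples in the batch, which makes the learning rate less sensitive to the batch size. The following is a minimal sketch of that variant, not the article's function; `mean_gradient`, `mini_batch_gd_mean`, the default hyperparameters and the noise-free toy data are assumptions for illustration.

```python
import numpy as np

def mean_gradient(X, y, theta):
    """Gradient of the mean (rather than summed) squared error over the batch."""
    residual = X @ theta - y
    return X.T @ residual / X.shape[0]

def mini_batch_gd_mean(X, y, learning_rate=0.1, batch_size=32, max_iters=50, seed=0):
    """Mini-batch gradient descent using the batch-averaged gradient."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((X.shape[1], 1))
    for _ in range(max_iters):
        order = rng.permutation(X.shape[0])  # reshuffle every epoch
        for start in range(0, X.shape[0], batch_size):
            idx = order[start:start + batch_size]
            theta = theta - learning_rate * mean_gradient(X[idx], y[idx], theta)
    return theta

# Noise-free toy data from y = 1 + 2x, so the exact answer is known.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(200, 1))
X = np.hstack((np.ones((200, 1)), x))
y = 1.0 + 2.0 * x

theta = mini_batch_gd_mean(X, y)
print(theta.ravel())  # approaches [1.0, 2.0]
```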

9. Training and Visualization

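The model is trained by calling gradientDescent() on the training data. After training, the learned bias and coefficients are printed, and the cost recorded after each mini-batch update is plotted against the iteration number.

Together these give a quantitative and visual indication of how well mini-batch gradient descent is optimizing the regression model.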

```python
theta, error_list = gradientDescent(X_train, y_train)
print("Bias = ", theta[0])
print("Coefficients = ", theta[1:])

plt.plot(error_list)
plt.xlabel("Number of iterations")
plt.ylabel("Cost")
plt.show()
```
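
Because each cost value is measured on a different mini-batch, the raw curve is noisy. A moving average can make the trend easier to read. The following is a minimal sketch, assuming a window of 20 updates; `moving_average` is a hypothetical helper, and the noisy toy curve only stands in for `error_list`.

```python
import numpy as np

def moving_average(values, window=20):
    """Smooth a 1-D sequence with a simple moving average (window is an assumed value)."""
    values = np.asarray(values, dtype=float).ravel()
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Toy stand-in for error_list: a decaying trend plus noise.
rng = np.random.default_rng(0)
noisy_cost = np.exp(-np.linspace(0, 5, 300)) + 0.05 * rng.normal(size=300)
smoothed = moving_average(noisy_cost)
print(len(noisy_cost), len(smoothed))  # 300 281
```

In the training script, the smoothed curve could be drawn with `plt.plot(moving_average(error_list))` alongside the raw one.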

**Output:**

Figure: Mini-Batch over Regression model (cost against number of iterations)

10. Final Prediction and Evaluation

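**Prediction:** The hypothesis() function is used to compute predicted values for the test set.

**Visualization:** The actual test targets are drawn as a scatter plot against Feature 1, with the model's predictions overlaid as a line.

**Evaluation:** The mean absolute error (MAE) between the predicted and actual test values is computed.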

```python
y_pred = hypothesis(X_test, theta)

plt.scatter(X_test[:, 1], y_test, marker='.', label='Actual')
plt.plot(X_test[:, 1], y_pred, color='orange', label='Predicted')
plt.xlabel("Feature 1")
plt.ylabel("Target")
plt.title("Model Predictions vs Actual Values")
plt.legend()
plt.grid(True)
plt.show()

error = np.sum(np.abs(y_test - y_pred)) / y_test.shape[0]
print("Mean Absolute Error =", error)
```
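
Mean absolute error is one of several common regression metrics. The following is a minimal sketch that reports it next to the root mean squared error and the coefficient of determination (R²); `regression_metrics` is a hypothetical helper, and the toy values are chosen so the results can be checked by hand.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for column-vector targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residual = y_true - y_pred
    mae = np.mean(np.abs(residual))
    rmse = np.sqrt(np.mean(residual ** 2))
    ss_res = np.sum(residual ** 2)                  # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy check: three exact predictions and one that is off by 2.
y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
y_pred = np.array([[1.0], [2.0], [3.0], [6.0]])
mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(mae, rmse, r2)  # MAE 0.5, RMSE 1.0, R^2 about 0.2
```

In the training script, the same call would be `regression_metrics(y_test, y_pred)`.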

**Output:**

Figure: Model prediction and Actual values

Batch vs Stochastic vs Mini-Batch Gradient Descent

| Type | Update Strategy | Speed and Efficiency | Noise in Updates | Memory Usage |
|---|---|---|---|---|
| **Batch Gradient Descent** | Uses the entire dataset for each update | Slow, high computation cost | Smooth and stable | High (needs the full dataset in memory) |
| **Stochastic Gradient Descent (SGD)** | Updates using one sample at a time | Fast updates but less efficient overall | Highly noisy updates | Low |
| **Mini-Batch Gradient Descent** | Uses small batches of training examples | Fast and efficient (supports vectorization) | Moderate noise, dependent on batch size | Moderate |
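
The three variants differ only in the batch size, which determines how many parameter updates one pass over the data produces. The following is a minimal sketch of that arithmetic for the 7200 training examples used in this article; `updates_per_epoch` is a hypothetical helper name.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of parameter updates in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train = 7200  # 90% of the 8000 generated points
for name, batch_size in [("batch", n_train), ("stochastic", 1), ("mini-batch", 32)]:
    print(name, updates_per_epoch(n_train, batch_size))
# batch 1
# stochastic 7200
# mini-batch 225
```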