Bagging Classifier

Last Updated : 2 May, 2026

Bagging, or Bootstrap Aggregating, works by training multiple base models independently and in parallel on different random subsets of the training data. These subsets are created using bootstrap sampling, where data points are randomly selected with replacement, so some samples appear multiple times while others may be excluded.

Figure: Bagging Classifier

The process starts with an original dataset containing multiple data points (represented by colored circles in the figure). The dataset is randomly sampled with replacement multiple times, so in each sample a data point can be selected more than once or not at all. These samples form multiple subsets of the original data.

Bagging helps improve accuracy and reduce overfitting, especially for high-variance models such as decision trees.
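To illustrate why averaging helps with high variance, here is a minimal simulation sketch. It is not a real bagging run: each "model" is stood in for by an unbiased noisy estimate, and the errors are assumed to be independent, which bagged models trained on overlapping bootstrap samples only approximate. The names `n_trials` and `n_models` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_trials = 10_000  # number of repeated experiments
n_models = 10      # hypothetical ensemble size

# Each "model" is simulated as an unbiased but noisy estimate of a true value of 0.
# Assumption: the models' errors are independent of each other.
single = rng.normal(loc=0.0, scale=1.0, size=n_trials)
ensemble = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_models)).mean(axis=1)

var_single = single.var()
var_ensemble = ensemble.var()

print(f"Variance of a single model:     {var_single:.3f}")
print(f"Variance of a 10-model average: {var_ensemble:.3f}")
```

Under the independence assumption the variance of the average shrinks roughly in proportion to the number of models. In practice the base models are correlated, so the reduction is smaller than this idealized case suggests.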

Working of Bagging Classifier

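Let's break it down step by step. Bagging starts with the original training dataset and draws several resampled training sets from it. Each resampled set has the same size as the original and is drawn with replacement, so some values repeat and others are left out:

Original training dataset: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Resampled training set 1: [2, 3, 3, 5, 6, 1, 8, 10, 9, 1]
Resampled training set 2: [1, 1, 5, 6, 3, 8, 9, 10, 2, 7]
Resampled training set 3: [1, 5, 8, 9, 2, 10, 9, 7, 5, 4]

A separate base model is then trained on each resampled set, and their predictions are combined by majority vote.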
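The resampling described above can be sketched with NumPy as follows. The seed and the variable names are illustrative choices, so the sets drawn will not match the ones listed in the text.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only to make the sketch repeatable

data = np.arange(1, 11)  # the original training dataset [1, ..., 10]

# Draw three bootstrap samples: same size as the data, sampled with replacement.
samples = [rng.choice(data, size=len(data), replace=True) for _ in range(3)]

for i, sample in enumerate(samples, start=1):
    left_out = np.setdiff1d(data, sample)  # values that never appear in this sample
    print(f"Resampled training set {i}: {sample.tolist()} (left out: {left_out.tolist()})")

# Expected fraction of distinct original points in a bootstrap sample of size n.
n = len(data)
expected_unique_fraction = 1 - (1 - 1 / n) ** n
print(f"Expected fraction of distinct points: {expected_unique_fraction:.3f}")
```

As the dataset grows, the expected fraction of distinct points approaches 1 - 1/e, or about 63%, so each base model typically never sees roughly a third of the training data.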

Implementation

Let's see the implementation of a Bagging Classifier:

Step 1: Import Libraries

We will import the necessary libraries, numpy and sklearn, for our model.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

Step 2: Define BaggingClassifier Class and Initialize

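```python
import numpy as np


class BaggingClassifier:
    def __init__(self, base_classifier, n_estimators):
        self.base_classifier = base_classifier  # estimator used as the template
        self.n_estimators = n_estimators        # number of classifiers to train
        self.classifiers = []                   # fitted classifiers are stored here
```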

Step 3: Implement the fit Method to Train Classifiers

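For each estimator, we draw a bootstrap sample of the training data by sampling row indices with replacement, train a new instance of the base classifier on that sample, and store the fitted classifier.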

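```python
    # Method of the BaggingClassifier class
    def fit(self, X, y):
        for _ in range(self.n_estimators):
            # Bootstrap sample: draw len(X) row indices with replacement
            indices = np.random.choice(len(X), len(X), replace=True)
            X_sampled, y_sampled = X[indices], y[indices]
            # New instance of the base classifier's class (default hyperparameters)
            clf = self.base_classifier.__class__()
            clf.fit(X_sampled, y_sampled)
            self.classifiers.append(clf)
        return self.classifiers
```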
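The fit method above builds each estimator by calling the base classifier's class with no arguments, so any hyperparameters set on the base classifier would not carry over. Here is a minimal sketch of the difference, using scikit-learn's `clone` helper as one possible alternative. It is not part of the original implementation, and `max_depth=3` is an arbitrary illustrative value.

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

# A base classifier with a non-default setting (max_depth=3 is an arbitrary choice).
base_clf = DecisionTreeClassifier(max_depth=3)

# Calling the class with no arguments yields default hyperparameters.
fresh_default = base_clf.__class__()

# clone() returns a new, unfitted estimator with the same hyperparameters.
fresh_clone = clone(base_clf)

print("Class call max_depth:", fresh_default.max_depth)
print("clone() max_depth:", fresh_clone.max_depth)
```

With a default `DecisionTreeClassifier()`, as used later in this walkthrough, both approaches behave the same. The distinction only matters once the base classifier is configured.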

Step 4: Implement the predict Method Using Majority Voting

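```python
    # Method of the BaggingClassifier class
    def predict(self, X):
        # One row of predictions per classifier, one column per sample
        predictions = np.array([clf.predict(X) for clf in self.classifiers])
        # For each sample (column), pick the most frequently predicted label
        majority_votes = np.apply_along_axis(
            lambda x: np.bincount(x).argmax(), axis=0, arr=predictions)
        return majority_votes
```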
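To make the voting step concrete, here is a small worked sketch that applies the same `np.bincount(...).argmax()` idea to a hand-written prediction matrix. The matrix is hypothetical and not taken from the digits data.

```python
import numpy as np

# Hypothetical predictions: rows are classifiers, columns are test samples.
predictions = np.array([
    [0, 1, 2, 2],
    [0, 1, 1, 2],
    [1, 1, 2, 0],
])

# For each column, count the votes per class label and keep the most frequent one.
majority_votes = np.apply_along_axis(
    lambda votes: np.bincount(votes).argmax(), axis=0, arr=predictions)

print(majority_votes)  # [0 1 2 2]
```

Two properties of this approach are worth keeping in mind: `np.bincount` requires non-negative integer labels, and when two labels tie, `argmax` returns the smaller one.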

Step 5: Load Data

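We will load the digits dataset and split it into training and test sets, holding out 20% of the samples for testing.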

```python
digits = load_digits()
X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

Step 6: Train Bagging Classifier and Evaluate Accuracy

```python
base_clf = DecisionTreeClassifier()
model = BaggingClassifier(base_classifier=base_clf, n_estimators=10)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```

**Output** (the exact value varies between runs because the bootstrap sampling is not seeded):

Accuracy: 0.9166666666666666

Step 7: Evaluate Each Classifier's Individual Performance

```python
for i, clf in enumerate(model.classifiers):
    y_pred_i = clf.predict(X_test)
    acc_i = accuracy_score(y_test, y_pred_i)
    print(f"Accuracy of classifier {i+1}: {acc_i:.4f}")
```

**Output:**

Accuracy of classifier 1: 0.8222
Accuracy of classifier 2: 0.8417
Accuracy of classifier 3: 0.8306
Accuracy of classifier 4: 0.8444
Accuracy of classifier 5: 0.8583
Accuracy of classifier 6: 0.8194
Accuracy of classifier 7: 0.8333
Accuracy of classifier 8: 0.8389
Accuracy of classifier 9: 0.8361
Accuracy of classifier 10: 0.8278
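For comparison, scikit-learn ships its own `BaggingClassifier` in `sklearn.ensemble`. The sketch below runs it on the same train/test split. The alias `SklearnBaggingClassifier` and the `random_state=42` seed are illustrative choices, and the resulting accuracy is not expected to match the hand-written class exactly.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier as SklearnBaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42)

# The base estimator is passed positionally because its keyword name
# differs between scikit-learn versions.
sk_model = SklearnBaggingClassifier(
    DecisionTreeClassifier(), n_estimators=10, random_state=42)
sk_model.fit(X_train, y_train)

sk_accuracy = accuracy_score(y_test, sk_model.predict(X_test))
print("scikit-learn BaggingClassifier accuracy:", sk_accuracy)
```

One difference from the class built above: when the base estimator provides `predict_proba`, the scikit-learn version averages predicted class probabilities rather than counting hard votes, so the two can disagree on individual samples.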

Applications

Advantages

Disadvantages