Ensemble Learning

Last Updated : 2 May, 2026

Ensemble learning is a method where multiple models are combined instead of relying on just one. Even if the individual models are weak, combining their results often gives more accurate and reliable predictions, provided the models do not all make the same mistakes.
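To make the claim about weak models concrete, here is a small calculation of how often a majority vote is correct. It is a sketch under a strong assumption that real models rarely satisfy: every model is correct with the same probability `p`, and their errors are independent. The function name `majority_vote_accuracy` is made up for this illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes n is odd (so ties cannot occur) and that every voter is
    correct with the same probability p.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that ties cannot occur")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Accuracy of the vote for a growing number of models that are each 60% accurate.
for n in (1, 5, 21, 101):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions the vote becomes more accurate as models are added. When the models' errors are correlated the gain is smaller, which is why ensemble methods try to make their members diverse.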

Types of Ensemble Learning

There are three main types of ensemble methods:

  1. **Bagging (Bootstrap Aggregating):** Models are trained independently on different random subsets of the training data, drawn with replacement. Their results are then combined, usually by averaging (for regression) or voting (for classification). This reduces variance and helps limit overfitting.
  2. **Boosting:** Models are trained one after another. Each new model focuses on fixing the errors made by the previous ones. The final prediction is a weighted combination of all models, which helps reduce bias and improve accuracy.
  3. **Stacking (Stacked Generalization):** Multiple models, often of different types, are trained and their predictions are used as inputs to a final model, called a meta-model. The meta-model learns how to best combine the predictions of the base models, aiming for better performance than any individual model.
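Stacking has no implementation section in this article, so here is a rough sketch of the third item using scikit-learn's `StackingClassifier`. The choice of base models (a shallow decision tree and k-nearest neighbours), the logistic regression meta-model, and the Iris dataset are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Base models of different types; their predictions feed the meta-model.
base_models = [
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# The meta-model is trained on cross-validated predictions of the base models.
stacking_classifier = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stacking_classifier.fit(X_train, y_train)

stacking_accuracy = stacking_classifier.score(X_test, y_test)
print("Stacking accuracy:", stacking_accuracy)
```

The `cv=5` setting means the meta-model never sees predictions that a base model made on its own training rows, which reduces the risk of the meta-model simply trusting an overfitted base model.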

Stacking is also used in practice, but bagging and boosting are the most widely used, so let's look at them in more detail.

1. Bagging Algorithm

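Bagging can be used for both regression and classification tasks; in scikit-learn, BaggingClassifier handles classification and BaggingRegressor handles regression. In outline, the algorithm draws several bootstrap samples (random samples with replacement) from the training data, trains one base model on each sample independently, and combines the models' predictions by majority vote or averaging.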
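The bagging procedure can be sketched by hand before turning to the library class. This is a simplified illustration, not how scikit-learn implements it internally; the variable names (`trees`, `all_preds`, `manual_pred`) and the choice of 10 trees on the Iris dataset are assumptions for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rng = np.random.default_rng(42)
n_models = 10
trees = []
for _ in range(n_models):
    # Bootstrap sample: draw len(X_train) row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    tree.fit(X_train[idx], y_train[idx])
    trees.append(tree)

# Each row holds one tree's predictions for the whole test set.
all_preds = np.array([t.predict(X_test) for t in trees])

# Majority vote: count the votes per class for every test sample (column).
n_classes = len(np.unique(y))
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=n_classes)
manual_pred = votes.argmax(axis=0)

manual_accuracy = (manual_pred == y_test).mean()
print("Hand-rolled bagging accuracy:", manual_accuracy)
```

Because each tree sees a different resample of the data, the trees disagree on some inputs, and the vote smooths out those individual errors.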

Implementation

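**1. Importing Libraries**

We import the scikit-learn components used in this example: the bagging classifier, the decision tree base model, the Iris dataset loader, the train/test split helper, and the accuracy metric.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```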

**2. Loading and Splitting the Iris Dataset**

```python
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

**3. Creating a Base Classifier**

A decision tree is chosen as the base model. Unpruned decision trees are high-variance models that tend to overfit, especially on small datasets, which makes them good candidates for bagging.

```python
base_classifier = DecisionTreeClassifier()
```

**4. Creating and Training the Bagging Classifier**

```python
# Train 10 decision trees, each on its own bootstrap sample of the training set.
bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=42)
bagging_classifier.fit(X_train, y_train)
```

**5. Making Predictions and Evaluating Accuracy**

```python
y_pred = bagging_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

**Output:**

Accuracy: 1.0
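A perfect score on a 30-sample test set is a fairly coarse estimate. One additional check that bagging offers is the out-of-bag (OOB) score: each tree is evaluated on the training rows that were left out of its bootstrap sample. The sketch below uses the `oob_score` parameter of `BaggingClassifier`; raising `n_estimators` to 50 is an assumption made here so that almost every row is out-of-bag for at least one tree.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

oob_bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=50,
    oob_score=True,  # score each training row using only trees that never saw it
    random_state=42,
)
oob_bagging.fit(X_train, y_train)

print("Out-of-bag accuracy:", oob_bagging.oob_score_)
print("Test accuracy:", oob_bagging.score(X_test, y_test))
```

The OOB estimate uses all 120 training rows without needing a separate validation split, so it can serve as a second opinion next to the test accuracy.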

2. Boosting Algorithm

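Boosting is an ensemble technique where multiple weak models are trained one after another, and each new model focuses on correcting the errors of the previous ones to build a strong model. In AdaBoost, the variant implemented in this section, this is done by increasing the weights of the training samples that were misclassified in the previous round and combining the weak learners through a weighted vote.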
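The boosting loop can be sketched by hand in the style of discrete AdaBoost. This is a simplified two-class illustration, not scikit-learn's implementation: it assumes labels recoded as -1 and +1, uses the breast cancer dataset (chosen here only because it is binary), and the names `stumps`, `alphas`, and `weights` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
y = np.where(y == 1, 1, -1)  # relabel the two classes as -1 / +1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

n_rounds = 20
weights = np.full(len(X_train), 1 / len(X_train))  # uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    # Fit a decision stump that pays more attention to heavily weighted samples.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X_train, y_train, sample_weight=weights)
    pred = stump.predict(X_train)

    # Weighted error of this stump, clipped to avoid division by zero.
    err = np.clip(weights[pred != y_train].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)  # vote strength of this stump

    # Raise the weights of misclassified samples, lower the rest, renormalize.
    weights *= np.exp(-alpha * y_train * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the alpha-weighted sum of the stump votes.
scores = sum(a * s.predict(X_test) for a, s in zip(alphas, stumps))
boost_pred = np.where(scores >= 0, 1, -1)
boost_accuracy = (boost_pred == y_test).mean()
print("Hand-rolled AdaBoost accuracy:", boost_accuracy)
```

A stump with a low weighted error receives a large `alpha` and therefore a stronger say in the final vote, while a stump that is barely better than chance contributes almost nothing.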

Implementation

**1. Importing Libraries and Modules**

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
```

**2. Loading and Splitting the Dataset**

```python
data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

**3. Defining the Weak Learner**

We are creating the base classifier as a decision tree with maximum depth 1 (a decision stump). This simple tree will act as a weak learner for the AdaBoost algorithm, which iteratively improves by combining many such weak learners.

```python
base_classifier = DecisionTreeClassifier(max_depth=1)
```

**4. Creating and Training the AdaBoost Classifier**

```python
# Boost up to 50 decision stumps; learning_rate scales each stump's contribution.
adaboost_classifier = AdaBoostClassifier(
    base_classifier,
    n_estimators=50,
    learning_rate=1.0,
    random_state=42,
)
adaboost_classifier.fit(X_train, y_train)
```

**5. Making Predictions and Calculating Accuracy**

We are calculating the accuracy of the model by comparing the true labels y_test with the predicted labels y_pred. The accuracy_score function returns the proportion of correctly predicted samples. Then, we print the accuracy value.

```python
y_pred = adaboost_classifier.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

**Output:**

Accuracy: 1.0
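To see how the ensemble improves as weak learners are added, the test accuracy can be tracked after each boosting round with the `staged_score` method of `AdaBoostClassifier`. This is a small sketch under the same setup as the steps in this section; the name `staged_model` is introduced here for illustration, and the exact numbers may differ between scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

staged_model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42
)
staged_model.fit(X_train, y_train)

# staged_score yields the test accuracy after 1, 2, 3, ... boosting rounds.
staged_accuracy = list(staged_model.staged_score(X_test, y_test))

# Boosting may stop early, so only index rounds that actually exist.
n_fitted = len(staged_accuracy)
for round_no in sorted({1, min(5, n_fitted), n_fitted}):
    print(f"after {round_no} round(s): {staged_accuracy[round_no - 1]:.3f}")
```

A single stump can separate at most two groups, so the first round cannot classify all three Iris species correctly; later rounds are what make the difference.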

Importance

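Ensemble learning is a versatile approach that can be applied to machine learning models to reduce variance (as in bagging), reduce bias (as in boosting), and produce more accurate and reliable predictions than a single model.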

Ensemble Learning Techniques

| Technique | Category | Description |
|---|---|---|
| Random Forest | Bagging | Constructs multiple decision trees on bootstrapped subsets of the data and aggregates their predictions for the final output, reducing overfitting and variance. |
| Random Subspace Method | Bagging | Trains models on random subsets of the input features to enhance diversity and improve generalization while reducing overfitting. |
| Gradient Boosting Machines (GBM) | Boosting | Sequentially build decision trees, with each tree correcting the errors of the previous ones, improving predictive accuracy iteratively. |
| Extreme Gradient Boosting (XGBoost) | Boosting | Applies optimizations such as tree pruning, regularization and parallel processing for robust and efficient predictive models. |
| AdaBoost (Adaptive Boosting) | Boosting | Focuses on challenging examples by assigning weights to data points, and combines weak classifiers with weighted voting for the final prediction. |
| CatBoost | Boosting | Specializes in handling categorical features natively, without extensive preprocessing, with high predictive accuracy and built-in handling of overfitting. |
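Several of these techniques ship with scikit-learn and can be compared side by side. The sketch below is illustrative only: the hyperparameters and the 5-fold cross-validation on Iris are assumptions for this example, and the random subspace method is approximated with `BaggingClassifier` by disabling row resampling and sampling half of the features per tree. XGBoost and CatBoost live in separate packages and are left out.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    # Random subspace: every tree sees all rows but only a random half of the features.
    "Random Subspace": BaggingClassifier(
        DecisionTreeClassifier(),
        n_estimators=50,
        bootstrap=False,
        max_samples=1.0,
        max_features=0.5,
        random_state=42,
    ),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=42
    ),
}

# Mean 5-fold cross-validated accuracy for each technique.
cv_scores = {}
for name, model in models.items():
    cv_scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {cv_scores[name]:.3f}")
```

On a dataset as small and clean as Iris the differences between these methods are usually minor; the comparison becomes more informative on larger, noisier data.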