Iterative Dichotomiser 3 (ID3) Algorithm From Scratch

Last Updated : 7 Feb, 2026

The Iterative Dichotomiser 3 (ID3) algorithm is a decision tree learning algorithm for classification problems. It builds the tree by repeatedly choosing the attribute that maximizes information gain, a measure computed from entropy. ID3 follows a greedy, top-down, recursive strategy: it keeps splitting the dataset until each subset is pure (contains a single class) or no attributes remain.

Working

ID3 builds the tree in the following steps:

  1. **Initialize the Dataset:** The algorithm starts with the complete training dataset and the target class attribute. At this stage, every input feature is a candidate for splitting.
  2. **Compute Entropy of Target Attribute:** Entropy is calculated for the target class to measure the impurity (randomness) of the dataset. This value is the baseline against which candidate splits are evaluated.
  3. **Calculate Information Gain for Each Attribute:** For every input attribute, the dataset is split by the attribute's distinct values and the entropy of each subset is computed. Information gain is the reduction in entropy achieved by splitting on that attribute.
  4. **Select the Best Attribute for Splitting:** The attribute with the highest information gain becomes the decision node, because it best separates the data into homogeneous classes.
  5. **Partition the Dataset:** The dataset is divided into subsets according to the values of the selected attribute. Each subset corresponds to one branch of the tree.
  6. **Create Child Nodes Recursively:** The same entropy and information gain evaluation is applied recursively to each subset to grow the tree further.
  7. **Check Stopping Conditions:** The recursion stops when all instances in a subset belong to the same class, when no attributes are left for splitting, or when the subset is empty.
  8. **Assign Class Labels to Leaf Nodes:** When a stopping condition is met, a leaf node is created and assigned the majority class label of the corresponding subset. An empty subset has no labels of its own, so it falls back to the majority class of a larger set, such as its parent node or the full dataset.
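Steps 2 and 3 rely on the standard definitions below, where p_i is the proportion of rows in S that belong to class i, and S_v is the subset of S in which attribute A takes value v. The last line is a worked example with hypothetical counts: a target column with 9 "Yes" rows and 5 "No" rows has an entropy of about 0.940 bits.

```latex
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i

IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)

H(9\,\text{Yes}, 5\,\text{No}) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940
```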

Implementation

The following Python implementation builds ID3 step by step.

Step 1: Import Libraries

We start by importing the required libraries: Pandas, NumPy and Matplotlib.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

Step 2: Entropy Function

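```python
def entropy(target_col):
    # Distinct class labels and how often each one occurs
    elements, counts = np.unique(target_col, return_counts=True)
    # H = -sum(p_i * log2(p_i)), where p_i = count_i / total
    entropy_value = -np.sum([
        (counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
        for i in range(len(elements))
    ])
    return entropy_value
```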
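The list comprehension above recomputes `np.sum(counts)` on every iteration. Below is a minimal vectorized sketch of the same formula (an alternative written for illustration, not the article's code), checked on hypothetical label lists. It uses the equivalent form p * log2(1/p), which also avoids returning -0.0 for a pure node.

```python
import numpy as np


def entropy(target_col):
    """Shannon entropy (in bits) of the class labels in target_col."""
    _, counts = np.unique(target_col, return_counts=True)
    probabilities = counts / counts.sum()  # class proportions p_i
    # -p * log2(p) == p * log2(1 / p); the second form gives +0.0 for p == 1
    return float(np.sum(probabilities * np.log2(1.0 / probabilities)))


# Hypothetical label column: 9 "Yes" and 5 "No"
labels = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(labels), 3))  # 0.94
print(entropy(["Yes"] * 4))       # 0.0 for a pure node
print(entropy(["Yes", "No"]))     # 1.0 for a 50/50 split
```

The two extremes are useful sanity checks: a pure node has zero entropy, and an even two-class split has the maximum of one bit.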

Step 3: Information Gain Function

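```python
def information_gain(data, feature, target="Class"):
    # Entropy of the target before the split
    total_entropy = entropy(data[target])
    # Distinct values of the feature and the number of rows with each value
    values, counts = np.unique(data[feature], return_counts=True)

    # Entropy of each subset, weighted by the fraction of rows it contains
    weighted_entropy = np.sum([
        (counts[i] / np.sum(counts)) *
        entropy(data[data[feature] == values[i]][target])
        for i in range(len(values))
    ])

    # Information gain: the reduction in entropy achieved by the split
    return total_entropy - weighted_entropy
```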
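To see the arithmetic on concrete numbers, here is a NumPy-only sketch of the same calculation. It assumes plain arrays instead of a DataFrame (boolean masks stand in for `data[data[feature] == value][target]`), and the "Wind" and "Play" columns are hypothetical values chosen for illustration.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p * np.log2(1.0 / p)))


def information_gain(feature_values, labels):
    """H(labels) minus the size-weighted entropy of each feature-value subset."""
    feature_values = np.asarray(feature_values)
    labels = np.asarray(labels)
    values, counts = np.unique(feature_values, return_counts=True)
    weights = counts / counts.sum()
    # labels[feature_values == v] selects the rows whose feature equals v
    weighted = sum(w * entropy(labels[feature_values == v])
                   for v, w in zip(values, weights))
    return float(entropy(labels) - weighted)


# Hypothetical columns: Weak -> 6 Yes / 2 No, Strong -> 3 Yes / 3 No
wind = ["Weak"] * 8 + ["Strong"] * 6
play = ["Yes"] * 6 + ["No"] * 2 + ["Yes"] * 3 + ["No"] * 3
print(round(information_gain(wind, play), 3))  # 0.048
```

Here the parent entropy is about 0.940, and the weighted child entropy is (8/14)(0.811) + (6/14)(1.0), which is about 0.892, so splitting on this column removes only about 0.048 bits of uncertainty.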

Step 4: Recursive ID3 Tree Construction

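```python
def id3(data, original_data, features, target="Class", parent_node_class=None):
    """Build an ID3 tree as nested dicts: {feature: {value: subtree_or_label}}.

    data: rows that reach the current node
    original_data: the full training set (fallback for empty subsets)
    features: attributes still available for splitting
    parent_node_class: majority class of the parent node
    """
```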

Step 5: Handle Stopping Conditions and Majority Class

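```python
    # Stop 1: every row has the same class, so return that class as a leaf
    if len(np.unique(data[target])) == 1:
        return np.unique(data[target])[0]

    # Stop 2: no rows left, so fall back to the majority class of the full dataset
    if len(data) == 0:
        return np.unique(original_data[target])[np.argmax(
            np.unique(original_data[target], return_counts=True)[1])]

    # Stop 3: no features left, so return the majority class passed down by the parent
    if len(features) == 0:
        return parent_node_class
```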
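The majority-class expression appears twice in this function (here and in Step 6). The hypothetical helper `majority_class` below, which is not part of the original code, computes the same value with a single `np.unique` call and makes the tie-breaking behavior visible: `np.unique` returns the labels sorted, and `np.argmax` returns the first maximum, so a tie goes to the label that sorts first.

```python
import numpy as np


def majority_class(labels):
    """Most frequent label, using the same np.unique / np.argmax idiom as the article."""
    values, counts = np.unique(labels, return_counts=True)
    # values are sorted; argmax picks the first index holding the largest count
    return values[np.argmax(counts)]


print(majority_class(["Yes", "No", "Yes"]))  # Yes
print(majority_class(["Yes", "No"]))         # No (tie broken alphabetically)
```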

Step 6: Select Best Feature and Split Dataset

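```python
    # Majority class of the current node, passed to children as a fallback
    parent_node_class = np.unique(data[target])[np.argmax(
        np.unique(data[target], return_counts=True)[1])]

    # Choose the feature with the highest information gain
    gains = [information_gain(data, feature, target) for feature in features]
    best_feature = features[np.argmax(gains)]

    # Create the decision node and remove the chosen feature from later splits
    tree = {best_feature: {}}
    features = [f for f in features if f != best_feature]
```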
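The selection logic of this step does not depend on pandas. The dependency-free sketch below assumes rows stored as a list of dicts instead of a DataFrame (an assumption for illustration), and `best_feature` is a hypothetical helper name. `gains.index(max(gains))` mirrors `np.argmax`, so ties go to the feature listed first.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * log2(total / n) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy reduction from splitting rows (a list of dicts) on feature."""
    labels = [row[target] for row in rows]
    weighted = 0.0
    for value in {row[feature] for row in rows}:
        subset = [row[target] for row in rows if row[feature] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - weighted


def best_feature(rows, features, target="Class"):
    """Feature with the highest gain; ties go to the earliest feature in the list."""
    gains = [information_gain(rows, f, target) for f in features]
    return features[gains.index(max(gains))]


# Hypothetical rows: "Outlook" separates the classes perfectly, "Wind" does not
rows = [
    {"Outlook": "Sunny", "Wind": "Weak", "Class": "No"},
    {"Outlook": "Sunny", "Wind": "Strong", "Class": "No"},
    {"Outlook": "Overcast", "Wind": "Weak", "Class": "Yes"},
    {"Outlook": "Overcast", "Wind": "Strong", "Class": "Yes"},
]
features = ["Wind", "Outlook"]
chosen = best_feature(rows, features)
remaining = [f for f in features if f != chosen]  # same filtering as above
print(chosen, remaining)  # Outlook ['Wind']
```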

Step 7: Recursive Subtree Generation

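```python
    # Grow one branch per observed value of the chosen feature
    for value in np.unique(data[best_feature]):
        subset = data[data[best_feature] == value]
        subtree = id3(subset, original_data, features, target, parent_node_class)
        tree[best_feature][value] = subtree

    # Return the finished (sub)tree to the caller
    return tree
```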

Step 8: Execute Model and Output Tree

```python
# Dataset contents are elided; the last column is the target "Class"
data = pd.DataFrame({...})
features = list(data.columns[:-1])
tree = id3(data, data, features)
print(tree)
```

**Output:**

{'Outlook': {'Overcast': 'Yes', 'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': 'No'}}
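Printing the tree only shows its structure. To classify a new record, one can walk the nested dictionary from the root. The hypothetical `predict` helper below is not part of the original code. It assumes every internal node is a one-key dict `{feature: {value: subtree}}`, as produced by `id3`, and returns a caller-supplied `default` when a record has a value the tree never saw during training.

```python
def predict(tree, sample, default=None):
    """Classify one sample (a dict of feature -> value) by walking a nested-dict ID3 tree."""
    node = tree
    while isinstance(node, dict):
        feature = next(iter(node))       # each internal node has exactly one key
        branches = node[feature]
        if sample.get(feature) not in branches:
            return default               # missing or unseen value: no branch to follow
        node = branches[sample[feature]]
    return node                          # a leaf holds the class label


tree = {'Outlook': {'Overcast': 'Yes',
                    'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                    'Sunny': 'No'}}

print(predict(tree, {"Outlook": "Rain", "Wind": "Weak"}))     # Yes
print(predict(tree, {"Outlook": "Sunny", "Wind": "Strong"}))  # No
print(predict(tree, {"Outlook": "Foggy"}, default="Yes"))     # Yes
```

A natural choice for `default` is the majority class of the training data, which matches how the tree itself handles empty subsets.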

Applications

Advantages

Limitations