Decision Tree Algorithms

Last Updated : 2 May, 2026

Decision Tree Algorithms are widely used supervised machine learning methods for both classification and regression tasks. They split data based on feature values to create a tree-like structure of decisions, starting from a root node and ending at leaf nodes that provide predictions.

[Figure: Types of decision tree algorithms]

Types of Decision Tree Algorithms

1. ID3 (Iterative Dichotomiser 3)

ID3 (Iterative Dichotomiser 3) is a decision tree learning algorithm for classification problems. It builds the tree using a top-down, greedy approach, selecting at each node the attribute with the highest Information Gain, which is calculated from entropy.

How ID3 Builds the Decision Tree

The ID3 algorithm constructs a decision tree by selecting the attribute that best splits the dataset at each step. It uses Entropy and Information Gain to measure impurity and determine the most informative feature for splitting the data.

**Step 1: Initialize Root Node**

The entire dataset is placed at the root node containing all training samples.

**Step 2: Calculate Entropy**

Entropy measures the amount of randomness or impurity in a dataset. If all data points belong to the same class, entropy is 0 (a pure node). If the data is evenly distributed among the classes, entropy reaches its maximum (1 bit for two classes), indicating the most disorder.

H(D) = -\sum_{i=1}^{n} p_i \log_2(p_i)

where p_i is the proportion of samples in D that belong to class i, and n is the number of classes.
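As a rough illustration of the entropy formula, the sketch below computes H(D) from a list of class labels using only the Python standard library. The function name `entropy` and the toy labels are assumptions made for this example, not part of any particular ID3 implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # p * log2(1 / p) equals -p * log2(p); this form avoids returning -0.0.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy labels.
pure = entropy(["yes", "yes", "yes", "yes"])   # one class only
mixed = entropy(["yes", "yes", "no", "no"])    # two balanced classes
```

Here `pure` evaluates to 0.0 and `mixed` to 1.0, matching the two extreme cases described in the text.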

**Step 3: Compute Information Gain**

Information Gain measures how much entropy decreases after splitting the dataset on a particular feature. The feature with the highest Information Gain is selected for the split because it provides the most useful information for classification.

\text{Information Gain} = H(D) - \sum_{v=1}^{V} \frac{|D_v|}{|D|} H(D_v)

where D_v is the subset of D in which the feature takes its v-th value, V is the number of distinct values of that feature, and |D| is the number of samples in D.
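The Information Gain formula can be sketched in a few lines of Python. This is a minimal example under stated assumptions: rows are plain dictionaries, and the feature names `outlook` and `windy`, the labels, and the helper names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """H(D) minus the size-weighted entropy of the subsets D_v."""
    total = len(labels)
    groups = defaultdict(list)          # feature value -> labels in D_v
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    weighted = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


# Hypothetical toy data: "outlook" separates the classes, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(rows, labels, "outlook")
gain_windy = information_gain(rows, labels, "windy")
```

On this toy data `outlook` removes all uncertainty (a gain of 1 bit) while `windy` removes none, so a greedy learner would split on `outlook` first.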

**Step 4: Select Best Attribute and Split**

The attribute with the highest Information Gain is selected, and the dataset is divided into subsets based on its values.

**Step 5: Repeat Recursively**

The same process continues for each subset until the node becomes pure or no attributes remain for further splitting. In the latter case the leaf predicts the majority class of the samples that reached it. These nodes form the final leaf nodes of the tree.
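The five steps above can be put together in a compact recursive sketch. It assumes categorical features stored in dictionaries and represents the tree as nested dictionaries; the function names, the tree representation, and the toy data are choices made for this illustration rather than a reference implementation of ID3.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    total = len(labels)
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g)
                                 for g in groups.values())


def id3(rows, labels, features):
    """Return a nested dict {feature: {value: subtree_or_class_label}}."""
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not features:                     # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    partitions = defaultdict(lambda: ([], []))   # value -> (rows, labels)
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = partitions[row[best]]
        sub_rows.append(row)
        sub_labels.append(label)
    return {best: {value: id3(r, l, remaining)
                   for value, (r, l) in partitions.items()}}


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

tree = id3(rows, labels, ["outlook", "windy"])
```

For this data the result is a single split on `outlook` with two pure leaves, because that attribute alone separates the classes.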

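Limitations

ID3 works with categorical attributes only and cannot split on continuous attributes directly, it has no built-in mechanism for missing values, and Information Gain is biased toward attributes with many distinct values.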

**Refer:** Iterative Dichotomiser 3 (ID3) Algorithm From Scratch

2. C4.5

C4.5 is an improved extension of the ID3 algorithm. It addresses several limitations of ID3: it handles continuous attributes, manages missing values, and reduces the bias toward attributes with many values by using Gain Ratio instead of Information Gain.

How C4.5 Builds the Decision Tree

C4.5 builds the decision tree by selecting the best attribute for splitting using Gain Ratio, which helps produce balanced splits and reduce bias toward attributes with many values.

**Step 1: Initialize Root Node**

The entire dataset is placed at the root node containing all training samples.

**Step 2: Compute Information Gain**

Information Gain is calculated for each attribute to measure how much entropy decreases after a split.

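**Step 3: Compute Split Information**

Split Information measures how the dataset is distributed across the branches created by a split. It grows as a split produces more, and more evenly sized, branches.

\text{Split Information} = -\sum_{v=1}^{V} \frac{|D_v|}{|D|} \log_2 \left( \frac{|D_v|}{|D|} \right)

where D_v is the subset of D sent down the v-th branch and V is the number of branches.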

**Step 4: Calculate Gain Ratio**

Gain Ratio normalizes Information Gain to avoid bias toward attributes with many distinct values.

\text{Gain Ratio} = \frac{\text{Information Gain}}{\text{Split Information}}
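To see why the normalization matters, the following sketch compares Information Gain and Gain Ratio on a small example. The `id` feature (unique for every row) and the `outlook` feature are hypothetical, and returning 0.0 when Split Information is zero is a convention chosen for this sketch, not necessarily what C4.5 itself does.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return sum((c / total) * math.log2(total / c)
               for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    total = len(labels)
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    return entropy(labels) - sum(len(g) / total * entropy(g)
                                 for g in groups.values())


def split_information(rows, feature):
    """Entropy of the branch sizes produced by splitting on `feature`."""
    return entropy([row[feature] for row in rows])


def gain_ratio(rows, labels, feature):
    split_info = split_information(rows, feature)
    if split_info == 0:          # single branch: assumed convention, see lead-in
        return 0.0
    return information_gain(rows, labels, feature) / split_info


# Hypothetical toy data: "id" is unique per row, "outlook" has two values.
rows = [
    {"id": 1, "outlook": "sunny"},
    {"id": 2, "outlook": "sunny"},
    {"id": 3, "outlook": "rainy"},
    {"id": 4, "outlook": "rainy"},
]
labels = ["no", "no", "yes", "yes"]

ig_id = information_gain(rows, labels, "id")
ig_outlook = information_gain(rows, labels, "outlook")
gr_id = gain_ratio(rows, labels, "id")
gr_outlook = gain_ratio(rows, labels, "outlook")
```

Both features reach the same Information Gain (1 bit), so ID3 could not tell them apart. The Gain Ratio of `id` is halved by its Split Information of 2 bits, so `outlook` is preferred.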

**Step 5: Select Best Attribute and Split**

The attribute with the highest Gain Ratio is selected and the dataset is divided into subsets based on its values.

**Step 6: Repeat Recursively**

The same process continues for each subset until nodes become pure or no attributes remain, followed by post-pruning to simplify the tree.

Limitations

3. CART (Classification and Regression Trees)

CART is a widely used decision tree algorithm that can handle both classification and regression problems. CART builds binary decision trees by repeatedly splitting the dataset into two subsets based on the most informative feature.

How CART Builds the Decision Tree

CART constructs the decision tree by repeatedly selecting the best feature and split point that reduces impurity in the dataset. The algorithm evaluates different splits and chooses the one that creates the most homogeneous subsets.

**Step 1: Initialize Root Node**

The process begins with the entire dataset placed at the root node. This node contains all training samples before any splitting occurs.

**Step 2: Calculate Gini Impurity (for Classification)**

CART measures how impure the dataset is using Gini Impurity, which estimates the probability of misclassifying a randomly chosen sample if it were labeled at random according to the class distribution in the node.

Gini(D) = 1 - \sum_{i=1}^{n} p_i^2

where p_i is the proportion of samples of class i in dataset D and n is the number of classes.
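A minimal sketch of the Gini formula, using only the standard library. The function name `gini` and the sample labels are assumptions for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


# Hypothetical toy labels.
pure = gini(["a", "a", "a", "a"])      # one class only
balanced = gini(["a", "a", "b", "b"])  # two balanced classes
```

A pure node gives 0.0, and two balanced classes give 0.5, the maximum for a binary problem.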

**Step 3: Evaluate Possible Splits**

The algorithm examines different features and possible split points to determine how well they divide the data into more homogeneous groups.

**Step 4: Select the Best Split**

The feature and split point whose child nodes have the lowest weighted Gini impurity (for classification) or the largest variance reduction (for regression) are selected to divide the dataset.
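For regression, the criterion can be sketched as the parent variance minus the size-weighted variance of the two children. The helper names and the target values below are hypothetical, and population variance (dividing by n) is assumed.

```python
def variance(ys):
    """Population variance of a list of numeric targets."""
    n = len(ys)
    if n == 0:
        return 0.0
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the children."""
    n = len(parent)
    weighted = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted


# Hypothetical regression targets.
ys = [1.0, 1.0, 5.0, 5.0]
good = variance_reduction(ys, [1.0, 1.0], [5.0, 5.0])  # separates low from high
bad = variance_reduction(ys, [1.0, 5.0], [1.0, 5.0])   # mixes them equally
```

The first split removes all of the variance (a reduction of 4.0), while the second removes none, so the first would be chosen.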

**Step 5: Create Binary Branches**

CART always creates binary splits, meaning each node is divided into exactly two child nodes (left and right), which simplifies the tree structure.

**Step 6: Repeat Recursively**

The same process continues for each subset, splitting the data until stopping criteria are met such as reaching pure nodes or a minimum number of samples.
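The whole CART procedure for classification can be sketched as follows. This is a simplified illustration rather than a production implementation: it assumes numeric features, tries midpoints between consecutive distinct values as thresholds, and uses hypothetical stopping parameters `max_depth` and `min_samples`. It does no pruning.

```python
from collections import Counter


def gini(labels):
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def best_split(X, y):
    """Return (feature_index, threshold) with the lowest weighted Gini, or None."""
    n = len(y)
    best, best_score = None, gini(y)     # a split must improve on the parent
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2            # midpoint between neighbouring values
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def build(X, y, depth=0, max_depth=3, min_samples=2):
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return majority                  # leaf node
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = [(r, lab) for r, lab in zip(X, y) if r[j] <= t]
    right = [(r, lab) for r, lab in zip(X, y) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build([r for r, _ in left], [lab for _, lab in left],
                      depth + 1, max_depth, min_samples),
        "right": build([r for r, _ in right], [lab for _, lab in right],
                       depth + 1, max_depth, min_samples),
    }


def predict(tree, row):
    while isinstance(tree, dict):        # descend until a leaf label is reached
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [2.0, 6.0], [8.0, 5.5], [9.0, 6.5]]
y = ["a", "a", "b", "b"]
tree = build(X, y)
```

On this data the tree consists of one binary split on feature 0 at threshold 5.0, and `predict(tree, [1.5, 0.0])` returns `"a"`.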

Limitations

**Refer:** Implementing CART (Classification And Regression Tree) in Python

4. CHAID (Chi-Square Automatic Interaction Detection)

CHAID is a decision tree algorithm mainly used for classification and regression analysis, especially when dealing with categorical variables. It builds trees by using statistical chi-square tests to identify the feature that has the strongest relationship with the target variable.

How CHAID Builds the Decision Tree

CHAID constructs the decision tree by analyzing the statistical relationship between each feature and the target variable using the chi-square test.

**Step 1: Initialize Root Node**

The entire dataset is placed at the root node, which contains all training samples before any splitting occurs.

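**Step 2: Perform Chi-Square Test**

For each categorical feature, CHAID calculates the Chi-Square statistic to measure the strength of association between the feature and the target variable.

\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}

where O_i is the observed frequency in cell i of the feature-by-target contingency table and E_i is the frequency expected in that cell if the feature and the target were independent.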
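The statistic can be computed directly from a contingency table of observed counts, as in this sketch. The function name `chi_square`, the table layout (rows are feature categories, columns are target classes), and the counts are assumptions for illustration. Cells with an expected count of zero are skipped, which is a simplification.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of feature and target.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts.
independent = chi_square([[10, 10], [10, 10]])   # no association
associated = chi_square([[20, 0], [0, 20]])      # perfect association
```

The first table yields 0.0 because every observed count equals its expected count. The second yields 40.0, since each of the four cells contributes (20 - 10)^2 / 10 or (0 - 10)^2 / 10.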

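**Step 3: Select the Best Feature**

The feature whose chi-square test is most significant (the smallest p-value, typically after a Bonferroni adjustment) is selected for splitting the dataset. For features with the same number of categories, this is the feature with the highest chi-square value, indicating the strongest relationship with the target variable.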

**Step 4: Create Multi-Way Branches**

CHAID divides the dataset into multiple subsets based on the categories of the selected feature, creating several branches from a single node.

**Step 5: Repeat Recursively**

The algorithm continues the same process for each subset until stopping criteria are met, such as reaching statistically insignificant splits or minimum node size.

Prediction Using CHAID

Limitations

5. Conditional Inference Trees

Conditional Inference Trees are decision tree models that use statistical hypothesis tests to select the best feature for splitting the dataset. Unlike algorithms such as CART, they use permutation-based tests to reduce bias toward variables with many categories. This makes them useful when working with datasets containing a mix of categorical and continuous variables.

How Conditional Inference Trees Build the Decision Tree

Conditional Inference Trees construct the tree using a recursive partitioning process based on statistical significance tests.

**Step 1: Initialize Root Node**

The entire dataset is placed at the root node, containing all training samples before any splitting occurs.

**Step 2: Test Association Between Features and Target**

At each node, the algorithm performs statistical tests to evaluate the relationship between each predictor variable and the target variable. For example, it may use Chi-square tests for categorical variables or F-tests for continuous variables.

**Step 3: Select the Most Significant Feature**

The feature with the strongest statistically significant association with the target variable (lowest p-value) is selected as the splitting variable.
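The idea of ranking features by p-value can be illustrated with a simple Monte Carlo permutation test. This is a simplified stand-in, not the conditional inference framework itself: it assumes a binary target with both classes present, numeric features, and the absolute difference in group means as the test statistic. All names, parameters, and data are hypothetical.

```python
import random


def permutation_p_value(feature, target, n_permutations=2000, seed=0):
    """Estimate how often shuffled targets give a statistic at least as large."""
    rng = random.Random(seed)

    def statistic(t):
        ones = [x for x, label in zip(feature, t) if label == 1]
        zeros = [x for x, label in zip(feature, t) if label == 0]
        return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))

    observed = statistic(target)
    shuffled = list(target)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)            # breaks any feature-target link
        if statistic(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical toy data.
target = [0] * 6 + [1] * 6
features = {
    "informative": [1, 2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 16],
    "noise": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
}

p_values = {name: permutation_p_value(values, target)
            for name, values in features.items()}
selected = min(p_values, key=p_values.get)   # lowest p-value wins
```

Here `noise` has identical group means, so its p-value is exactly 1.0, while `informative` gets a small p-value and is selected as the splitting variable.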

**Step 4: Determine the Best Split Point**

The algorithm determines the optimal way to divide the data based on the selected feature, creating new subsets that maximize the statistical difference between groups.

**Step 5: Repeat Recursively**

The process is repeated for each subset until no statistically significant relationship remains or predefined stopping criteria are met.

Advantages

Limitations

ID3 vs C4.5 vs CART vs CHAID vs Conditional Inference Trees

| Algorithm | Splitting Method | When to Use |
| --- | --- | --- |
| ID3 | Entropy and Information Gain; categorical features only | Simple classification with categorical data |
| C4.5 | Gain Ratio; handles continuous and categorical features; applies pruning | Mixed data types with better generalization than ID3 |
| CART | Gini Impurity for classification; variance reduction for regression; binary splits | Classification and regression tasks on tabular data |
| CHAID | Chi-Square test; multi-way splits for categorical features | Large datasets with many categorical variables |
| Conditional Inference Trees | Statistical hypothesis and permutation tests; unbiased splits | Mixed data types and unbiased feature selection |