Gini Impurity and Entropy in Decision Tree

Last Updated : 8 Nov, 2025

Decision trees used for classification split data into nodes based on feature values. To choose the best split, they rely on impurity metrics that quantify how mixed a node's class distribution is. Gini Impurity and Entropy are the two most common of these metrics: each scores how mixed or pure the samples in a node are, guiding the model toward splits that create cleaner groups.

[Figure: Entropy vs. Gini Impurity]

Impurity Measures

Need for Impurity Measures

Some common reasons why impurity criteria are essential in decision tree learning are:

1. Gini Impurity

Gini Impurity is the probability that a randomly selected sample from a node would be mislabeled if its label were drawn at random according to the node's class distribution. It is cheap to compute and is the splitting criterion used by CART-style tree classifiers.

**Formula:**

\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2

Where p_i is the proportion of samples in the node that belong to class i, and n is the number of classes.
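The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name `gini_impurity` and the choice to treat an empty node as pure are assumptions made for the example.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2) over the class proportions found in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


print(gini_impurity(["a", "a", "a", "a"]))  # pure node
print(gini_impurity(["a", "a", "b", "b"]))  # evenly mixed two-class node
```

A node holding a single class scores 0, and an evenly mixed two-class node scores 0.5, which is the maximum for two classes.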

**Properties:**

2. Entropy

Entropy measures uncertainty in a node’s class distribution and originates from information theory. Higher entropy indicates greater disorder among class labels.

**Formula:**

\text{Entropy} = -\sum_{i=1}^{n} p_i \log_{2}(p_i)

Where p_i represents the proportion of class i in the node and n is the number of classes. By convention, a class with p_i = 0 contributes 0 to the sum.
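As a rough sketch, the entropy formula translates to Python as follows. The function name `entropy` is hypothetical, and returning 0 for an empty node is an assumption of this example. Because only classes that actually appear in the node are counted, the log of zero never has to be evaluated.

```python
import math
from collections import Counter


def entropy(labels):
    """Return -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    proportions = [count / total for count in Counter(labels).values()]
    # Only observed classes are included, so every p is strictly positive.
    return sum(-p * math.log2(p) for p in proportions)


print(entropy(["a", "a", "a", "a"]))  # pure node
print(entropy(["a", "a", "b", "b"]))  # evenly mixed two-class node
```

With a base-2 logarithm the result is measured in bits: a pure node scores 0, and an evenly mixed two-class node scores 1.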

**Properties:**

When To Prefer Which Metric?

Some scenarios where one metric may be more practical are:

| Scenario | Gini Impurity | Entropy |
| --- | --- | --- |
| Training Speed | Faster computation since it avoids log operations | Slightly slower due to logarithmic calculations |
| Split Behavior | Tends to isolate the most frequent class in its own branch | Tends to produce slightly more balanced node partitions |
| Dataset Size | Efficient for large, high-dimensional datasets | Useful for structured datasets with balanced classes |
| Sensitivity to Distribution | Less sensitive to small probability changes | More sensitive to subtle probability differences |
| Common Usage | Default criterion in CART and in scikit-learn's tree classifiers | Preferred when information gain is the quantity of interest, as in ID3 and C4.5 |
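To make the comparison concrete, the sketch below scores one candidate split under both criteria by computing the impurity reduction: the parent node's impurity minus the size-weighted impurity of its two children. The helper names `impurity` and `impurity_reduction` and the toy labels are assumptions made for illustration, and the code assumes `left` and `right` together contain exactly the parent's samples.

```python
import math
from collections import Counter


def impurity(labels, criterion="gini"):
    """Impurity of a node under the chosen criterion ('gini' or 'entropy')."""
    total = len(labels)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    proportions = [count / total for count in Counter(labels).values()]
    if criterion == "gini":
        return 1.0 - sum(p * p for p in proportions)
    if criterion == "entropy":
        return sum(-p * math.log2(p) for p in proportions)
    raise ValueError(f"unknown criterion: {criterion!r}")


def impurity_reduction(parent, left, right, criterion="gini"):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * impurity(left, criterion)
        + len(right) / n * impurity(right, criterion)
    )
    return impurity(parent, criterion) - weighted_children


# Toy node: five samples of class 0 and five of class 1.
parent = [0] * 5 + [1] * 5
left = [0, 0, 0, 0, 1]   # hypothetical left child after a split
right = [0, 1, 1, 1, 1]  # hypothetical right child after a split

for criterion in ("gini", "entropy"):
    print(criterion, round(impurity_reduction(parent, left, right, criterion), 4))
```

The two criteria report the reduction on different scales, so the raw numbers are not directly comparable. What matters during training is how each criterion ranks the candidate splits against one another.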

Applications

Some of the use-cases of impurity metrics are:

Advantages

Some benefits of impurity-based splitting include:

Disadvantages

Some disadvantages of impurity metrics are: