CART (Classification And Regression Tree) in Machine Learning

Last Updated : 4 Dec, 2025

CART (Classification and Regression Tree) is an algorithm that builds a decision tree to predict outcomes for both classification and regression tasks. It breaks a dataset into smaller, more homogeneous groups by repeatedly splitting the data on rules that reduce impurity or error at each step.

How CART Builds a Decision Tree

CART constructs a decision tree by recursively splitting the dataset on the feature and threshold that produce the largest reduction in impurity (for classification) or error (for regression).

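**Step 1: Evaluate Best Split for Each Feature**

For every feature, CART evaluates candidate thresholds and measures the impurity (classification) or error (regression) of the two groups each threshold would produce.

**Step 2: Select the Optimal Split**

The feature and threshold with the largest reduction in impurity or error are chosen for the node.

**Step 3: Create Binary Child Nodes**

The data at the node is divided into two child nodes according to the chosen rule.

**Step 4: Apply Recursive Splitting**

Steps 1 to 3 are repeated on each child node until a stopping criterion is met.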
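The four steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the helper names (`node_gini`, `best_split`, `build_tree`, `predict`), the midpoint thresholds, the `max_depth` stopping rule and the toy data are all assumptions made for the example.

```python
from collections import Counter


def node_gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Steps 1-2: try every feature/threshold, keep the lowest weighted Gini."""
    best = None  # (weighted_gini, feature_index, threshold)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            score = (len(left) * node_gini(left) + len(right) * node_gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Steps 3-4: create two children and recurse until a stopping rule fires."""
    parent = node_gini(labels)
    split = best_split(rows, labels)
    # Stop on a pure node, at max depth, or when no split lowers the impurity.
    if parent == 0.0 or depth >= max_depth or split is None or split[0] >= parent:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(node, row):
    """Follow the split rules from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


# Toy data (illustrative values only): two features, two classes.
X_toy = [[2.0, 1.0], [3.0, 1.5], [6.0, 1.2], [7.0, 0.8]]
y_toy = ["a", "a", "b", "b"]
toy_tree = build_tree(X_toy, y_toy)
print(toy_tree)
```

On this toy data the first feature separates the classes perfectly, so the root splits on it and both children become leaves.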

CART for Classification

CART is used for classification tasks when the output variable is categorical.

**How it Works**

CART chooses splits that produce the purest possible child nodes.

CART for Regression

CART is used for regression tasks where the output variable is numerical.

**How it Works**

CART chooses splits that minimize the squared prediction error within the resulting child nodes, and each leaf predicts the mean target value of the training samples it contains.
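To illustrate the regression case, the sketch below fits scikit-learn's `DecisionTreeRegressor` on a tiny made-up dataset. The data values and the choice of `max_depth=1` are assumptions for the example; with a single split, each of the two leaves should predict the mean target of the samples that fall into it.

```python
from sklearn.tree import DecisionTreeRegressor

# Toy data (illustrative values only): one feature, numeric target.
X_reg = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y_reg = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# A depth-1 tree makes exactly one split, so it has two leaves.
reg = DecisionTreeRegressor(max_depth=1, random_state=0)
reg.fit(X_reg, y_reg)

# Each leaf predicts the mean target of the training samples it holds.
low, high = reg.predict([[2.5], [10.5]])
print(low, high)
```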

Splitting Criteria in CART

CART uses different metrics to choose the best splitting rule depending on whether the problem is a classification or regression task. The goal is to find the split that produces the purest child nodes (for classification) or minimum prediction error (for regression).

Splitting Criteria for Classification

CART uses Gini Impurity to measure how mixed the classes are in a node.

Gini Impurity is the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution.

\text{Gini} = 1 - \sum_{i=1}^{n} p_i^2

Where p_i is the proportion of samples in the node that belong to class i and n is the number of classes.
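As a small worked illustration of the Gini formula, the following sketch computes the impurity of a node and the size-weighted impurity of a candidate split. The function names `gini_impurity` and `weighted_gini` and the label lists are hypothetical, chosen only for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini = 1 - sum(p_i^2) over the classes present in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


pure = gini_impurity(["a", "a", "a", "a"])    # one class only
mixed = gini_impurity(["a", "a", "b", "b"])   # two classes, evenly mixed
split_score = weighted_gini(["a", "a", "a"], ["a", "b", "b", "b"])
print(pure, mixed, round(split_score, 4))
```

A pure node scores 0 and an evenly mixed two-class node scores 0.5. CART prefers the split whose weighted score is lowest.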

Splitting Criteria for Regression

For regression problems, CART uses Residual Sum of Squares (RSS) or Mean Squared Error (MSE) to find the best split.

**Residual Sum of Squares (RSS):** RSS measures the total squared difference between actual output values and predicted values.

RSS = \sum (y - \hat{y})^2

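Where y is the actual value and \hat{y} is the predicted value, which in a CART node is the mean of the training targets in that node.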

**Mean Squared Error (MSE):** MSE is simply RSS divided by the number of samples n.

MSE = \frac{1}{n} \sum (y - \hat{y})^2

Lower RSS or MSE indicates a better split, so CART selects the threshold that minimizes the prediction error in the resulting child nodes.
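The threshold search for a single numeric feature can be sketched as follows. The helper names `rss` and `best_threshold`, the use of midpoints as candidate thresholds and the toy values are assumptions for illustration. Each child predicts the mean of its own targets.

```python
def rss(values):
    """Residual sum of squares around the mean of the values."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_threshold(x, y):
    """Return the (threshold, total RSS) pair that minimizes left RSS + right RSS."""
    pairs = sorted(zip(x, y))
    best_t, best_rss = None, float("inf")
    for i in range(1, len(pairs)):
        # Only split between two distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        total = rss(left) + rss(right)
        if total < best_rss:
            best_t, best_rss = t, total
    return best_t, best_rss


# Toy data (illustrative values only).
x_vals = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y_vals = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, error = best_threshold(x_vals, y_vals)
mse = error / len(y_vals)  # MSE is RSS divided by the number of samples
print(threshold, error, mse)
```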

Pruning in CART

Pruning is used to prevent overfitting by trimming branches of the decision tree that add little or no improvement to model accuracy. It simplifies the tree, improves generalization and reduces model complexity.

**Types of Pruning in CART**

  1. **Cost Complexity Pruning:** Removes branches by weighing the trade-off between tree accuracy and tree size, keeping only the nodes that significantly improve performance.
  2. **Reduced Error Pruning:** Eliminates nodes that do not improve the model’s accuracy on a validation dataset, ensuring only beneficial splits remain.
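Cost complexity pruning is available in scikit-learn through the `ccp_alpha` parameter and the `cost_complexity_pruning_path` method. The sketch below is one possible way to pick `ccp_alpha` using a held-out validation split. The 70/30 split, the random seeds and the selection rule (highest validation accuracy) are assumptions for the example, not a prescribed procedure.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Candidate alphas: the effective alphas at which subtrees get pruned away.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

results = []  # (alpha, number of leaves, validation accuracy)
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)  # guard against tiny negative round-off
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    clf.fit(X_train, y_train)
    results.append((alpha, clf.get_n_leaves(), clf.score(X_val, y_val)))

# Keep the alpha with the best validation accuracy.
best_alpha, best_leaves, best_score = max(results, key=lambda r: r[2])
print(f"alpha={best_alpha:.4f} leaves={best_leaves} val_accuracy={best_score:.3f}")
```

Larger values of `ccp_alpha` prune more aggressively, so the number of leaves shrinks as alpha grows.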

Common Stopping or Pruning Criteria Used in CART

Hyperparameters in CART

CART provides several hyperparameters to control the tree structure, prevent overfitting and improve model performance. Important hyperparameters include:
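In scikit-learn's `DecisionTreeClassifier`, commonly tuned settings include `max_depth`, `min_samples_split`, `min_samples_leaf`, `max_leaf_nodes`, `min_impurity_decrease` and `ccp_alpha`. The sketch below shows how they might be set. The specific values are arbitrary assumptions for illustration and would normally be chosen by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="gini",           # impurity measure used to score splits
    max_depth=3,                # maximum number of levels below the root
    min_samples_split=4,        # a node needs at least this many samples to split
    min_samples_leaf=2,         # every leaf must keep at least this many samples
    max_leaf_nodes=8,           # upper bound on the number of leaves
    min_impurity_decrease=0.0,  # minimum impurity reduction required to split
    ccp_alpha=0.0,              # cost complexity pruning strength (0 = no pruning)
    random_state=42,
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```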

Step-By-Step Implementation

Here we build and evaluate a Decision Tree (CART) model on the Iris dataset, generating predictions, accuracy metrics and visualizations of the confusion matrix and the trained tree using Seaborn and Matplotlib.

Step 1: Import Required Libraries

Here we will import pandas, seaborn, matplotlib and scikit-learn.

```python
# Data handling and plotting
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Dataset, train/test split, model and metrics from scikit-learn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn import tree
```

Step 2: Load and Prepare the Dataset

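```python
# Load Iris and keep a DataFrame copy for inspection
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Feature matrix and target vector
X = iris.data
y = iris.target
```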

Step 3: Split Data into Training and Testing Sets

```python
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```

Step 4: Train the Decision Tree

```python
# Gini impurity is the CART classification criterion described above
model = DecisionTreeClassifier(criterion="gini", random_state=42)
model.fit(X_train, y_train)
```

Step 5: Make Predictions and Evaluate Model

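```python
# Predict species for the held-out test samples
y_pred = model.predict(X_test)

# Overall accuracy and per-class precision, recall and F1
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

# Print the confusion matrix and plot it as a heatmap
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)
sns.heatmap(cm, annot=True, cmap="Blues", fmt="d")
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```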

**Output:**

[Figure: Confusion Matrix]

Step 6: Visualize the Decision Tree

```python
# Draw the fitted tree; class_names is passed as a list for compatibility
plt.figure(figsize=(15, 10))
tree.plot_tree(
    model,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    rounded=True,
    fontsize=10
)
plt.show()
```

**Output:**

[Figure: plot of the trained decision tree]
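When a plot is not convenient, the learned rules can also be printed as text with scikit-learn's `export_text`. The sketch below is self-contained and refits a tree with the same split settings as above (`test_size=0.2`, `random_state=42`). It uses the default criterion, which is an assumption made to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# One line per node: "feature <= threshold" for splits, "class: k" for leaves.
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)
```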


Applications

Advantages

Limitations