Tree Based Machine Learning Algorithms

Last Updated : 2 May, 2026

Tree based algorithms are important in machine learning as they mimic human decision making using a structured approach. They build models as decision trees, where data is split step by step based on features until a final prediction is made.

[Figure: Tree based algorithms]

Key components include:

Working of Tree based Algorithms

1. Decision Tree

A Decision Tree is the core of tree based algorithms. It splits data into progressively smaller subsets using mathematical rules, producing a structured flow from the first question to the final prediction. Advanced models like Random Forest and Gradient Boosting are built on this foundation. The structural elements include the root node, internal decision nodes, branches and leaf nodes.

[Figure: Decision Tree]

Decision Tree Implementation

Here we implement a Decision Tree using scikit-learn (sklearn). The criterion parameter switches the split quality measure between Gini impurity ('gini') and entropy ('entropy').

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import matplotlib.pyplot as plt

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Shallow tree; use criterion='entropy' to split on entropy instead of Gini
clf = DecisionTreeClassifier(criterion='gini', random_state=42, max_depth=2)
clf.fit(X, y)

# Plot the fitted tree
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=data.feature_names, filled=True)
plt.show()
```

**Output:**

[Figure: Decision Tree]
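To make the two criterion options more concrete, here is a minimal sketch of how Gini impurity and entropy can be computed for a node, and how a candidate split is scored by the size-weighted impurity of its children. The helper names (`gini`, `entropy`, `weighted_impurity`) and the toy labels are illustrative assumptions, not part of scikit-learn.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def weighted_impurity(left, right, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)


# Hypothetical toy node with 4 samples of each class
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print("Parent Gini:", gini(parent))                          # 0.5
print("Parent entropy:", entropy(parent))                    # 1.0
print("Split Gini:", weighted_impurity(left, right, gini))   # 0.375
```

A split is useful when the weighted impurity of the children is lower than the impurity of the parent, as it is here (0.375 versus 0.5).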

2. Random Forest

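Random Forest is an ensemble learning technique that trains multiple decision trees, each on a bootstrap sample of the data with a random subset of features, and combines their outputs (majority vote for classification, averaging for regression) to give more stable and accurate predictions than a single tree.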

[Figure: Random Forest]
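The idea of building many trees and combining their outputs can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: each "tree" is a one-split stump, and the helper names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(X, y, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows.

    The stump predicts class 1 when the feature value is above the threshold.
    """
    best_threshold, best_errors = None, len(y) + 1
    for threshold in sorted({row[feature] for row in X}):
        errors = sum((row[feature] > threshold) != bool(label)
                     for row, label in zip(X, y))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return feature, best_threshold


def predict_stump(stump, row):
    feature, threshold = stump
    return int(row[feature] > threshold)


def fit_forest(X, y, n_trees=25, seed=42):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        feature = rng.randrange(len(X[0]))                    # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 1 has larger values on both features
X = [[1, 2], [2, 1], [2, 3], [3, 2], [6, 7], [7, 6], [7, 8], [8, 7]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y)
print(predict_forest(forest, [1.5, 1.5]), predict_forest(forest, [7.5, 7.5]))
```

An individual stump trained on an unlucky bootstrap sample can get a point wrong, but the majority vote smooths those mistakes out, which is the source of the added stability.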

Random Forest Implementation

This code trains a Random Forest Classifier on the Breast Cancer dataset and evaluates its performance.

**1. RandomForestClassifier(n_estimators=100, random_state=42):** creates a forest of 100 trees with a fixed seed so results are reproducible.

**2. rf.fit(X_train, y_train):** trains the model on the training data.

**3. rf.predict(X_test):** generates predictions on the test data.

**4. accuracy_score(...):** calculates the fraction of test predictions that were correct.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a forest of 100 trees
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = rf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.96

3. Gradient Boosting

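In Gradient Boosting, trees are built one after another, and each new tree is fit to the errors left by the trees before it (more precisely, to the negative gradient of the loss). The model therefore improves step by step, in a procedure that amounts to gradient descent on the loss function.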
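The sequential idea can be sketched from scratch for a regression problem with squared-error loss, where the negative gradient is simply the residual. This is a minimal illustration under stated assumptions: the weak learners are one-split stumps on a single feature, and the function names and toy data are hypothetical rather than taken from any library.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump to the residuals by least squares.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[:-1]:  # the largest value would leave the right side empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def fit_gradient_boosting(x, y, n_estimators=50, learning_rate=0.1):
    """Squared-error boosting: each stump is fit to the current residuals."""
    base = sum(y) / len(y)            # initial prediction: the mean target
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is the residual y - F(x)
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Take a small step in the direction the stump suggests
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for xi, pi in zip(x, predictions)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)


# Hypothetical toy regression data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 6.1, 6.8, 8.1]

few = fit_gradient_boosting(x, y, n_estimators=5)
many = fit_gradient_boosting(x, y, n_estimators=50)
print("Training MSE with 5 stumps:", mse(few, x, y))
print("Training MSE with 50 stumps:", mse(many, x, y))
```

The training error shrinks as stumps are added, because every round removes part of the remaining residual. The learning rate controls how large each correction is.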

Gradient Boosting Implementation

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 shallow trees, each contributing a step scaled by learning_rate
gb = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42,
)
gb.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = gb.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.956

4. XGBoost

XGBoost (Extreme Gradient Boosting) is an optimized version of gradient boosting that focuses on speed, scalability and high performance. It enhances traditional boosting by adding system-level optimizations and regularization.
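As a rough illustration of the regularization part, the sketch below evaluates the regularized leaf value and split gain described in the XGBoost documentation, where G and H are the sums of first and second derivatives of the loss over the samples in a node. The function names and the numbers are hypothetical, and `reg_lambda` and `gamma` mirror the corresponding XGBoost parameters only in name and role.

```python
def leaf_weight(G, H, reg_lambda=1.0):
    """Optimal leaf value w* = -G / (H + lambda); lambda shrinks it toward 0."""
    return -G / (H + reg_lambda)


def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Gain of a candidate split; gamma is the fixed penalty for adding a leaf."""
    def score(G, H):
        return G * G / (H + reg_lambda)

    parent = score(G_left + G_right, H_left + H_right)
    return 0.5 * (score(G_left, H_left) + score(G_right, H_right) - parent) - gamma


# Hypothetical gradient/Hessian sums for the two children of a candidate split
print(split_gain(-4.0, 3.0, 4.0, 3.0, reg_lambda=1.0))   # 4.0
print(split_gain(-4.0, 3.0, 4.0, 3.0, reg_lambda=10.0))  # smaller gain
print(leaf_weight(-4.0, 3.0, reg_lambda=1.0))            # 1.0
```

A larger `reg_lambda` lowers both the leaf values and the gain, and a positive `gamma` rejects splits whose gain does not cover the penalty, so the trees stay simpler.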

XGBoost Implementation

```python
from xgboost import XGBClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# use_label_encoder is deprecated in recent XGBoost releases and can be dropped there
model = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,
    random_state=42,
    use_label_encoder=False,
    eval_metric='logloss',
)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.956

5. AdaBoost

AdaBoost (Adaptive Boosting) combines multiple weak learners into a strong model by focusing more on difficult samples at each step. Unlike Gradient Boosting, which fits each new tree to the remaining errors, AdaBoost increases the weights of misclassified samples so that the next learner concentrates on them.
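A single reweighting round of the classic discrete AdaBoost formulation can be sketched as follows. This is a simplified illustration with labels in {-1, +1}; the function name `adaboost_round` and the toy predictions are assumptions for the example, and the sketch assumes the weak learner's weighted error is strictly between 0 and 1.

```python
from math import exp, log


def adaboost_round(weights, y_true, y_pred):
    """One round of discrete AdaBoost with labels in {-1, +1}.

    Returns the learner weight alpha and the renormalised sample weights.
    Assumes 0 < weighted error < 1.
    """
    # Weighted error of the weak learner
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    alpha = 0.5 * log((1 - error) / error)
    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down
    updated = [w * exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Hypothetical example: 5 samples, the weak learner gets the last one wrong
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, -1, -1]
weights = [1 / 5] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print("alpha:", round(alpha, 3))                        # alpha: 0.693
print("weights:", [round(w, 3) for w in new_weights])   # weights: [0.125, 0.125, 0.125, 0.125, 0.5]
```

After one round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to classify it correctly.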

AdaBoost Implementation

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 weak learners, each reweighting the samples for the next
model = AdaBoostClassifier(n_estimators=100, learning_rate=1.0, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.973

6. LightGBM

LightGBM (Light Gradient Boosting Machine) is a high performance boosting framework built to handle large scale and high dimensional data efficiently. It improves traditional Gradient Boosting by optimizing both speed and memory usage.
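One of the techniques behind LightGBM's speed and memory savings is histogram-based split finding: feature values are bucketed into a small number of bins, and only bin boundaries are considered as split points. The sketch below is a heavily simplified illustration of that idea, not LightGBM's actual code. The equal-width binning, the unit-Hessian gain formula, the function names and the toy data are all assumptions made for the example.

```python
def build_histogram(values, gradients, n_bins=4):
    """Bucket one feature into equal-width bins; sum gradients and counts per bin.

    Assumes the feature is not constant (max > min).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts, grad_sums = [0] * n_bins, [0.0] * n_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        counts[b] += 1
        grad_sums[b] += g
    return counts, grad_sums


def best_bin_split(counts, grad_sums):
    """Scan bin boundaries only, instead of every distinct feature value."""
    total_n, total_g = sum(counts), sum(grad_sums)
    best_gain, best_bin = 0.0, None
    left_n, left_g = 0, 0.0
    for b in range(len(counts) - 1):
        left_n += counts[b]
        left_g += grad_sums[b]
        right_n, right_g = total_n - left_n, total_g - left_g
        if left_n == 0 or right_n == 0:
            continue
        # Variance-reduction style gain for squared error (unit Hessians)
        gain = left_g ** 2 / left_n + right_g ** 2 / right_n - total_g ** 2 / total_n
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_bin, best_gain


# Hypothetical feature values and per-sample gradients
values = [0.1, 0.4, 0.9, 1.3, 2.2, 2.8, 3.5, 4.0]
gradients = [-1.0, -0.8, -1.2, -0.9, 1.1, 0.9, 1.0, 0.9]

counts, grad_sums = build_histogram(values, gradients)
print(counts)                             # [3, 1, 2, 2]
print(best_bin_split(counts, grad_sums))  # best split is after bin index 1
```

With 8 distinct values there would be 7 candidate thresholds, while the histogram needs to check only 3 bin boundaries. On large datasets the same reduction is what saves time and memory.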

LightGBM Implementation

```python
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train 100 boosted trees
model = LGBMClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.964

7. CatBoost

CatBoost (Categorical Boosting) is a Gradient Boosting algorithm designed to efficiently handle datasets with categorical features. It reduces the need for complex preprocessing while maintaining strong predictive performance.
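One way CatBoost avoids manual preprocessing is by converting categories to numbers with ordered target statistics: each row is encoded using only the targets of rows that appear before it, which limits target leakage. The sketch below is a simplified illustration of that idea. The function name, the `prior` value and the toy data are assumptions, and the real library additionally relies on random permutations of the rows.

```python
def ordered_target_encoding(categories, targets, prior=0.5):
    """Encode each row using only the targets of rows that come before it."""
    sums, counts = {}, {}
    encoded = []
    for cat, target in zip(categories, targets):
        s, n = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + prior) / (n + 1))  # smoothed mean of earlier targets
        # Only now does the current row's target become visible to later rows
        sums[cat] = s + target
        counts[cat] = n + 1
    return encoded


# Hypothetical categorical feature and binary target
cats = ["red", "blue", "red", "red", "blue"]
targets = [1, 0, 1, 0, 1]

encoded = ordered_target_encoding(cats, targets)
print([round(value, 3) for value in encoded])  # [0.5, 0.5, 0.75, 0.833, 0.25]
```

The first occurrence of every category falls back to the prior, and later occurrences move toward the running mean of the target for that category.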

CatBoost Implementation

```python
from catboost import CatBoostClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the data and hold out 20% for testing
data = load_breast_cancer()
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# verbose=0 silences the per-iteration training log
model = CatBoostClassifier(iterations=100, learning_rate=0.1, random_state=42, verbose=0)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```

**Output:**

Accuracy: 0.964

Advantages

Limitations