What is Isolation Forest

Last Updated : 11 Nov, 2025

Isolation Forest is an efficient unsupervised algorithm for anomaly detection, widely used in industries such as cybersecurity, finance, healthcare and manufacturing. It isolates individual data points using random partitioning. Because anomalies are few and differ markedly from normal observations, they tend to be isolated in fewer partitions than normal data, so outliers can be identified with little computational effort.

Working of Isolation Forest

Isolation Forest works by recursively partitioning the data with an ensemble of randomly built trees (isolation trees) and using them to identify anomalies. Here's a step-by-step breakdown:

1. Random Partitioning

2. Isolation Path

3. Ensemble of Trees

4. Anomaly Scoring

5. Classification
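The steps above can be sketched in plain Python. This is a simplified illustration, not the scikit-learn implementation: the helper names (`c`, `path_length`, `anomaly_score`) and the sample data are made up for this example. It follows only the branch that a query point would take instead of storing whole trees, and it uses the score from the original Isolation Forest paper, s(x, n) = 2^(-E[h(x)] / c(n)), where E[h(x)] is the mean path length of x over the trees and c(n) is the average path length for n points.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(x, points, rng, depth, max_depth):
    """Steps 1-2: split on a random feature and value until x is isolated."""
    if depth >= max_depth or len(points) <= 1:
        return depth + c(len(points))
    feature = rng.randrange(len(x))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # simplification: stop if the chosen feature cannot be split
        return depth + c(len(points))
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as x
    same_side = [p for p in points if (p[feature] < split) == (x[feature] < split)]
    return path_length(x, same_side, rng, depth + 1, max_depth)


def anomaly_score(x, data, n_trees=100, sample_size=64, seed=0):
    """Steps 3-4: average the path length over many trees and normalize it."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    total = 0.0
    for _ in range(n_trees):
        subsample = rng.sample(data, sample_size)
        total += path_length(x, subsample, rng, 0, max_depth)
    mean_length = total / n_trees
    return 2.0 ** (-mean_length / c(sample_size))


# Hypothetical data: a Gaussian cluster around the origin
data_rng = random.Random(1)
data = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(256)]

inlier_score = anomaly_score((0.0, 0.0), data)
outlier_score = anomaly_score((8.0, 8.0), data)

# Step 5: a higher score means the point was isolated in fewer splits
print(f"inlier: {inlier_score:.3f}, outlier: {outlier_score:.3f}")
```

Scores close to 1 indicate points that are isolated after very few splits, while scores well below 0.5 are typical of points deep inside a dense region. A threshold on this score turns it into a normal/anomalous label.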

Example of Isolation Forest Algorithm

[Figure: input dataset]
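As a minimal, self-contained illustration, scikit-learn's `IsolationForest` can be fitted on a small cluster of points and then queried with one typical point and one distant point. The synthetic data and the names `X_train`, `X_new` and `toy_model` are assumptions made for this sketch, not part of the original example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic training data: 200 points clustered around the origin
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

toy_model = IsolationForest(n_estimators=100, random_state=0)
toy_model.fit(X_train)

# One point in the middle of the cluster and one far away from it
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])

labels = toy_model.predict(X_new)        # 1 = inlier, -1 = outlier
scores = toy_model.score_samples(X_new)  # lower = more anomalous

print("labels:", labels)
print("scores:", scores)
```

The distant point should receive the lower score, since random splits separate it from the cluster after only a few partitions.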

Implementation

Here we perform anomaly detection on credit card transactions with Isolation Forest, using the following steps:

Step 1: Importing required libraries

We import the Pandas, NumPy, Seaborn, Matplotlib and scikit-learn libraries for data manipulation, preprocessing and visualization.

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler
```

Step 2: Dataset Loading and Pre-processing

We use a credit card anomaly detection dataset and load only its first 40,000 rows for faster processing. We then standardize every feature of the dataset except the target variable 'Class' using StandardScaler.

The dataset used can be downloaded from here.

```python
# Load the first 40,000 rows of the credit card dataset
credit_data = pd.read_csv('/content/creditcard.csv', nrows=40000)

# Separate the features from the target column 'Class'
X = credit_data.drop(columns=['Class'])
y = credit_data['Class']

# Standardize the features; the target is left untouched
scaled_data = StandardScaler().fit_transform(X)
df = pd.DataFrame(data=scaled_data, columns=X.columns)
```

Step 3: Model Making

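Now we define the Isolation Forest model. We estimate the fraction of outliers from the number of fraudulent transactions in the dataset, then create and fit the model with this value as its contamination parameter.

```python
# Ratio of fraudulent (Class == 1) to normal (Class == 0) transactions.
# When fraud is rare, this is close to the share of fraud in the whole dataset.
n_fraud = len(credit_data[credit_data['Class'] == 1])
n_normal = len(credit_data[credit_data['Class'] == 0])
outlier_fraction = n_fraud / float(n_normal)

# Build 100 isolation trees and fit them on the standardized features
model = IsolationForest(n_estimators=100, contamination=outlier_fraction, random_state=42)
model.fit(df)
```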

**Output:**

[Figure: Model]
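To make the role of `contamination` more concrete, the sketch below uses synthetic data, with `X_demo` and `demo_model` as hypothetical names chosen for illustration. It shows that in scikit-learn this value sets the threshold `offset_` so that roughly that fraction of the training points falls below it, and that `decision_function` is `score_samples` shifted by this offset.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data standing in for a real feature matrix
rng = np.random.RandomState(42)
X_demo = rng.normal(size=(400, 3))

# Ask the model to treat about 5% of the training points as outliers
demo_model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
demo_model.fit(X_demo)

raw_scores = demo_model.score_samples(X_demo)   # lower = more anomalous
shifted = demo_model.decision_function(X_demo)  # negative = outlier
flagged_fraction = np.mean(demo_model.predict(X_demo) == -1)

print("offset_:", demo_model.offset_)
print("fraction flagged as outliers:", flagged_fraction)
```

Because the threshold is derived from the training scores, a larger `contamination` flags more points regardless of how unusual they actually are, so the value is best taken from domain knowledge or labelled data, as done in this step.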

Step 4: Model Evaluation

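Next we evaluate the model by comparing its predictions with the true labels. predict() returns 1 for inliers and -1 for outliers, so we remap these to 0 (normal) and 1 (fraudulent) to match the 'Class' column before computing the accuracy.

```python
# Anomaly scores: negative values indicate outliers
scores_prediction = model.decision_function(df)

# predict() returns 1 for inliers and -1 for outliers
y_pred = model.predict(df)

# Remap to the dataset's labels: 0 = normal, 1 = fraudulent
y_pred[y_pred == 1] = 0
y_pred[y_pred == -1] = 1

print("Accuracy in finding anomaly:", accuracy_score(y, y_pred))
```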

**Output:**

Accuracy in finding anomaly: 0.997175

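The Isolation Forest model reaches an accuracy of 99.72% on these transactions. Since fraudulent transactions make up only a small share of the data, this figure mostly reflects how well normal transactions are classified.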
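To see why accuracy should be read with care on imbalanced data, consider the following sketch. It uses made-up labels (not the credit card data) and scores a trivial baseline that predicts "normal" for every transaction. The names `y_true_demo` and `y_baseline` are hypothetical.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels: 990 normal (0) and 10 fraudulent (1) transactions
y_true_demo = np.array([0] * 990 + [1] * 10)

# Baseline that never flags a transaction as fraudulent
y_baseline = np.zeros_like(y_true_demo)

baseline_accuracy = accuracy_score(y_true_demo, y_baseline)
# zero_division=0 avoids a warning when no positive predictions exist
baseline_precision = precision_score(y_true_demo, y_baseline, zero_division=0)
baseline_recall = recall_score(y_true_demo, y_baseline, zero_division=0)

print("accuracy:", baseline_accuracy)
print("precision:", baseline_precision)
print("recall:", baseline_recall)
```

This baseline reaches 99% accuracy while catching none of the fraudulent transactions. For the model in this article, per-class precision and recall can be inspected with `classification_report(y, y_pred)`, which was imported in Step 1.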

Step 5: Comparative Visualization

To see how the model separates normal and anomalous instances, we plot the 'Amount' feature for every transaction and color each point by its predicted class. 'Amount' can be replaced with any other feature to visualize its results.

```python
y_feature = credit_data['Amount']
credit_data['predicted_class'] = y_pred

plt.figure(figsize=(7, 4))
sns.scatterplot(x=credit_data.index, y=y_feature,
                hue=credit_data['predicted_class'],
                palette={0: 'blue', 1: 'red'}, s=50)
plt.title('Visualization of Normal vs Anomalous Transactions')
plt.xlabel('Data points')
plt.ylabel(y_feature.name)
plt.legend(title='Predicted Class', loc='best')
plt.show()
```

**Output:**

From the above plot, we can see that the instances predicted as normal and those predicted as anomalous are well separated, with very little overlap.

Applications

Isolation Forest is used to detect anomalies across industries such as cybersecurity, finance, healthcare and manufacturing.

Advantages

Let's look at the advantages of Isolation Forest:

Limitations