Feature Importance with Random Forests

Last Updated : 11 Nov, 2025

Feature importance in Random Forests measures how much each feature contributes to the model's predictions. It helps identify the most influential input variables, which can improve performance, interpretability and computational efficiency (for example, by dropping features that contribute little).

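Why Feature Importance Matters

Understanding feature importance offers several advantages: it shows which inputs drive the model's predictions, supports feature selection by flagging variables that contribute little, and makes the model's behaviour easier to explain.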

Feature Importance in Random Forests

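Random Forests, a popular ensemble learning algorithm, consist of many decision trees whose predictions are combined to produce robust results. Scikit-learn's implementation provides a built-in importance score, and the fitted model can also be analysed with model-agnostic techniques. This article covers four methods: built-in (Gini) importance, Mean Decrease Accuracy, permutation feature importance and SHAP values.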

Method 1: Built-in Feature Importance

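Scikit-learn's random forests expose this score through the feature_importances_ attribute. It is often called Gini importance or mean decrease in impurity (MDI): each feature is credited with the total reduction in Gini impurity produced by the splits that use it, averaged over all trees and normalised so the scores sum to 1. The Iris dataset is used throughout this article to demonstrate each method.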

Step 1: Install Dependencies and Train the Model

Install SHAP (needed for Method 4), import the required libraries, load the Iris dataset and train a random forest on a 75/25 train/test split.

```python
# Install SHAP once before running (needed for Method 4):
#   pip install shap      (from a terminal)
#   !pip install shap     (inside Jupyter/Colab)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import shap
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

# Split dataset into 75% train and 25% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train a random forest of 100 trees
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
```

Step 2: Calculate Gini Importance

```python
# Impurity-based (Gini) importance of each feature; the scores sum to 1
importances = clf.feature_importances_

feature_imp_df = pd.DataFrame(
    {'Feature': feature_names, 'Gini Importance': importances}
).sort_values('Gini Importance', ascending=False)
print(feature_imp_df)
```

Output:

[Screenshot: printed table of features ranked by Gini importance]

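Visualization plot for feature importance

```python
import matplotlib.pyplot as plt

# Plot the sorted DataFrame so the ranking is easy to read
plt.figure(figsize=(8, 4))
plt.barh(feature_imp_df['Feature'], feature_imp_df['Gini Importance'], color='skyblue')
plt.xlabel('Gini Importance')
plt.title('Feature Importance - Gini Importance')
plt.gca().invert_yaxis()  # put the first (largest) row at the top
plt.show()
```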

Output:

[Figure: horizontal bar chart of Gini importance per feature]
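To see where the built-in scores come from, it helps to recompute them by hand. The built-in score is the mean decrease in impurity (MDI): every split reduces the weighted Gini impurity of its node, that reduction is credited to the split feature, and the totals are normalised per tree and then averaged across the forest. The sketch below rebuilds these values from each fitted tree's internal arrays (tree_.impurity, tree_.weighted_n_node_samples, tree_.children_left, tree_.children_right, tree_.feature). The helpers tree_mdi and forest_mdi are hypothetical names, and the claim that the result matches scikit-learn exactly is an assumption that the final comparison checks rather than a documented guarantee.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def tree_mdi(tree):
    """Impurity decrease credited to each feature in one fitted tree (hypothetical helper)."""
    t = tree.tree_
    w = t.weighted_n_node_samples
    imp = np.zeros(tree.n_features_in_)
    for node in range(t.node_count):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf: no split, so no impurity decrease
            continue
        # Parent's weighted impurity minus that of its two children
        decrease = (w[node] * t.impurity[node]
                    - w[left] * t.impurity[left]
                    - w[right] * t.impurity[right])
        imp[t.feature[node]] += decrease
    return imp / imp.sum()  # normalise so this tree's scores sum to 1


def forest_mdi(forest):
    """Average the per-tree scores over the forest, then renormalise (hypothetical helper)."""
    per_tree = [tree_mdi(est) for est in forest.estimators_
                if est.tree_.node_count > 1]  # skip single-leaf trees
    mean_imp = np.mean(per_tree, axis=0)
    return mean_imp / mean_imp.sum()


# Same data split and model settings as Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

manual = forest_mdi(clf)
for name, score in zip(iris.feature_names, manual):
    print(f"{name:20s} {score:.4f}")
# Compare with the library's own values
print("matches feature_importances_:", np.allclose(manual, clf.feature_importances_))
```

Because MDI is computed from the training-time splits, it reflects how the trees were built rather than how well each feature generalises to unseen data; the shuffle-based methods that follow measure importance on the test set instead.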

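Method 2: Mean Decrease Accuracy (MDA)

Mean Decrease Accuracy measures how much the model's accuracy drops when the values of one feature are randomly shuffled, which breaks that feature's relationship with the target while leaving the other features intact. A large drop means the model depends heavily on that feature.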

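```python
# Baseline accuracy of the trained forest on the untouched test set
initial_accuracy = accuracy_score(y_test, clf.predict(X_test))

rng = np.random.default_rng(42)  # seeded so the shuffles are reproducible
importances = []
for i in range(X.shape[1]):
    X_test_copy = X_test.copy()
    # Shuffle column i only, breaking its link with the target
    X_test_copy[:, i] = rng.permutation(X_test_copy[:, i])
    shuff_accuracy = accuracy_score(y_test, clf.predict(X_test_copy))
    # Importance = drop in accuracy caused by shuffling this feature
    importances.append(initial_accuracy - shuff_accuracy)

accuracy_df = pd.DataFrame(
    {'Feature': feature_names, 'Decrease in Accuracy': importances}
).sort_values('Decrease in Accuracy', ascending=False)
print(accuracy_df)
```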

Output:

[Screenshot: printed table of features ranked by decrease in accuracy]

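Visualization plot for feature importance

```python
import matplotlib.pyplot as plt

# Plot the sorted DataFrame so the most important feature appears at the top
plt.figure(figsize=(8, 4))
plt.barh(accuracy_df['Feature'], accuracy_df['Decrease in Accuracy'], color='skyblue')
plt.xlabel('Mean Decrease Accuracy')
plt.title('Feature Importance - Mean Decrease Accuracy')
plt.gca().invert_yaxis()
plt.show()
```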

Output:

[Figure: horizontal bar chart of the decrease in accuracy per feature]
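Shuffling each column only once makes MDA noisy: with test_size=0.25 the Iris test set has just 38 samples, so a single unlucky shuffle can move the score noticeably. Repeating the shuffle and averaging the drops makes the "mean" in Mean Decrease Accuracy explicit and also yields a spread. The sketch below assumes a hypothetical helper name, mean_decrease_accuracy, and an arbitrary default of 20 repeats.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def mean_decrease_accuracy(model, X, y, n_repeats=20, seed=0):
    """Mean and std of the accuracy drop per feature over repeated shuffles (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y, model.predict(X))
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):
        for r in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle column j only
            drops[j, r] = baseline - accuracy_score(y, model.predict(X_perm))
    return drops.mean(axis=1), drops.std(axis=1)


# Same data split and model settings as Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

mda_mean, mda_std = mean_decrease_accuracy(clf, X_test, y_test)
for name, m, s in zip(iris.feature_names, mda_mean, mda_std):
    print(f"{name:20s} {m:.4f} +/- {s:.4f}")
```

A feature whose mean drop is smaller than its standard deviation is hard to distinguish from noise. Scikit-learn packages the same repeated-shuffle procedure as permutation_importance in Method 3.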

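Method 3: Permutation Feature Importance

Scikit-learn's permutation_importance function automates the shuffling idea from Method 2: it shuffles each feature n_repeats times, re-scores the model (by default with the estimator's score method, which is accuracy for classifiers) and reports the mean and standard deviation of the drop.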

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature 10 times on the test set and average the drop in accuracy
result = permutation_importance(
    clf, X_test, y_test, n_repeats=10, random_state=0, n_jobs=-1)

perm_imp_df = pd.DataFrame(
    {'Feature': feature_names,
     'Permutation Importance': result.importances_mean}
).sort_values('Permutation Importance', ascending=False)
print(perm_imp_df)
```

Output:

[Screenshot: printed table of features ranked by permutation importance]

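Interpreting these values: a larger permutation importance means the model's accuracy drops more when that feature is shuffled, so the model relies on it more heavily. Values close to zero indicate features the model barely uses, and small negative values usually reflect random noise from the shuffling rather than real importance.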

Visualization plot for feature importance

```python
import matplotlib.pyplot as plt

# Vertical bars in descending order of permutation importance
plt.figure(figsize=(6, 6))
plt.bar(perm_imp_df['Feature'], perm_imp_df['Permutation Importance'])
plt.xlabel('Feature')
plt.ylabel('Permutation Importance')
plt.title('Permutation Feature Importance')
plt.xticks(rotation=45, ha='right')  # keep the long feature names readable
plt.tight_layout()
plt.show()
```

Output:

[Figure: bar chart of permutation importance per feature]
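Besides importances_mean, the object returned by permutation_importance also carries importances_std and the raw importances array, with one row per feature and one column per repeat. Reporting the spread helps judge whether small differences between features are meaningful. A minimal sketch that prints mean ± std for each feature, assuming the same split and model settings as Step 1:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

result = permutation_importance(clf, X_test, y_test, n_repeats=10, random_state=0)

# One row per feature, one column per repeat
print(result.importances.shape)  # prints (4, 10)
for idx in result.importances_mean.argsort()[::-1]:  # most important first
    print(f"{iris.feature_names[idx]:20s} "
          f"{result.importances_mean[idx]:.4f} +/- {result.importances_std[idx]:.4f}")
```

The raw importances array can also be passed to a box plot if a visual view of the spread is preferred.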

Method 4: SHAP values

A SHAP (SHapley Additive exPlanations) value measures the contribution of a feature to the model's prediction for an individual instance.

Positive SHAP values indicate that a feature pushes the prediction up, while negative values indicate that it pushes the prediction down. The magnitude of the SHAP value represents the strength of the contribution. Averaging the absolute SHAP values over all test instances gives a single global importance score per feature.
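On a model with only four features, the Shapley values behind SHAP can be computed exactly by brute force, which makes the additive idea concrete. For feature i, the Shapley value averages the change in the model output when i is added to every possible subset S of the other features, weighted by |S|!(n - |S| - 1)!/n!. This sketch is not how shap.TreeExplainer works (it uses a much faster tree-specific algorithm); it assumes a single reference point, the mean of the training features, stands in for "absent" features. The helper exact_shapley and that choice of baseline are hypothetical, for illustration only.

```python
import itertools
import math

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        if idx:
            z[idx] = x[idx]  # features in the coalition take their real values
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in itertools.combinations(others, k):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


# Same data split and model settings as Step 1
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

x = X_test[0]
cls = int(np.argmax(clf.predict_proba(x.reshape(1, -1))[0]))  # column of the predicted class
baseline = X_train.mean(axis=0)  # assumed reference point


def f(z):
    """Predicted probability of class `cls` for a single sample."""
    return clf.predict_proba(z.reshape(1, -1))[0, cls]


phi = exact_shapley(f, x, baseline)
for name, v in zip(iris.feature_names, phi):
    print(f"{name:20s} {v:+.4f}")
# Efficiency property: the contributions sum to f(x) - f(baseline)
print("sum of contributions:", phi.sum(), "| f(x) - f(baseline):", f(x) - f(baseline))
```

The number of coalitions grows as 2^n, which is why practical libraries rely on model-specific shortcuts or sampling rather than enumeration.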

```python
import shap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X_test)

# For a multiclass model the layout of shap_values depends on the SHAP version
if isinstance(shap_values, list):
    # Older releases: one (n_samples, n_features) array per class
    shap_summary = np.abs(np.stack(shap_values)).mean(axis=(0, 1))
elif np.ndim(shap_values) == 3:
    # Newer releases: a single (n_samples, n_features, n_classes) array
    shap_summary = np.abs(shap_values).mean(axis=(0, 2))
else:
    # Single-output models: (n_samples, n_features)
    shap_summary = np.abs(shap_values).mean(axis=0)

shap_summary_df = pd.DataFrame({
    'Feature': feature_names,
    'Mean |SHAP Value|': shap_summary
}).sort_values('Mean |SHAP Value|', ascending=False)
print(shap_summary_df)

plt.figure(figsize=(8, 4))
plt.barh(shap_summary_df['Feature'], shap_summary_df['Mean |SHAP Value|'], color='skyblue')
plt.xlabel('Mean Absolute SHAP Value')
plt.title('Feature Importance (SHAP Values)')
plt.gca().invert_yaxis()  # most important feature at the top
plt.show()
```

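Output:

[Screenshot: printed table of features ranked by mean absolute SHAP value]

[Figure: Shap Values, horizontal bar chart of mean absolute SHAP value per feature]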