Stepwise Regression in Python

Last Updated : 14 Apr, 2026

Stepwise regression is a method of fitting a regression model by iteratively adding or removing variables. It is used to build a model that is accurate and parsimonious, meaning that it has the smallest number of variables that can explain the data.

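Stepwise regression combines the forward selection and backward elimination approaches. Forward selection starts from an empty model and adds predictors one at a time; backward elimination starts from the full model and removes predictors one at a time.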

Unlike pure forward or backward methods, stepwise regression dynamically adds or removes variables at each step based on a chosen criterion (such as AIC, BIC, or p-values).
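The add-or-remove loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes ordinary least squares with an intercept, uses AIC computed up to an additive constant as n·ln(RSS/n) + 2k, and runs on synthetic data. The helper names `ols_aic` and `stepwise_aic` are hypothetical and chosen for this example only.

```python
import numpy as np

def ols_aic(X, y, cols):
    """AIC (up to an additive constant) of an OLS fit with intercept on `cols`."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    k = A.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + 2 * k

def stepwise_aic(X, y):
    """At each step, apply the single add or drop that lowers AIC the most."""
    selected = []
    best = ols_aic(X, y, selected)
    while True:
        candidates = []
        for c in range(X.shape[1]):
            if c in selected:
                trial = [s for s in selected if s != c]  # try dropping c
            else:
                trial = selected + [c]                   # try adding c
            candidates.append((ols_aic(X, y, trial), trial))
        score, trial = min(candidates, key=lambda t: t[0])
        if score >= best:  # no single move improves AIC: stop
            return sorted(selected), best
        selected, best = trial, score

# Synthetic data (an assumption for illustration): only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

chosen, aic = stepwise_aic(X, y)
print("selected columns:", chosen)
```

Because every accepted move strictly lowers AIC, the loop cannot revisit a subset and always terminates. Swapping `2 * k` for `np.log(n) * k` would turn the criterion into BIC, which penalizes extra predictors more heavily.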

Use of Stepwise Regression

Stepwise Regression And Other Regression Models

**Advantages of Stepwise Regression:**

**Limitations:**

Difference Between Stepwise Regression and Linear Regression

| Feature | Linear Regression | Stepwise Regression |
|---|---|---|
| Purpose | Models relationship between variables | Selects best subset of variables and builds the model |
| Variables | Uses all given predictors | Selects important predictors automatically |
| Process | One-time model fitting | Iterative (add/remove variables) |
| Feature Selection | Not included | Built-in feature selection |
| Complexity | Fixed | Dynamic |

Implementation of Stepwise Regression in Python

To perform stepwise regression in Python, you can follow these steps:

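Use the k_feature_names_ (or k_feature_idx_) attribute of the fitted selector to see which features were selected by the stepwise regression. Note that k_features is the parameter that sets how many features to select, not the attribute that reports them.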

Importing Libraries

To implement stepwise regression, you will need to have the following libraries installed: pandas, NumPy, scikit-learn, and mlxtend.

The first step is to define the array of data and convert it into a dataframe using the NumPy and pandas libraries. Then, the features and target are selected from the dataframe using the iloc method.

```python
import pandas as pd
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from mlxtend.feature_selection import SequentialFeatureSelector

# Define the array of data
data = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Convert the array into a dataframe
df = pd.DataFrame(data)

# Select the features (all but the last column) and the target (last column)
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
```

Model Development in Stepwise Regression

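Next, feature selection is performed with the SequentialFeatureSelector class from the mlxtend library. Here the selector wraps a logistic regression model, which treats the target values as class labels, and scores candidate feature subsets by accuracy. The number of features to keep is set with the k_features parameter; because k_features=3 equals the number of available features in this toy dataset, all three are kept. With forward=True and the default floating=False the search is plain forward selection; setting floating=True lets the selector also drop features it added earlier, which is closer to true stepwise behavior.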

```python
# Perform stepwise regression
sfs = SequentialFeatureSelector(linear_model.LogisticRegression(),
                                k_features=3,
                                forward=True,
                                scoring='accuracy',
                                cv=None)

# fit() returns the fitted selector itself
selected_features = sfs.fit(X, y)
```

After the selection is complete, the chosen features are available through the selected_features.k_feature_names_ attribute (or k_feature_idx_ for column positions), and a dataframe containing only those features is created. The data is then split into train and test sets using the train_test_split() function from the sklearn library, and a logistic regression model is fit on the training set. Finally, the model predicts on the test set and the predictions are printed; accuracy_score() from sklearn.metrics can be used to score them against y_test.

```python
# Create a dataframe with only the selected features
# (taken from the fitted selector so the target column is not included)
selected_columns = list(selected_features.k_feature_idx_)
df_selected = df.iloc[:, selected_columns]

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df_selected, y, test_size=0.3, random_state=42)

# Fit a logistic regression model using the selected features
logreg = linear_model.LogisticRegression()
logreg.fit(X_train, y_train)

# Make predictions using the test set
y_pred = logreg.predict(X_test)

# Show the predictions for the test set
print(y_pred)
```

**Output:**

[8]
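With only three rows, the toy dataset leaves a single test sample, so an accuracy figure would not be meaningful. As a rough sketch of the evaluation step on a larger sample, the example below uses a synthetic classification dataset from `make_classification` (an assumption for illustration, not data from this article) and scores the predictions with `accuracy_score()`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, used only to illustrate the evaluation step.
X_cls, y_cls = make_classification(
    n_samples=300, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_cls, y_cls, test_size=0.3, random_state=42
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# accuracy_score compares the true labels with the predicted labels
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"Test accuracy: {acc:.2f}")
```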

The difference between linear regression and stepwise regression is that stepwise regression is a method for building a regression model by iteratively adding or removing predictors, while linear regression is a method for modeling the relationship between a response and one or more predictor variables.

In the stepwise regression example above, the mlxtend library is used to select predictors iteratively based on how much they improve the model's score, whereas a plain linear regression would use all of the given predictors to fit the model.
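The contrast can be made concrete with a short sketch. It uses scikit-learn's own `SequentialFeatureSelector` (a different class from the mlxtend one used earlier, swapped in here so the example depends on scikit-learn and NumPy only) and synthetic data in which two of five predictors matter. All variable names are illustrative.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): only columns 1 and 4 drive y.
rng = np.random.default_rng(2)
X_all = rng.normal(size=(200, 5))
y_all = 1.5 * X_all[:, 1] - 3.0 * X_all[:, 4] + rng.normal(scale=0.5, size=200)

# Plain linear regression: one fit, every predictor gets a coefficient.
full_model = LinearRegression().fit(X_all, y_all)

# Sequential selection: predictors are added one at a time by cross-validated score.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
)
selector.fit(X_all, y_all)
kept = selector.get_support(indices=True)

# Refit on the selected columns only.
reduced_model = LinearRegression().fit(X_all[:, kept], y_all)

print("full model coefficients:", full_model.coef_.shape[0])
print("kept columns:", kept.tolist())
```

The full model carries a coefficient for every column, including the three that are pure noise, while the reduced model is fit on the selected columns only.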