Logistic Regression using Statsmodels

Last Updated : 25 Oct, 2025

Logistic regression is a statistical technique for predicting outcomes that have two possible classes, such as yes/no or 0/1. Using Statsmodels in Python, we can fit a logistic regression model and obtain detailed statistical output such as coefficients, p-values and confidence intervals.
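As a rough illustration of how a logistic model turns inputs into a class probability, the sketch below applies the logistic (sigmoid) function to a weighted sum of features. The coefficient and feature values are made up for illustration and do not come from the admissions data used later.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, coefs, intercept=0.0):
    """Probability of class 1 for a single observation."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return sigmoid(z)


# Hypothetical coefficients and feature values, chosen only for illustration
p = predict_proba(features=[1.0, 2.0], coefs=[0.5, 0.25], intercept=-0.5)
print(round(p, 4))
```

A probability above a chosen cut-off (commonly 0.5) is then read as class 1, and anything below it as class 0.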

Need for Statsmodels

Some of the reasons to use Statsmodels for logistic regression are:

  1. **Detailed Statistical Output:** Shows p-values, confidence intervals and model fit metrics.
  2. **Ease of Interpretation:** Allows analysts to understand the effect of each variable on predictions.
  3. **Flexibility:** Supports categorical variables, interactions and transformations.
  4. **Integration with Python Libraries:** Works seamlessly with pandas and NumPy for data handling.
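The flexibility point can be sketched with the Statsmodels formula interface, where C() marks a categorical variable and `*` adds an interaction between two predictors. The data below is synthetic, and the `program` column is a hypothetical variable invented for this sketch, not part of the dataset used in this article.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data; "program" is a hypothetical categorical column
demo = pd.DataFrame({
    "gpa": rng.normal(3.0, 0.5, n),
    "work_experience": rng.integers(0, 6, n),
    "program": rng.choice(["full_time", "part_time"], n),
})

# Simulate a noisy binary outcome so the classes are not perfectly separable
score = (-6.0 + 1.8 * demo["gpa"] + 0.3 * demo["work_experience"]).to_numpy()
demo["admitted"] = (rng.random(n) < 1 / (1 + np.exp(-score))).astype(int)

# gpa * work_experience expands to both main effects plus their interaction;
# C(program) is dummy-encoded, and the formula API adds an intercept by default
formula = "admitted ~ gpa * work_experience + C(program)"
result = smf.logit(formula, data=demo).fit(disp=0)
print(result.params)
```

The formula interface is only one option; the rest of this article uses the array-based sm.Logit() interface instead.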

Building the Logistic Regression model

In this example, we predict whether a student will be admitted to a college based on their GMAT score, GPA and work experience. The target variable is binary: admitted or not admitted.

Step 1: Importing Libraries

Import Statsmodels and pandas.

Python `

import statsmodels.api as sm
import pandas as pd

`

Step 2: Loading Training Dataset

Here we load the training dataset from the file logit_train1.csv, using its first column as the row index.

Python `

# Read the training data; the first CSV column becomes the row index
df = pd.read_csv('logit_train1.csv', index_col=0)

`
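Before modelling, it can help to confirm that the file loaded as expected. The sketch below uses a tiny in-memory CSV as a stand-in for logit_train1.csv; its rows are made up, and its layout (an unnamed index column followed by the four named columns) is an assumption about the real file.

```python
import io

import pandas as pd

# Made-up rows standing in for the real training file (layout is assumed)
csv_text = """,gmat,gpa,work_experience,admitted
0,700,3.6,4,1
1,580,2.9,1,0
2,640,3.2,3,0
"""

# index_col=0 turns the first CSV column into the row index,
# so it does not end up as a feature column
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(demo.columns.tolist())             # column names available for modelling
print(demo.isna().sum().sum())           # total number of missing values
print(demo["admitted"].value_counts())   # class balance of the target
```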

Step 3: Define Dependent and Independent Variable

Defining dependent and independent variables for training.

Python `

# Independent variables (predictors)
Xtrain = df[['gmat', 'gpa', 'work_experience']]
# Dependent variable (binary target)
ytrain = df[['admitted']]

`

Step 4: Build the Model

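Build the model with Statsmodels. sm.Logit() creates the logistic regression model and fit() estimates its coefficients by maximum likelihood. Note that sm.Logit() does not add an intercept automatically, so the model below is fit without one.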

Python `

log_reg = sm.Logit(ytrain, Xtrain).fit()

`

Step 5: Perform Predictions

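predict() returns the estimated probability of admission for each row of the test data; rounding these probabilities gives the predicted class labels. Xtest and ytest are the features and labels of a held-out test set with the same columns as the training data.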

Python `

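# Xtest and ytest: features and labels of a held-out test set (loading not shown)
# predict() returns probabilities; round them to 0/1 class labels
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

print('Actual values:', list(ytest.values))
print('Predictions:', prediction)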

`

**Output:**

Actual values: [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
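Rounding is equivalent to a 0.5 cut-off except at exactly 0.5, where Python's round() rounds to the nearest even integer and therefore returns 0. The sketch below compares rounding with an explicit threshold on a few made-up probabilities. An explicit threshold also makes it easy to choose a cut-off other than 0.5.

```python
import numpy as np

# Made-up predicted probabilities, for illustration only
probs = np.array([0.12, 0.50, 0.51, 0.93])

# Python's round() sends 0.5 to 0 (round-half-to-even)
labels_round = [round(float(p)) for p in probs]

# An explicit threshold states the rule directly; here 0.5 counts as class 1
labels_cut = (probs >= 0.5).astype(int).tolist()

# A lower cut-off labels more observations as class 1
labels_low = (probs >= 0.1).astype(int).tolist()

print(labels_round)  # [0, 0, 1, 1]
print(labels_cut)    # [0, 1, 1, 1]
print(labels_low)    # [1, 1, 1, 1]
```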

Step 6: Confusion Matrix

Evaluate the model on the test set using a confusion matrix and the overall accuracy.

Python `

from sklearn.metrics import confusion_matrix, accuracy_score

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix:\n", cm)
print('Test accuracy:', accuracy_score(ytest, prediction))

`

**Output:**

Confusion Matrix:
[[6 0]
[2 2]]
Test accuracy: 0.8

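The model classifies 8 of the 10 test samples correctly, giving an accuracy of 0.8. All 6 non-admitted students are predicted correctly, but only 2 of the 4 admitted students are identified, so the model misses half of the positive class on this small test set.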
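The confusion matrix printed above can be broken down further into precision and recall for the admitted class. The sketch below recomputes these figures by hand from the matrix values, assuming the scikit-learn layout in which rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix from the output above: rows = actual, columns = predicted
cm = np.array([[6, 0],
               [2, 2]])

# For a 2x2 matrix in this layout, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # share of predicted admissions that are right
recall = tp / (tp + fn)          # share of actual admissions that are found

print(f"accuracy={accuracy:.2f}, precision={precision:.2f}, recall={recall:.2f}")
# accuracy=0.80, precision=1.00, recall=0.50
```

The same values can be obtained from sklearn.metrics with precision_score and recall_score.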