Ordinary Least Squares (OLS) using statsmodels

Last Updated : 15 Jul, 2025

Ordinary Least Squares (OLS) is a widely used statistical method for estimating the parameters of a linear regression model. It minimizes the sum of squared residuals between observed and predicted values. In this article, we will learn how to implement **Ordinary Least Squares (OLS)** regression using Python's statsmodels module.

Overview of Linear Regression Model

A **linear regression model** establishes the relationship between a dependent variable (_y_) and one or more independent variables (_x_). With a single independent variable, the model is:

\hat{y} = b_1 x + b_0

Where \hat{y} is the predicted value of _y_, b_1 is the slope of the line and b_0 is the intercept.

The **OLS method** minimizes the sum of squared residuals (_S_), defined as:

S = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
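As a small illustration of this formula, the sketch below evaluates _S_ for two candidate lines. The helper name `sum_squared_residuals` and the four data points are made up for this example and are not part of the article's dataset.

```python
import numpy as np

def sum_squared_residuals(x, y, b0, b1):
    """Return S = sum((y_i - (b1 * x_i + b0))^2) for a candidate line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = b1 * x + b0          # predicted values for this candidate line
    return float(np.sum((y - y_hat) ** 2))

# Hypothetical data lying exactly on the line y = 2x
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [2.0, 4.0, 6.0, 8.0]

s_exact = sum_squared_residuals(x_demo, y_demo, b0=0.0, b1=2.0)  # passes through every point
s_off = sum_squared_residuals(x_demo, y_demo, b0=0.0, b1=1.0)    # a poorer candidate line
print(s_exact, s_off)  # prints: 0.0 30.0
```

The line with the smaller _S_ fits the data better. OLS selects the coefficients for which _S_ is as small as possible.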

To find the optimal values of b_0 and b_1, the partial derivatives of _S_ with respect to each coefficient are taken and set to zero.
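For a single independent variable, setting these two derivatives to zero gives the standard closed-form solution:

b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \quad b_0 = \bar{y} - b_1 \bar{x}

The sketch below applies these formulas with NumPy and cross-checks the result against `np.polyfit`. The function name `fit_simple_ols` and the data are assumptions made for illustration only.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form OLS estimates (b0, b1) for a single independent variable."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means
    b0 = y_mean - b1 * x_mean
    return float(b0), float(b1)

# Hypothetical data, roughly following y = 1 + 2x
x_demo = [1.0, 2.0, 3.0, 4.0, 5.0]
y_demo = [3.1, 4.9, 7.2, 8.8, 11.0]

b0, b1 = fit_simple_ols(x_demo, y_demo)

# Cross-check: np.polyfit returns coefficients from highest degree down
slope_ref, intercept_ref = np.polyfit(x_demo, y_demo, deg=1)
print(b0, b1)
```

statsmodels solves the same minimization problem, and it also handles several independent variables and reports standard errors and other diagnostics.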

**Implementing OLS Regression Using Statsmodels**

**Step 1: Import Required Libraries**

Before starting, we need to import the necessary libraries: statsmodels, pandas, numpy and matplotlib.

```python
import statsmodels.api as sm
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
```

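**Step 2: Load and Prepare the Data**

We load the dataset from a CSV file (train.csv) using pandas. The dataset contains two columns: x (the independent variable) and y (the dependent variable).

```python
data = pd.read_csv('train.csv')

# Drop rows with missing values, if any: OLS does not handle NaN by default
data = data.dropna()

# Extract the two columns as plain Python lists
x = data['x'].tolist()
y = data['y'].tolist()
```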

**Step 3: Add a Constant Term**

In linear regression, the equation includes an intercept term (b_0). statsmodels does not add this term automatically, so we include it in the model with the add_constant() function.

```python
# Prepend a column of ones so the model can estimate the intercept
x = sm.add_constant(x)
```

**Step 4: Perform OLS Regression**

Now we fit the OLS regression model using the OLS() function. This function takes the dependent variable (_y_) as its first argument and the independent variable (_x_), including the constant column, as its second.

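```python
# Fit the model; note the argument order: dependent variable first
result = sm.OLS(y, x).fit()

# Print coefficients, standard errors, R-squared and other diagnostics
print(result.summary())
```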

**Output:**

[Screenshot: OLS regression results summary]

**Step 5: Visualize the Regression Line**

To better understand the relationship between _x_ and _y_, we plot the original data points and the fitted regression line.

```python
# Scatter plot of the observed data
plt.scatter(data['x'], data['y'], color='blue', label='Data Points')

# Evaluate the fitted line on an evenly spaced grid of x values
x_range = np.linspace(data['x'].min(), data['x'].max(), 100)
y_pred = result.params[0] + result.params[1] * x_range  # intercept + slope * x

plt.plot(x_range, y_pred, color='red', label='Regression Line')
plt.xlabel('Independent Variable (X)')
plt.ylabel('Dependent Variable (Y)')
plt.title('OLS Regression Fit')
plt.legend()
plt.show()
```

**Output:**

[Figure: Regression Line]

The above plot shows a strong linear relationship between the independent variable (X) and the dependent variable (Y). The blue dots represent the actual data points, which lie close to the red regression line, indicating a good model fit.