Implementing Generalized Least Squares (GLS) in Python (original) (raw)

Last Updated : 23 Jul, 2025

Generalized Least Squares (GLS) is a statistical technique used to estimate the unknown parameters in a linear regression model. It extends the Ordinary Least Squares (OLS) method by addressing situations where the assumptions of OLS are violated, specifically when there is heteroscedasticity or autocorrelation in the error terms. GLS is crucial in providing more accurate and reliable estimates in such scenarios. This article provides a detailed exploration of the GLS method, its underlying principles, and its implementation in Python.

Table of Content

**What is Generalized Least Squares (GLS)?

Generalized Least Squares (GLS) is an extension of the Ordinary Least Squares (OLS) regression method used to estimate the unknown parameters in a linear regression model. GLS is particularly useful when the assumptions of OLS, such as homoscedasticity (constant variance of errors) and absence of autocorrelation (independence of errors), are violated. By addressing heteroscedasticity and autocorrelation in the error terms, GLS provides more accurate and reliable parameter estimates.

**Why Use GLS?

Implementing GLS in Python

Python offers several libraries for implementing GLS, with statsmodels being one of the most popular. statsmodels provides comprehensive tools for statistical modeling, including GLS.

To install statsmodels, you can use pip:

pip install statsmodels

**Example 1: Implementing GLS with statsmodels

Let's walk through an example using the Longley dataset, a classic dataset in econometrics.

**1. Loading the Data

Python `

import numpy as np import statsmodels.api as sm

np.random.seed(0) n = 100 X = np.random.rand(n, 2) X = sm.add_constant(X) # Adds a constant term to the predictor

Generate heteroscedastic and autocorrelated errors

errors = np.random.randn(n) for i in range(1, n): errors[i] += 0.5 * errors[i-1]

beta = np.array([1, 0.5, -0.3]) y = X @ beta + errors

`

**2. Fitting an OLS Model

First, we fit an OLS model to obtain the residuals.

Python `

ols_model = sm.OLS(y, X).fit() print(ols_model.summary())

`

Output:

OLS Regression Results:
OLS Regression Results

Dep. Variable: y R-squared: 0.003
Model: OLS Adj. R-squared: -0.018
Method: Least Squares F-statistic: 0.1307
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.878
Time: 11:30:25 Log-Likelihood: -147.01
No. Observations: 100 AIC: 300.0
Df Residuals: 97 BIC: 307.8
Df Model: 2
Covariance Type: nonrobust

             coef    std err          t      P>|t|      [0.025      0.975]  

const 1.0183 0.296 3.435 0.001 0.430 1.607
x1 -0.0074 0.380 -0.019 0.985 -0.762 0.748
x2 -0.1917 0.375 -0.511 0.610 -0.936 0.553

Omnibus: 3.211 Durbin-Watson: 1.053
Prob(Omnibus): 0.201 Jarque-Bera (JB): 2.550
Skew: 0.339 Prob(JB): 0.279
Kurtosis: 3.389 Cond. No. 5.58

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

**3. Estimating the Covariance Matrix

Assume the error terms follow an AR(1) process. We estimate the autocorrelation parameter 𝜌_ρ by regressing the residuals on their lagged values.

Python `

residuals = ols_model.resid cov_matrix = np.diag(residuals**2)

`

**4. Fitting the GLS Model

Finally, we fit the GLS model using the constructed covariance matrix.

Python `

gls_model = sm.GLS(y, X, sigma=cov_matrix).fit() print(gls_model.summary())

`

Output:

GLS Regression Results:
GLS Regression Results

Dep. Variable: y R-squared: 0.120
Model: GLS Adj. R-squared: 0.102
Method: Least Squares F-statistic: 6.633
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.00200
Time: 11:30:25 Log-Likelihood: -81.748
No. Observations: 100 AIC: 169.5
Df Residuals: 97 BIC: 177.3
Df Model: 2
Covariance Type: nonrobust

             coef    std err          t      P>|t|      [0.025      0.975]  

const 1.0210 0.035 29.584 0.000 0.952 1.089
x1 -0.0338 0.054 -0.628 0.531 -0.141 0.073
x2 -0.1543 0.044 -3.482 0.001 -0.242 -0.066

Omnibus: 866.058 Durbin-Watson: 1.353
Prob(Omnibus): 0.000 Jarque-Bera (JB): 16.264
Skew: 0.200 Prob(JB): 0.000294
Kurtosis: 1.065 Cond. No. 8.08

Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

**Example 2: Implementing Feasible Generalized Least Squares (FGLS)

In many practical situations, the exact form of the covariance matrix \Omega is unknown. Feasible Generalized Least Squares (FGLS) is an iterative method that estimates \Omega from the data and refines the GLS estimates.

**Steps for FGLS:

  1. Fit an initial OLS model and obtain residuals.
  2. Estimate the covariance matrix \Omega from the residuals.
  3. Fit a GLS model using the estimated \Omega.
  4. Iterate the process to refine the estimates. Python `

import numpy as np import statsmodels.api as sm from scipy.linalg import toeplitz

Step 1: Generate sample data or use a dataset

np.random.seed(0) n = 100 X = np.random.rand(n, 2) X = sm.add_constant(X) # Adds a constant term to the predictor

Generate heteroscedastic and autocorrelated errors

errors = np.random.randn(n) for i in range(1, n): errors[i] += 0.5 * errors[i-1]

beta = np.array([1, 0.5, -0.3]) y = X @ beta + errors

Fit initial OLS model

ols_model = sm.OLS(y, X).fit() ols_resid = ols_model.resid

Step 2: Estimate covariance matrix

resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit() rho = resid_fit.params[1] sigma = rho**toeplitz(np.arange(len(ols_resid)))

Step 3: Fit GLS model using estimated covariance matrix

gls_model = sm.GLS(y, X, sigma=sigma) gls_results = gls_model.fit()

Step 4: Iterate to refine estimates

for _ in range(5): ols_resid = gls_results.resid resid_fit = sm.OLS(ols_resid[1:], sm.add_constant(ols_resid[:-1])).fit() rho = resid_fit.params[1] sigma = rho**toeplitz(np.arange(len(ols_resid))) gls_model = sm.GLS(y, X, sigma=sigma) gls_results = gls_model.fit()

print(gls_results.summary())

`

Output:

                        GLS Regression Results                              

==============================================================================
Dep. Variable: y R-squared: 0.022
Model: GLS Adj. R-squared: 0.001
Method: Least Squares F-statistic: 1.073
Date: Wed, 17 Jul 2024 Prob (F-statistic): 0.346
Time: 11:35:00 Log-Likelihood: -134.18
No. Observations: 100 AIC: 274.4
Df Residuals: 97 BIC: 282.2
Df Model: 2
Covariance Type: nonrobust

             coef    std err          t      P>|t|      [0.025      0.975]  

const 0.9643 0.300 3.212 0.002 0.368 1.560
x1 0.2923 0.313 0.935 0.352 -0.328 0.913
x2 -0.3527 0.340 -1.038 0.302 -1.027 0.322

Omnibus: 0.332 Durbin-Watson: 1.982
Prob(Omnibus): 0.847 Jarque-Bera (JB): 0.478
Skew: 0.114 Prob(JB): 0.787
Kurtosis: 2.750 Cond. No. 3.05

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Evaluating GLS Models

Evaluating the performance of GLS models involves several diagnostic checks:

Conclusion

Generalized Least Squares (GLS) is a powerful tool for regression analysis, especially when dealing with heteroscedasticity and autocorrelation in the error terms. By transforming the data to meet the OLS assumptions, GLS provides more accurate and reliable parameter estimates.