Ridge Regression

Last Updated : 9 Dec, 2025

Ridge Regression is a version of linear regression that adds an L2 penalty to keep coefficient values from growing too large. Ordinary linear regression minimizes only the prediction error, so its estimates can become unstable when features are highly correlated. Ridge addresses this by shrinking the coefficients, which makes the model more stable and reduces overfitting.

Figure: Ridge Regression
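To make the stabilizing effect concrete, here is a minimal sketch that computes the ridge estimate from its closed form, (X^T X + kI)^(-1) X^T y, on two nearly collinear features. The synthetic data, the value k = 1.0 and the helper name ridge_closed_form are illustrative assumptions, and the intercept is omitted for simplicity.

```python
import numpy as np

def ridge_closed_form(X, y, k):
    """Solve (X^T X + k I) beta = X^T y; k = 0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)   # almost an exact copy of x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=100)

beta_ols = ridge_closed_form(X, y, k=0.0)     # no penalty
beta_ridge = ridge_closed_form(X, y, k=1.0)   # L2 penalty applied

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
print("Coefficient norms (OLS, ridge):",
      np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

The ridge coefficient vector always has a smaller norm than the unpenalized one, and with correlated features like these the difference is usually large.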

Bias-Variance Trade-off in Ridge Regression

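One of the central ideas behind ridge regression is the bias-variance trade-off: a larger penalty shrinks the coefficients further, which increases bias but reduces variance, while a smaller penalty does the opposite.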

Figure: Bias-Variance Tradeoff in Ridge Regression

Thus, ridge regression accepts a small increase in bias in exchange for a larger reduction in variance, a trade-off that is often worthwhile when generalization matters.
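The trade-off can be observed in a small simulation. The sketch below refits ridge many times on fresh noise for a fixed, strongly correlated design and measures the squared bias and the variance of the coefficient estimates. The design matrix, the true coefficients, the three alpha values and the number of repeats are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
true_coef = np.array([3.0, -2.0, 0.5])
n_samples, n_repeats = 30, 300

# Strongly correlated design: every feature shares one latent factor.
base = rng.normal(size=(n_samples, 1))
X = base + 0.1 * rng.normal(size=(n_samples, 3))

results = {}
for alpha in [0.01, 1.0, 100.0]:
    estimates = np.empty((n_repeats, 3))
    for r in range(n_repeats):
        # Same X every time; only the noise in y is redrawn.
        y = X @ true_coef + rng.normal(size=n_samples)
        estimates[r] = Ridge(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    bias_sq = np.sum((estimates.mean(axis=0) - true_coef) ** 2)
    variance = np.sum(estimates.var(axis=0))
    results[alpha] = (bias_sq, variance)
    print(f"alpha={alpha:>6}: bias^2={bias_sq:.3f}, variance={variance:.3f}")
```

As alpha grows, the variance column should fall while the squared bias rises, which is the pattern described above.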

Selection of the Ridge Parameter

Choosing the ridge parameter k (exposed as alpha in scikit-learn) is essential because it directly controls the model's bias-variance balance and therefore its predictive accuracy. Several systematic approaches exist for selecting k. The major methods are:

1. Cross-Validation

Cross-validation selects the ridge parameter by repeatedly training and testing the model on different subsets of data and identifying the value of k that minimizes validation error.
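As a minimal sketch of this procedure, the loop below scores each candidate k with 5-fold cross-validation and keeps the one with the lowest mean validation error. The synthetic data, the candidate grid and the fold count are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=120)

candidate_ks = [0.01, 0.1, 1.0, 10.0, 100.0]
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for k in candidate_ks:
    # scikit-learn reports negated MSE so that larger is always better.
    scores = cross_val_score(Ridge(alpha=k), X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    cv_mse[k] = -scores.mean()

best_k = min(cv_mse, key=cv_mse.get)
print("Mean validation MSE per k:", cv_mse)
print("Selected k:", best_k)
```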

2. Generalized Cross-Validation (GCV)

Generalized cross-validation is an efficient alternative to leave-one-out cross-validation (LOOCV) that avoids explicitly splitting the data. It estimates the optimal k by minimizing a function that approximates the LOOCV error.
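One common form of the GCV criterion is GCV(k) = (RSS/n) / (1 - trace(H)/n)^2, where H is the ridge hat matrix. The sketch below evaluates it through the singular value decomposition of X, so H is never formed explicitly. The helper name gcv_score, the synthetic data and the candidate grid are assumptions, and the data are centered so that no intercept is needed.

```python
import numpy as np

def gcv_score(X, y, k):
    """GCV(k) = (RSS / n) / (1 - trace(H) / n)^2 for the ridge hat matrix H."""
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    shrink = d**2 / (d**2 + k)            # eigenvalues of H
    y_hat = U @ (shrink * (U.T @ y))      # H @ y without forming H
    rss = np.sum((y - y_hat) ** 2)
    return (rss / n) / (1.0 - shrink.sum() / n) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 6))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0, 1.0]) + rng.normal(size=80)

# Center the data so that the model needs no intercept term.
X = X - X.mean(axis=0)
y = y - y.mean()

candidate_ks = np.logspace(-3, 3, 25)
scores = np.array([gcv_score(X, y, k) for k in candidate_ks])
best_k = candidate_ks[np.argmin(scores)]
print("k minimizing GCV:", best_k)
```

In practice, scikit-learn's RidgeCV with its default cv=None performs an efficient leave-one-out search of this kind, so a hand-written version is mainly useful for understanding the criterion.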

3. Information Criteria

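Model selection criteria such as AIC and BIC can also guide the choice of k: each candidate value is scored by its goodness of fit plus a penalty on model complexity, and the k with the lowest score is chosen.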
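A rough sketch of this idea follows. It assumes a Gaussian likelihood with constants dropped, and it measures model complexity by the effective degrees of freedom of the ridge fit, that is, the trace of the hat matrix. This is one common convention rather than the only one, and the helper name ridge_information_criteria and the synthetic data are illustrative.

```python
import numpy as np

def ridge_information_criteria(X, y, k):
    """Return (AIC, BIC, effective degrees of freedom) for ridge parameter k."""
    n = X.shape[0]
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    shrink = d**2 / (d**2 + k)
    rss = np.sum((y - U @ (shrink * (U.T @ y))) ** 2)
    df = shrink.sum()                     # trace of the ridge hat matrix
    aic = n * np.log(rss / n) + 2.0 * df
    bic = n * np.log(rss / n) + np.log(n) * df
    return aic, bic, df

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ np.array([2.0, -1.0, 0.0, 0.5, 0.0, 1.0]) + rng.normal(size=100)

# Center the data so that the model needs no intercept term.
X = X - X.mean(axis=0)
y = y - y.mean()

candidate_ks = np.logspace(-3, 3, 25)
table = np.array([ridge_information_criteria(X, y, k) for k in candidate_ks])
print("k minimizing AIC:", candidate_ks[np.argmin(table[:, 0])])
print("k minimizing BIC:", candidate_ks[np.argmin(table[:, 1])])
```

The effective degrees of freedom equal the number of features when k is close to zero and shrink toward zero as k grows, which is why the penalty term becomes smaller for heavily regularized fits.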

4. Empirical Bayes Methods

These methods treat k as a hyperparameter of a Bayesian model and estimate its value from the observed data, typically by maximizing the marginal likelihood.
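scikit-learn's BayesianRidge offers one way to try this. It estimates a noise precision (alpha_) and a weight precision (lambda_) from the data, and their ratio lambda_ / alpha_ plays the role of the ridge parameter k. The sketch below is an illustration under that interpretation, with synthetic data chosen as an assumption.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.7, size=150)

model = BayesianRidge().fit(X, y)

# alpha_ is the estimated noise precision, lambda_ the estimated weight precision.
k_hat = model.lambda_ / model.alpha_
print("Implied ridge parameter k:", k_hat)

# A plain ridge fit with this k should give essentially the same coefficients.
ridge = Ridge(alpha=k_hat).fit(X, y)
print("Max coefficient difference:", np.max(np.abs(ridge.coef_ - model.coef_)))
```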

5. Stability Selection

Stability selection enhances robustness by repeatedly fitting the model on subsampled datasets and favoring choices that remain consistent across the subsamples.
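The text does not spell out the exact procedure, so the following is only one possible interpretation: draw random half-samples, let RidgeCV pick k on each one, and prefer the value that is selected most often. The subsample size, the number of subsamples, the candidate grid and the synthetic data are all assumptions.

```python
import numpy as np
from collections import Counter
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.5, 0.0]) + rng.normal(size=200)

candidate_ks = [0.01, 0.1, 1.0, 10.0, 100.0]
n_subsamples = 50

chosen = []
for _ in range(n_subsamples):
    # Draw half of the rows without replacement and select k on that subsample.
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)
    model = RidgeCV(alphas=candidate_ks).fit(X[idx], y[idx])
    chosen.append(float(model.alpha_))

counts = Counter(chosen)
stable_k, freq = counts.most_common(1)[0]
print("Selection counts per k:", dict(counts))
print(f"Most frequently selected k: {stable_k} ({freq}/{n_subsamples} subsamples)")
```

A value of k that wins on most subsamples is less likely to be an artifact of one particular split of the data.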

Implementation

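Let's implement ridge regression using NumPy and scikit-learn. The example first fits a model with a fixed alpha, then uses GridSearchCV to select alpha by 5-fold cross-validation.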

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: 6 features, one of which has a true coefficient of zero
np.random.seed(0)
X = np.random.randn(200, 6)
true_coef = np.array([3.2, -1.5, 0.7, 0, 2.8, -0.5])
y = X.dot(true_coef) + np.random.randn(200) * 0.6

# Split first, then fit the scaler on the training data only,
# so that the test data does not influence the scaling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Baseline ridge model with a fixed penalty
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
pred_basic = ridge.predict(X_test)

print("MSE (alpha = 1.0):", mean_squared_error(y_test, pred_basic))
print("Coefficients (alpha = 1.0):", ridge.coef_)

# Select alpha by 5-fold cross-validation over a grid of candidates
param_grid = {"alpha": [0.001, 0.01, 0.1, 1, 10, 100, 500]}
grid = GridSearchCV(Ridge(), param_grid, cv=5, scoring="neg_mean_squared_error")
grid.fit(X_train, y_train)
best_ridge = grid.best_estimator_
pred_best = best_ridge.predict(X_test)

print("Best alpha selected:", grid.best_params_["alpha"])
print("MSE (best alpha):", mean_squared_error(y_test, pred_best))
print("Coefficients (best alpha):", best_ridge.coef_)

Output:

Figure: Output

Applications

Advantages

Limitations