Ridge Regressor using sklearn

Last Updated : 24 Mar, 2026

Ridge Regression is a linear regression technique that reduces overfitting by adding an L2 regularization term to the loss function. This article explains the loss function and the role of the regularization strength, then implements Ridge Regression with Scikit-Learn.

Ridge Regression Loss Function

Ridge Regression aims to minimise both the prediction error and the size of the model coefficients, which is expressed by its cost function:

J(w) = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2 + \frac{\alpha}{2} \sum_{j=1}^{n} w_j^2

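where m is the number of training examples, n is the number of features, \hat{y}_i and y_i are the predicted and actual values for the i-th example, w_j is the coefficient of the j-th feature, and \alpha \geq 0 is the regularization strength.

The first term is the standard linear regression cost, the mean squared error between predicted and actual values. The second term is the L2 regularization penalty, which penalises large coefficients to improve generalisation and reduce overfitting.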
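To make the two terms concrete, here is a minimal NumPy sketch that evaluates the cost exactly as written above. The function name `ridge_cost` and the toy data are made up for illustration, and the model is assumed to have no intercept. Note that scikit-learn's `Ridge` minimises the unscaled form ||y - Xw||^2 + \alpha ||w||^2 (without the 1/m and 1/2 factors), so its `alpha` is on a different scale from \alpha in this formula.

```python
import numpy as np

def ridge_cost(X, y, w, alpha):
    """Evaluate J(w) = MSE + (alpha / 2) * sum(w_j^2), as in the formula above."""
    m = X.shape[0]
    y_hat = X @ w                            # linear predictions (no intercept assumed)
    mse = np.sum((y_hat - y) ** 2) / m       # first term: mean squared error
    penalty = (alpha / 2) * np.sum(w ** 2)   # second term: L2 penalty
    return mse + penalty

# Toy data, made up for illustration
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.0, 0.5])                     # fits the toy data exactly

cost_no_penalty = ridge_cost(X, y, w, alpha=0.0)
cost_with_penalty = ridge_cost(X, y, w, alpha=2.0)
print(cost_no_penalty)    # 0.0
print(cost_with_penalty)  # 0.25
```

With a perfect fit the error term is zero, so the whole cost at alpha = 2 comes from the penalty on the weights.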

How Alpha Controls Regularisation

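When \alpha = 0 the penalty vanishes and the model reduces to ordinary least squares. As \alpha increases, the coefficients are shrunk toward zero, which lowers variance at the cost of higher bias.

Ridge Regression is sensitive to the scale of features because the penalty acts on coefficient magnitudes, and those depend on the units of each feature. Input features should therefore be standardized before fitting.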
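The shrinkage effect can be observed directly. The sketch below fits `Ridge` with increasing `alpha` values and prints the Euclidean norm of the coefficient vector. It uses synthetic data from `make_regression` as a stand-in (an assumption, not the dataset used later in this article), and the particular `alpha` values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 samples, 5 features
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_scaled = StandardScaler().fit_transform(X)  # standardize before applying the penalty

alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for a in alphas:
    model = Ridge(alpha=a).fit(X_scaled, y)
    coef_norms.append(np.linalg.norm(model.coef_))  # size of the coefficient vector
    print(f"alpha={a}: ||w|| = {coef_norms[-1]:.3f}")
```

The printed norms decrease as `alpha` grows, which is the regularisation effect described above.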

Choosing the Right \alpha

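Selecting an appropriate \alpha is important for balancing bias and variance: too small a value leaves the model prone to overfitting, while too large a value underfits. A common approach, used in the implementation below, is to evaluate several candidate values with cross-validation and keep the one with the best validation score.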
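Cross-validation is one widely used way to pick \alpha. The following sketch scores a logarithmic grid of candidates with 5-fold cross-validation and keeps the best one. The synthetic data and the grid bounds are assumptions chosen only for illustration. The features from `make_regression` are already on a comparable scale, so scaling is skipped here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=200, n_features=10, noise=20.0, random_state=0)

candidate_alphas = np.logspace(-3, 3, 7)  # 0.001, 0.01, ..., 1000
mean_scores = []
for a in candidate_alphas:
    # Mean R^2 across 5 folds for this candidate
    scores = cross_val_score(Ridge(alpha=a), X, y, cv=5, scoring="r2")
    mean_scores.append(scores.mean())

best_alpha = candidate_alphas[int(np.argmax(mean_scores))]
print("Best alpha:", best_alpha)
```

`RidgeCV`, used in the step-by-step implementation, wraps this kind of search in a single estimator.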

Step By Step Implementation

Here we implement Ridge Regression on the California housing dataset.

Step 1: Import Required Libraries

Import the essential libraries: NumPy, Matplotlib and the required Scikit-Learn modules.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score
```

Step 2: Load and Split Dataset

Fetch the California housing dataset, assign features (X) and target (y) and split into training and test sets for model training and evaluation.

```python
data = fetch_california_housing()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

Step 3: Feature Scaling

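Standardize the features using StandardScaler so that all variables are on the same scale. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training. Because the L2 penalty acts on coefficient magnitudes, scaling ensures that every feature is penalised comparably.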

```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
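To see what `fit_transform` and `transform` do, the small sketch below standardizes randomly generated arrays (made up for illustration, not the housing data). The training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately standardized because they reuse the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two features centred near 50 with spread near 10
rng = np.random.default_rng(0)
train = rng.normal(loc=50.0, scale=10.0, size=(100, 2))
test = rng.normal(loc=50.0, scale=10.0, size=(30, 2))

demo_scaler = StandardScaler()
train_scaled = demo_scaler.fit_transform(train)  # learns mean_ and scale_ from training data
test_scaled = demo_scaler.transform(test)        # reuses the training statistics

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))  # approximately 0 and 1
print(test_scaled.mean(axis=0))                             # near 0, but not exactly
```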

Step 4: Train Ridge Regression with Cross-Validation

Initialize RidgeCV with multiple alpha values and 5-fold cross-validation, then fit it on the scaled training data to find the optimal regularization strength.

```python
ridge_cv = RidgeCV(alphas=[0.1, 1.0, 10.0], cv=5)
ridge_cv.fit(X_train_scaled, y_train)
```

Output:

[Screenshot: Model Trained]
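One possible refinement, sketched here as an alternative rather than as part of the steps above, is to put the scaler and the model in a single pipeline and search over `alpha` with `GridSearchCV`. The scaler is then refitted inside every cross-validation fold, so the validation folds never influence the scaling statistics. The example uses synthetic data as a stand-in to stay self-contained, and the wider logarithmic grid is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=42)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# make_pipeline names each step after its lowercased class name ("ridge")
pipe = make_pipeline(StandardScaler(), Ridge())
alpha_grid = np.logspace(-2, 2, 9)  # 0.01 ... 100
search = GridSearchCV(pipe, param_grid={"ridge__alpha": alpha_grid}, cv=5, scoring="r2")
search.fit(Xd_train, yd_train)

chosen_alpha = search.best_params_["ridge__alpha"]
test_r2 = search.score(Xd_test, yd_test)  # R^2 of the refitted best pipeline
print("Chosen alpha:", chosen_alpha)
print("Test R^2:", test_r2)
```

If the selected value sits at the edge of the grid, it is usually worth extending the grid in that direction and searching again.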

Step 5: Make Predictions and Evaluate Model

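Predict housing values on the test set and evaluate model performance using the R^2 score (coefficient of determination), which measures the proportion of variance in the target explained by the model.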

```python
y_pred = ridge_cv.predict(X_test_scaled)

print("Best alpha selected:", ridge_cv.alpha_)
print("Model score (R^2):", r2_score(y_test, y_pred))
```

Output:

Best alpha selected: 10.0
Model score (R^2): 0.595944060491304
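For reference, the R^2 score is one minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below computes it by hand on a few made-up numbers and compares the result with `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up values for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 20.0
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)                # 0.925
print(r2_score(y_true, y_hat))  # same value from scikit-learn
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean of the target.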

Step 6: Visualize Predictions

Plot the predicted vs actual housing values together with the diagonal line y = x to visually assess how well the Ridge Regression model fits the data. Points on the diagonal are predicted exactly, so the closer the points lie to it, the better the fit.

```python
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, color='blue', alpha=0.5, label='Predicted vs Actual')
# Diagonal y = x: a point on this line has predicted value equal to actual value
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()],
         color='yellow', linewidth=2, label='Perfect Prediction (y = x)')
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Ridge Regression: Actual vs Predicted")
plt.legend()
plt.show()
```

Output:

[Figure: Ridge Regression: Actual vs Predicted. Scatter plot of predicted values against actual values with the plotted line.]


Limitations