Calculating RMSE Using Scikit-learn

Last Updated : 17 Sep, 2025

Root Mean Square Error (RMSE) is a metric used for evaluating the accuracy of regression models. It measures the typical size of the errors between predicted and actual values by taking the square root of the mean of the squared differences. RMSE helps determine how close the model’s predictions are to real outcomes, with lower values indicating better prediction accuracy.

The formula for RMSE is:

\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}

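Here, n is the number of observations, \hat{y}_i is the predicted value for the i-th observation, and y_i is the corresponding actual value.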
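To make the formula concrete, here is a minimal sketch that evaluates it step by step in plain Python. The function name `rmse_by_hand` and the four sample values are illustrative assumptions, not part of the housing example that follows.

```python
import math

def rmse_by_hand(y_true, y_pred):
    """Compute RMSE directly from the formula, one step at a time."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # (y_hat_i - y_i)^2 for every observation
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    # (1/n) * sum of squared errors
    mean_of_squares = sum(squared_errors) / len(squared_errors)
    # Square root brings the result back to the units of the target
    return math.sqrt(mean_of_squares)

# Made-up values for illustration only
actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]

# Squared errors are 0.25, 0.25, 0.0 and 1.0, so the mean is 0.375
print(rmse_by_hand(actual, predicted))  # sqrt(0.375), about 0.612
```

Because of the final square root, the result is expressed in the same units as the target variable, which is what makes RMSE easy to interpret.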

Implementing RMSE Using Scikit-learn

We will use the California Housing dataset (available through Scikit-learn's dataset loaders, which download it on first use) to predict house prices using Linear Regression and then calculate the Root Mean Square Error (RMSE).

1. Import Required Libraries

We will import NumPy, pandas and scikit-learn for this.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
```

2. Load the Dataset

Here we will load the California Housing dataset from scikit-learn:

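```python
# Load the dataset as a pandas DataFrame
data = fetch_california_housing(as_frame=True)
df = data.frame

# Separate the features (X) from the target column (y)
X = df.drop('MedHouseVal', axis=1)
y = df['MedHouseVal']
```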

3. Split the Dataset

Here we will split data into 80% training and 20% testing data.

```python
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

4. Train the Linear Regression Model

Here we will train a linear regression model:

```python
# Fit an ordinary least squares model on the training data
model = LinearRegression()
model.fit(X_train, y_train)
```

**Output:**

[Screenshot: Model Training]

5. Make Predictions and Calculate RMSE

```python
# Predict on the held-out test set
y_pred = model.predict(X_test)

# RMSE is the square root of the mean squared error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("Root Mean Square Error (RMSE):", rmse)
```

**Output:**

Root Mean Square Error (RMSE): 0.7455813830127763
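As a side note, scikit-learn 1.4 and later also provide `root_mean_squared_error` in `sklearn.metrics`, which returns the RMSE directly. The sketch below assumes you may be on either an older or a newer release, so it falls back to the square-root approach when the import fails. The toy values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    # Available in scikit-learn 1.4 and later
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # Fallback for older releases: same computation as in the step above
    def root_mean_squared_error(y_true, y_pred):
        return float(np.sqrt(mean_squared_error(y_true, y_pred)))

# Made-up values for illustration only
toy_actual = [3.0, -0.5, 2.0, 7.0]
toy_predicted = [2.5, 0.0, 2.0, 8.0]

rmse_value = root_mean_squared_error(toy_actual, toy_predicted)
print(rmse_value)
```

With the variables from this walkthrough, the equivalent call would be `root_mean_squared_error(y_test, y_pred)`.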

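A lower RMSE indicates that the model’s predictions are closer to the real values. In this case, the RMSE is about 0.75. Because MedHouseVal is expressed in units of $100,000, this corresponds to a typical prediction error of roughly $75,000. Whether that is acceptable depends on the application, but for a plain Linear Regression model on this dataset it is a reasonable result.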
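One way to judge whether an RMSE value is good is to compare it with a naive baseline. The sketch below is an illustrative assumption rather than part of the original walkthrough: the helper `mean_baseline_rmse` is a hypothetical name, and it scores a "model" that always predicts the mean of the training targets.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equally sized sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mean_baseline_rmse(y_train, y_test):
    """RMSE of a hypothetical baseline that always predicts the training mean."""
    constant_prediction = np.full(len(y_test), np.mean(y_train))
    return rmse(y_test, constant_prediction)

# Tiny made-up example: the training mean is 2.0
toy_train = [1.0, 2.0, 3.0]
toy_test = [2.0, 4.0]

# Errors are 0 and 2, so the mean squared error is 2
print(mean_baseline_rmse(toy_train, toy_test))  # sqrt(2), about 1.414
```

With the variables from this walkthrough, `mean_baseline_rmse(y_train, y_test)` gives a reference value. A trained model is only useful if its RMSE is clearly lower than that.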

Advantages of RMSE

RMSE is often preferred over metrics such as Mean Absolute Error (MAE) because squaring the errors penalizes large deviations more heavily than small ones. The same property makes RMSE sensitive to outliers, which is beneficial when large errors are particularly undesirable. It helps in: