Polynomial Regression in R Programming

Last Updated : 15 Jul, 2025

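Polynomial Regression is an extension of linear regression where the relationship between the dependent variable (y) and the independent variable (x) is modeled as an nth-degree polynomial. The model is still linear in its coefficients, so it can be fitted with ordinary least squares just like a straight-line model.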

Equation:

y = \beta_0 + \beta_1 x + \beta_2 x^2 + \ldots + \beta_n x^n + \varepsilon

Why Polynomial Regression is Needed

Linear regression assumes a straight-line relationship between x and y, so it fails to capture the underlying trend when the data follow a non-linear pattern. Adding polynomial terms lets the fitted curve bend to follow that pattern.
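As a small illustration of this point, the sketch below fits a straight line and a degree-2 polynomial to synthetic data generated from a known curve. The data, seed and coefficients are assumptions made up for this illustration and are not part of the Boston example that follows.

```r
# Synthetic data (an assumption for illustration): y is quadratic in x plus noise
set.seed(42)
x <- seq(-3, 3, length.out = 100)
y <- 1 + 2 * x + 1.5 * x^2 + rnorm(length(x), sd = 1)

# Straight-line fit versus degree-2 polynomial fit
fit_linear <- lm(y ~ x)
fit_quad   <- lm(y ~ x + I(x^2))

# Share of variance explained by each model
r2_linear <- summary(fit_linear)$r.squared
r2_quad   <- summary(fit_quad)$r.squared
c(linear = r2_linear, quadratic = r2_quad)
```

The quadratic model should explain noticeably more of the variance, because the straight line cannot follow the curvature.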

Implementing Polynomial Regression in R

We can implement Polynomial Regression in R by following a series of steps to prepare the data, build the model and evaluate its performance.

1. Installing Required Packages

We install the tidyverse and caret packages for data manipulation, visualization and machine learning tasks.

install.packages("tidyverse")
install.packages("caret")
library(tidyverse)
library(caret)

2. Loading the Dataset

We load the Boston housing dataset from the MASS package.

library(MASS)
data("Boston")
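Before modelling, it can help to look at the two columns used in the rest of the walkthrough: lstat (percentage of lower-status population) and medv (median home value in thousands of dollars). The sketch below is one way to do that in base R; the variable names boston_subset and missing_count are introduced here for illustration only.

```r
library(MASS)   # ships with standard R installations
data("Boston")

# Dimensions of the full data set (rows, columns)
dim(Boston)

# The two variables used in the models below
boston_subset <- Boston[, c("lstat", "medv")]
str(boston_subset)
summary(boston_subset)

# Check for missing values before modelling
missing_count <- sum(is.na(boston_subset))
missing_count
```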

3. Splitting the Data

We split the data into training and test sets using createDataPartition() from the caret package.

set.seed(123)
trainIndex <- createDataPartition(Boston$medv, p = 0.8, list = FALSE)
train.data <- Boston[trainIndex, ]
test.data <- Boston[-trainIndex, ]
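For a numeric outcome, createDataPartition() samples within percentile groups of medv, so the training and test sets have similar outcome distributions. For comparison, a plain random 80/20 split can be sketched in base R. This is an alternative, not the split used in this article, and it does not stratify; the names train_rows, train_base and test_base are hypothetical.

```r
library(MASS)
data("Boston")

set.seed(123)
n <- nrow(Boston)

# Simple random sample of 80% of the row indices (no stratification on medv)
train_rows <- sample(seq_len(n), size = floor(0.8 * n))

train_base <- Boston[train_rows, ]
test_base  <- Boston[-train_rows, ]

# The two sets together cover every row exactly once
c(train = nrow(train_base), test = nrow(test_base))
```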

4. Building the Polynomial Regression Model

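We build two polynomial regression models with lm(), one of degree 2 and one of degree 5. The degree-2 model writes the squared term explicitly with I(), while the degree-5 model uses poly() with raw = TRUE to generate the powers of lstat.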

model2 <- lm(medv ~ lstat + I(lstat^2), data = train.data)
model5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train.data)
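The two formula styles above are interchangeable for a given degree. In addition, poly() without raw = TRUE uses an orthogonal polynomial basis, which changes the coefficients but not the fitted curve. The sketch below checks this for degree 2; it fits on the full Boston data rather than train.data so that it runs without caret, which is a simplification made for this illustration.

```r
library(MASS)
data("Boston")

# Three ways to write the same degree-2 model
fit_I    <- lm(medv ~ lstat + I(lstat^2), data = Boston)
fit_raw  <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = Boston)
fit_orth <- lm(medv ~ poly(lstat, 2), data = Boston)  # orthogonal basis (default)

# Raw-power fits share coefficients; the orthogonal fit has different ones
unname(coef(fit_I))
unname(coef(fit_raw))
unname(coef(fit_orth))

# All three produce the same fitted values
same_raw  <- isTRUE(all.equal(fitted(fit_I), fitted(fit_raw)))
same_orth <- isTRUE(all.equal(fitted(fit_I), fitted(fit_orth)))
c(same_raw = same_raw, same_orth = same_orth)
```

Raw coefficients map directly onto the equation at the top of the article, while the orthogonal basis tends to be numerically better behaved at higher degrees.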

5. Making Predictions

We make predictions on the test data using the predict() function.

pred2 <- predict(model2, test.data)
pred5 <- predict(model5, test.data)
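predict() also works on any data frame that contains the predictor column, and for lm models it can return prediction intervals alongside the point estimates. The following is a minimal sketch under two assumptions: the model is refitted on the full Boston data to keep the example self-contained, and the lstat values in new_obs are hypothetical.

```r
library(MASS)
data("Boston")

fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)

# Hypothetical new observations: only the predictor column is needed
new_obs <- data.frame(lstat = c(5, 10, 20))

# Point predictions plus 95% prediction intervals
pred_int <- predict(fit2, newdata = new_obs,
                    interval = "prediction", level = 0.95)
pred_int
```

The result is a matrix with columns fit, lwr and upr, one row per new observation.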

6. Evaluating Model Performance

We evaluate model accuracy on the test set with the postResample() function, which reports RMSE, R² and MAE for a vector of predictions against the observed values.

postResample(pred2, test.data$medv)
postResample(pred5, test.data$medv)

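To make the metrics concrete, they can also be computed by hand. The sketch below assumes that Rsquared is the squared correlation between predictions and observations, which is how caret computes it by default. It uses a plain random split in base R rather than createDataPartition(), so the numbers will not match the caret output exactly, and manual_metrics is a hypothetical helper written for this illustration.

```r
library(MASS)
data("Boston")

# Base-R split so the sketch runs without caret (not the article's stratified split)
set.seed(123)
train_rows <- sample(seq_len(nrow(Boston)), size = floor(0.8 * nrow(Boston)))
train_set <- Boston[train_rows, ]
test_set  <- Boston[-train_rows, ]

fit2 <- lm(medv ~ lstat + I(lstat^2), data = train_set)
fit5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = train_set)

# Hypothetical helper computing the three metrics by hand
manual_metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((obs - pred)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(obs - pred)))
}

metrics2 <- manual_metrics(predict(fit2, test_set), test_set$medv)
metrics5 <- manual_metrics(predict(fit5, test_set), test_set$medv)
rbind(degree2 = metrics2, degree5 = metrics5)
```

A lower RMSE and MAE and a higher Rsquared on the test set indicate the better-generalizing model.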

7. Visualizing the Polynomial Fit

We use ggplot2 to plot the data and overlay the polynomial regression curve.

ggplot(train.data, aes(lstat, medv)) +
  geom_point() +
  stat_smooth(method = lm, formula = y ~ poly(x, 5, raw = TRUE))

Output:

[Figure: polynomial regression fit of medv on lstat]

The graph shows a scatterplot of medv against lstat for the training data, with a degree-5 polynomial regression curve overlaid by stat_smooth(). It shows visually how closely the fitted curve follows the non-linear relationship in the data.
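To compare the two degrees visually, both fitted curves can be drawn on the same scatterplot. The sketch below uses base graphics instead of ggplot2 and fits on the full Boston data so that it runs without extra packages; both choices are simplifications for this illustration, and the grid size and colours are arbitrary.

```r
library(MASS)
data("Boston")

fit2 <- lm(medv ~ poly(lstat, 2, raw = TRUE), data = Boston)
fit5 <- lm(medv ~ poly(lstat, 5, raw = TRUE), data = Boston)

# Evenly spaced lstat values spanning the observed range
grid <- data.frame(lstat = seq(min(Boston$lstat), max(Boston$lstat),
                               length.out = 200))
grid$deg2 <- predict(fit2, newdata = grid)
grid$deg5 <- predict(fit5, newdata = grid)

# Scatterplot with both fitted curves overlaid
plot(Boston$lstat, Boston$medv, pch = 16, col = "grey60",
     xlab = "lstat", ylab = "medv")
lines(grid$lstat, grid$deg2, col = "blue", lwd = 2)
lines(grid$lstat, grid$deg5, col = "red", lwd = 2)
legend("topright", legend = c("degree 2", "degree 5"),
       col = c("blue", "red"), lwd = 2)
```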

Applications of Polynomial Regression

Polynomial regression is commonly applied in fields where relationships between variables are inherently non-linear, such as: