Random Forest Approach for Regression in R Programming (original) (raw)

Last Updated : 10 Dec, 2025

Random Forest is a supervised learning algorithm and an ensemble learning model that combines multiple decision trees to improve accuracy and reduce overfitting. By averaging the predictions of several trees, it provides more stable and robust results for both classification and regression tasks. This approach enhances the model’s ability to generalize well to unseen data.

Key Features of Random Forest

Implementation of Random Forest for Regression in R

We will train a model using the **airquality dataset in R and perform predictions on the Ozone levels based on the other features (like Solar Radiation, Wind speed and Temperature). We will also visualize the results.

1. Installing and Loading the Required Packages

We first need to install and load the **randomForest package.

R `

install.packages("randomForest")

library(randomForest)

`

2. Exploring the Dataset

We will use the **airquality dataset which contains measurements related to air quality. It includes columns like Ozone, Solar Radiation, Wind speed, Temperature, Month and Day. We will use these features to predict the Ozone levels.

R `

data("airquality") head(airquality)

`

**Output:

data

Dataset

3. Handling Missing Data

The airquality dataset has missing values in some columns. We’ll remove rows with missing values to ensure the model works correctly.

R `

airquality_clean <- na.omit(airquality)

`

4. Creating the Random Forest Model

Now we will create the Random Forest regression model to predict Ozone based on the other variables.

ozone.rf <- randomForest(Ozone ~ ., data = airquality_clean, mtry = 3, importance = TRUE)

`

5. Printing Model Results

Let’s inspect the output of the model to understand how well it performed.

R `

print(ozone.rf)

`

**Output:

rfmodel

Random Forest Model

6. Making Predictions

We will use the trained model to predict Ozone levels based on the features of the **airquality_clean dataset.

R `

ozone_predictions <- predict(ozone.rf, airquality_clean) op <- as.data.frame(ozone_predictions)

head(op)

`

**Output:

predictions

Making Predictions

7. Plotting Actual vs Predicted Values

We’ll create a plot to compare the actual Ozone values with the predicted values from the Random Forest model.

R `

plot(airquality_clean$Ozone, ozone_predictions, main = "Actual vs Predicted Ozone Levels", xlab = "Actual Ozone", ylab = "Predicted Ozone", col = "blue", pch = 19)

abline(0, 1, col = "red", lwd = 2)

`

**Output:

acvspred

Actual vs Predicted Value

8. Calculating Feature Importance

We can also visualize the importance of each feature in predicting Ozone levels using the importance() function and **varImpPlot() function. The plot will show which features (e.g. Solar.R, Wind, Temp) are most influential in predicting the Ozone levels.

R `

importance(ozone.rf)

varImpPlot(ozone.rf)

`

**Output:

impor

importance()

varimport

varImpPlot()

9. Plotting Error vs Number of Trees

We can also visualize how the error rate changes with the number of trees. This helps us understand the stability of the model as it learns more from the data.

R `

plot(ozone.rf)

`

Output:

treevserror

Error vs Number of Trees

The plot shows how the model’s error decreases as the number of trees increases, indicating that the model improves with more trees.

Advantages and Disadvantages of Random Forest

Advantages:

Disadvantages: