Random Forest Approach for Classification in R Programming

Last Updated : 15 Jul, 2025

Random Forest is a machine learning algorithm used for classification and regression tasks. It builds multiple decision trees and combines their outputs to improve accuracy and reduce overfitting. Each tree makes its own prediction, and the final result aggregates the predictions from all trees: a majority vote for classification, an average for regression. This makes the model more reliable and robust than a single decision tree. Random Forest is widely used because it handles large datasets, captures complex relationships and delivers accurate results with minimal tuning.
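To make the aggregation step concrete, here is a minimal base R sketch of majority voting. The votes are made-up values for five hypothetical trees and three flowers, and `treeVotes` and `majorityVote` are illustrative names, not part of any package.

```r
# Hypothetical predictions: each row is one tree, each column is one flower
treeVotes <- rbind(
  tree1 = c("setosa", "versicolor", "virginica"),
  tree2 = c("setosa", "virginica",  "virginica"),
  tree3 = c("setosa", "versicolor", "versicolor"),
  tree4 = c("setosa", "versicolor", "virginica"),
  tree5 = c("setosa", "virginica",  "virginica")
)

# Majority vote: return the most frequent class among the votes.
# On a tie, which.max() keeps the first class in alphabetical order.
majorityVote <- function(votes) names(which.max(table(votes)))

# Apply the vote to every flower (column)
forestPrediction <- apply(treeVotes, 2, majorityVote)
forestPrediction
```

The real randomForest package performs this voting internally; the sketch only illustrates the idea.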

Key Features of Random Forest

Implementing Random Forest Approach for Classification in R

In this section we implement a Random Forest classifier in R that predicts the species of iris plants from their sepal and petal measurements.

1. Installing and Loading Required Libraries

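We first need to install the randomForest package, along with ggplot2, which is used later for plotting. Then we load both packages:

```r
# Install the packages (only needed once)
install.packages("randomForest")
install.packages("ggplot2")

# Load the packages into the current session
library(randomForest)
library(ggplot2)
```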

2. Loading the Dataset

We will use the iris dataset, which is built into R, and display its first few rows.

```r
# Load the built-in iris dataset and preview the first six rows
data(iris)
head(iris)
```

**Output:**

[Image: Iris Dataset]

3. Splitting the Dataset into Training and Testing Sets

We need to split the dataset into two parts, training and testing. We will use 80% of the data for training and the remaining 20% for testing.

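```r
# Fix the random seed so the split is reproducible
set.seed(42)

# Randomly pick 80% of the row indices for training; the rest form the test set
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
```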
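A simple random split does not guarantee that each species appears in the same proportion in both sets. As an optional alternative, the sketch below samples 80% within each species using base R only. The names `stratIndex`, `stratTrain` and `stratTest` are hypothetical and are not used elsewhere in this article.

```r
set.seed(42)
data(iris)

# Group the row indices by species, then sample 80% of the indices in each group
rowsBySpecies <- split(seq_len(nrow(iris)), iris$Species)
stratIndex <- unlist(
  lapply(rowsBySpecies, function(idx) sample(idx, size = floor(0.8 * length(idx)))),
  use.names = FALSE
)

stratTrain <- iris[stratIndex, ]
stratTest <- iris[-stratIndex, ]

# Each species contributes the same number of rows to each set
table(stratTrain$Species)
table(stratTest$Species)
```

Because iris has 50 rows per species, this yields 40 training rows and 10 test rows for every species.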

4. Defining the Model

We will apply the **randomForest()** function to classify the species of iris plants.

```r
# Fit a Random Forest that predicts Species from all other columns
iris.rf <- randomForest(Species ~ ., data = trainData,
                        importance = TRUE, proximity = TRUE)
```
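The call above relies on the package defaults for the number of trees and the number of predictors tried at each split. The sketch below spells those defaults out through the `ntree` and `mtry` arguments and then reads the variable importance that `importance = TRUE` stores. The object name `rfExplicit` is hypothetical, and the setup lines are repeated only so the block runs on its own.

```r
library(randomForest)

set.seed(42)
data(iris)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]

rfExplicit <- randomForest(
  Species ~ .,
  data = trainData,
  ntree = 500,        # number of trees (the package default)
  mtry = 2,           # predictors tried per split; classification default is floor(sqrt(4)) = 2
  importance = TRUE,  # store importance measures for each predictor
  proximity = TRUE    # store the proximity matrix between training rows
)

# Importance matrix: one row per predictor
importance(rfExplicit)

# Dot chart of the importance measures
varImpPlot(rfExplicit)
```

Changing `ntree` or `mtry` is the usual starting point for tuning; the values here simply reproduce the defaults.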

5. Printing the Classification Model

After creating the Random Forest model, we will print the summary of the model to view its details, including the error rate and confusion matrix.

```r
# Show the model summary: settings, OOB error estimate and confusion matrix
print(iris.rf)
```

**Output:**

[Image: Classification Model]

We can observe that:

**Confusion Matrix:** The confusion matrix shows the number of correct and incorrect classifications for each species.
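The error rate and confusion matrix in the printed summary are computed from out-of-bag (OOB) samples, that is, the training rows each tree did not see while it was being built. The same values can be read from the fitted object, as in this sketch. The variable name `oobError` is an illustrative choice, and the setup lines are repeated so the block is self-contained.

```r
library(randomForest)

set.seed(42)
data(iris)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
iris.rf <- randomForest(Species ~ ., data = trainData,
                        importance = TRUE, proximity = TRUE)

# OOB error after the final tree: last row of err.rate, column "OOB"
oobError <- iris.rf$err.rate[iris.rf$ntree, "OOB"]
oobError

# OOB confusion matrix: rows are actual classes, plus a class.error column
iris.rf$confusion
```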

6. Making Predictions on the Test Set

Once the model is trained, we use it to make predictions on the test data.

```r
# Predict the species of each row in the test set
predictions <- predict(iris.rf, newdata = testData)
```
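Besides class labels, `predict()` can return the share of trees voting for each class when called with `type = "prob"`, which helps show how confident the forest is about each flower. A minimal sketch follows; `classProbs` is a hypothetical name, and the setup lines are repeated so the block runs on its own.

```r
library(randomForest)

set.seed(42)
data(iris)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
iris.rf <- randomForest(Species ~ ., data = trainData)

# One row per test flower, one column per species; each row sums to 1
classProbs <- predict(iris.rf, newdata = testData, type = "prob")
head(classProbs)
```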

7. Plotting the Confusion Matrix (Actual vs Predicted Values)

We can evaluate the performance of our model by comparing the actual and predicted values using a confusion matrix. We create the confusion matrix with the table() function, which compares the predicted values (predictions) with the actual values (testData$Species).

```r
# Cross-tabulate predicted species against actual species
confMatrix <- table(Predicted = predictions, Actual = testData$Species)

# Convert the table to a data frame so ggplot2 can plot it
confMatrixDF <- as.data.frame(confMatrix)
colnames(confMatrixDF) <- c("Predicted", "Actual", "Count")

# Draw the confusion matrix as a heatmap with the count written in each tile
ggplot(data = confMatrixDF, aes(x = Actual, y = Predicted, fill = Count)) +
  geom_tile() +
  geom_text(aes(label = Count), color = "white", size = 5) +
  scale_fill_gradient(low = "white", high = "blue") +
  theme_minimal() +
  labs(title = "Confusion Matrix", x = "Actual", y = "Predicted") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```

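**Output:**

[Image: Confusion Matrix]

In the plot, tiles on the diagonal count test samples whose predicted species matches the actual species, while off-diagonal tiles count misclassifications.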
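The confusion matrix can also be summarized numerically. The sketch below computes overall accuracy and per-species recall from the same table. The names `accuracy` and `recall` are illustrative, and the setup lines are repeated so the block is self-contained.

```r
library(randomForest)

set.seed(42)
data(iris)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
iris.rf <- randomForest(Species ~ ., data = trainData)
predictions <- predict(iris.rf, newdata = testData)
confMatrix <- table(Predicted = predictions, Actual = testData$Species)

# Overall accuracy: correct predictions (diagonal) divided by all test rows
accuracy <- sum(diag(confMatrix)) / sum(confMatrix)
accuracy

# Recall per species: correct predictions divided by the actual count (column sums).
# This gives NaN for a species that does not appear in the test set.
recall <- diag(confMatrix) / colSums(confMatrix)
recall
```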

8. Plotting the Error vs Number of Trees Graph

To visualize how the error changes as we increase the number of trees, we can plot a graph showing the relationship between the error rate and the number of trees.

```r
# Plot the error rates against the number of trees
plot(iris.rf)
```

**Output:**

[Image: Error vs Number of Trees]

As the number of trees increases, the error rate generally decreases and then stabilizes. The graph helps choose a number of trees large enough for the error to level off; adding trees beyond that point mainly increases computation time without improving performance.
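The values behind this plot are stored in the model's `err.rate` matrix, so the point where the error levels off can also be inspected directly. The sketch below finds the first tree count that reaches the lowest OOB error and redraws the plot with a legend. `oobCurve` and `bestNtree` are hypothetical names, and the legend colors and line types assume the defaults used by the package's plot method.

```r
library(randomForest)

set.seed(42)
data(iris)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
iris.rf <- randomForest(Species ~ ., data = trainData)

# err.rate has one row per tree count: column "OOB" plus one column per species
oobCurve <- iris.rf$err.rate[, "OOB"]

# First number of trees at which the OOB error reaches its minimum
bestNtree <- which.min(oobCurve)
bestNtree

# Redraw the error plot with a legend so each line can be identified
plot(iris.rf, main = "Error vs Number of Trees")
legend("topright", legend = colnames(iris.rf$err.rate),
       col = 1:4, lty = 1:4, cex = 0.8)
```

On a dataset as small as iris the minimum can be reached by chance at a low tree count, so it is safer to read this value as a rough guide than as a fixed optimum.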