Random Forest Approach in R Programming

Last Updated : 27 Jun, 2025

Random Forest is a machine learning algorithm used for both regression and classification tasks. It is an ensemble method that builds multiple decision trees and combines their outputs to improve model performance.

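Key points about Random Forest

The final result is obtained in one of two ways: by majority voting across the trees (for classification) or by averaging the trees' predictions (for regression).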

Working of Random Forest Algorithm

The Random Forest algorithm proceeds in three key steps:

  1. **Bootstrap Sampling:** Random subsets of the training data are created by sampling with replacement.
  2. **Tree Construction:** For each subset, a decision tree is constructed, considering only a random subset of features for splitting at each node.
  3. **Aggregation of Predictions:** Once the trees are built, their predictions are aggregated using majority voting (for classification) or averaging (for regression).

This combination of decision trees helps to reduce overfitting and improves model accuracy.
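The three steps above can be sketched by hand. This is a minimal illustration, not how the randomForest package is implemented: it assumes the rpart package (a recommended package that ships with most R installations) for the individual trees, uses an arbitrary 25 trees, and picks a random feature subset once per tree rather than at every split, which is a simplification of step 2.

```r
library(rpart)  # assumed available; used here only to grow the individual trees

set.seed(1)
n_trees  <- 25                                   # illustrative value
features <- setdiff(names(iris), "Species")

# Steps 1 and 2: grow one tree per bootstrap sample
trees <- lapply(seq_len(n_trees), function(i) {
  rows <- sample(nrow(iris), replace = TRUE)     # bootstrap: sample rows with replacement
  vars <- sample(features, 2)                    # simplification: feature subset per tree
  rpart(reformulate(vars, response = "Species"),
        data = iris[rows, ], method = "class")
})

# Step 3: collect each tree's prediction and take a majority vote per row
votes <- sapply(trees, function(tr) {
  as.character(predict(tr, newdata = iris, type = "class"))
})
bagged_pred <- apply(votes, 1, function(v) names(which.max(table(v))))

# Agreement with the true labels (optimistic, since the same rows were used for training)
mean(bagged_pred == iris$Species)
```

Each column of votes holds one tree's predictions, so the majority vote is taken across each row.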

Example

Consider a fruit box containing three kinds of fruit (apples, oranges, and cherries) as training data (n = 3). We want to predict which fruit is most numerous in the box. We build a random forest model on the training data with k = 3 trees.

[Image: Working of Random Forest]

Each tree makes its prediction using features of the data such as diameter, color, shape and groups. The three trees predict orange, cherry, and orange, so by majority vote the random forest selects orange as the most numerous fruit in the box.
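The vote in the fruit example can be reproduced in a few lines of base R. The variable names are illustrative only.

```r
# Predictions from the three trees in the fruit example
tree_votes <- c("orange", "cherry", "orange")

# Count the votes per class and pick the class with the most votes
vote_counts <- table(tree_votes)                     # cherry: 1, orange: 2
forest_prediction <- names(which.max(vote_counts))
forest_prediction                                    # "orange"
```

Note that which.max() returns the first maximum, so in this sketch a tie would go to whichever class comes first alphabetically.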

Implementing Random Forest in R

We will now implement a Random Forest model using the famous iris dataset. This will help us understand how to build and evaluate a Random Forest model in R.

1. Installing the Required Package

To implement Random Forest in R, we first need to install the **randomForest** package. This package provides a simple interface for training and evaluating Random Forest models.

```r
# Install once, then load the package in each session
install.packages("randomForest")
library(randomForest)
```

2. Loading the Dataset

We will use the iris dataset, which contains data on three species of iris flowers, with measurements of sepal length, sepal width, petal length, and petal width. It is a built-in dataset in R. We will then display its first few rows using the **head()** function.

```r
# Load the built-in dataset and show its first six rows
data(iris)
head(iris)
```

**Output:**

[Image: Iris Dataset]

3. Splitting the Data

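We will now split the data into training and testing sets. The **sample()** function draws 80% of the row indices at random (without replacement) for the training set, and the remaining 20% of rows form the test set. Calling set.seed() first makes the split reproducible.

```r
# Fix the random seed so the split is reproducible
set.seed(42)

# Randomly pick 80% of the row indices for training
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))

# Rows in trainIndex go to training; all remaining rows go to testing
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]
```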
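A quick sanity check on the split can catch mistakes early. The sketch below repeats the split so that it runs on its own, then confirms the set sizes, confirms that no row is in both sets, and shows the class counts, which a plain random split does not guarantee to be balanced.

```r
set.seed(42)
trainIndex <- sample(1:nrow(iris), 0.8 * nrow(iris))
trainData <- iris[trainIndex, ]
testData  <- iris[-trainIndex, ]

nrow(trainData)   # 120 rows (80% of 150)
nrow(testData)    # 30 rows

# No row should appear in both sets
length(intersect(rownames(trainData), rownames(testData)))   # 0

# Class counts per set; these depend on the seed
table(trainData$Species)
table(testData$Species)
```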

4. Training the Random Forest Model

Now we will build the Random Forest model using the training data. The **randomForest()** function takes a formula of the form target ~ features and a data frame as input. Here, the model uses **Species** as the target variable and all other variables as features (written as . in the formula).

```r
# Fit a classification forest: Species is the target, all other columns are features
rf_model <- randomForest(Species ~ ., data = trainData)
print(rf_model)
```

**Output:**

[Image: RandomForest Model]
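The printed model summary includes an out-of-bag (OOB) error estimate and a confusion matrix. Both can also be read from the fitted object, as in this sketch. It refits the model on the same split so that it runs on its own; the names train_rows and rf_fit are illustrative.

```r
library(randomForest)

set.seed(42)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
rf_fit <- randomForest(Species ~ ., data = iris[train_rows, ])

# err.rate has one row per tree; the last row is the error of the full forest
oob_error <- rf_fit$err.rate[rf_fit$ntree, "OOB"]
oob_error

# OOB confusion matrix: rows are actual classes, columns are predicted classes,
# and the final column gives the per-class error
rf_fit$confusion
```

The OOB error is computed from the training rows each tree did not see in its bootstrap sample, so it gives an error estimate without touching the test set.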

5. Evaluating the Model

We can evaluate the model’s performance by making predictions on the test data and comparing them to the true values.

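The confusion matrix shows how well the model performed on each class. We will use the **caret** library to compute it with **confusionMatrix()**, which reports overall accuracy together with per-class statistics such as sensitivity and specificity. Passing mode = "prec_recall" makes it report precision, recall and F1 score instead.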

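```r
# caret provides confusionMatrix(); install it once if needed
install.packages("caret")
library(caret)

# Predict species for the held-out rows and compare with the true labels
predictions <- predict(rf_model, testData)
confusionMatrix(predictions, testData$Species)
```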

**Output:**

[Image: confusion matrix]
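To see where these metrics come from, they can be computed by hand from a plain contingency table. The sketch below does not use caret and refits the model on the same split so that it runs on its own. The names rf_fit, test_set and cm are illustrative.

```r
library(randomForest)

set.seed(42)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
rf_fit   <- randomForest(Species ~ ., data = iris[train_rows, ])
test_set <- iris[-train_rows, ]

pred <- predict(rf_fit, newdata = test_set)

# Rows are predicted classes, columns are actual classes
cm <- table(Predicted = pred, Actual = test_set$Species)

accuracy  <- sum(diag(cm)) / sum(cm)        # correct predictions / all predictions
precision <- diag(cm) / rowSums(cm)         # per class: correct / predicted as that class
recall    <- diag(cm) / colSums(cm)         # per class: correct / actually in that class
f1        <- 2 * precision * recall / (precision + recall)

accuracy
round(cbind(precision, recall, f1), 3)
```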

Hyperparameter Tuning

The performance of Random Forest can often be improved by tuning its hyperparameters, the settings that control how the forest is built.

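Key Hyperparameters:

  1. **ntree:** The number of trees in the forest.
  2. **mtry:** The number of features randomly considered at each split.

**Example:**

In this example, we set the number of trees (**ntree**) to 500 and use 2 features (**mtry**) at each split. These values match the defaults that randomForest() uses for this classification problem (500 trees and floor(sqrt(4)) = 2 features), so treat the call as a template for trying other values.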

```r
# Set the number of trees and the number of features tried at each split explicitly
rf_tuned <- randomForest(Species ~ ., data = trainData, ntree = 500, mtry = 2)
print(rf_tuned)
```

**Output:**

[Image: Hyperparameter Tuning]
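One simple way to compare settings is to fit a forest for each candidate mtry value and compare the OOB error. This is a sketch of that idea, not the only tuning approach. The grid 1:4 is chosen because iris has four predictors, and the names mtry_grid and oob_by_mtry are illustrative.

```r
library(randomForest)

set.seed(42)
train_rows <- sample(nrow(iris), 0.8 * nrow(iris))
train_set  <- iris[train_rows, ]

mtry_grid <- 1:4   # iris has four predictors, so mtry can be 1 to 4

# Fit one forest per mtry value and record its final OOB error
oob_by_mtry <- sapply(mtry_grid, function(m) {
  set.seed(42)     # same seed for every fit so the comparison is repeatable
  fit <- randomForest(Species ~ ., data = train_set, ntree = 500, mtry = m)
  fit$err.rate[fit$ntree, "OOB"]
})
names(oob_by_mtry) <- paste0("mtry=", mtry_grid)
oob_by_mtry

# Candidate with the lowest OOB error (the first one in case of ties)
best_mtry <- mtry_grid[which.min(oob_by_mtry)]
best_mtry
```

On a small, easy dataset such as iris the differences between settings are usually minor, so the benefit of tuning shows more clearly on larger problems.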

Interpreting Results and Model Evaluation

We will now interpret the results of the Random Forest model. One useful feature of Random Forest is feature importance, which shows how important each feature is in predicting the target variable.

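Key functions: **importance()** returns the importance score of each feature, and **varImpPlot()** plots those scores.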

This helps in identifying which features contribute the most to the model’s decision-making process.

```r
# Numeric importance scores for each feature
importance(rf_model)

# Dot chart of the same scores
varImpPlot(rf_model)
```

**Output:**

[Image: importance]

[Image: varImpPlot]
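By default a classification forest stores only the Gini-based importance. Fitting with importance = TRUE also stores the permutation-based measure, and the type argument of importance() selects between them. The sketch below fits on the full iris data for brevity, which differs from the train/test split used above. The name rf_imp is illustrative.

```r
library(randomForest)

set.seed(42)
# importance = TRUE also computes permutation-based importance
rf_imp <- randomForest(Species ~ ., data = iris, importance = TRUE)

gini <- importance(rf_imp, type = 2)   # mean decrease in node impurity (Gini)
acc  <- importance(rf_imp, type = 1)   # mean decrease in accuracy (permutation)

# Features ranked from most to least important by the Gini measure
gini_ranked <- gini[order(gini[, 1], decreasing = TRUE), , drop = FALSE]
gini_ranked
acc
```

The two measures can rank features differently, so it is worth checking both before drawing conclusions about a feature.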

Advantages of Random Forest

In this article, we explored Random Forest and learned how it works by constructing multiple decision trees and aggregating their predictions to improve accuracy.