Feature Engineering in R Programming

Last Updated : 13 Dec, 2025

Feature Engineering in R means creating new features or modifying existing ones to make models work better. It includes cleaning, transforming, scaling, encoding and selecting features for machine learning.

In R, this is commonly done with packages such as dplyr, tidyr, caret and data.table. The examples below use only functions that ship with R.

Sample Dataset

```r
df <- data.frame(
  age = c(23, 45, 35, 62, 18),
  income = c(30000, 60000, 45000, 80000, 20000),
  gender = c("Male", "Female", "Female", "Male", "Male"),
  city = c("A", "B", "A", "C", "B")
)
df
```

**Output:**

[Figure: Sample Dataset]

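This dataset has five rows and four columns: two numeric columns (age and income) and two categorical columns (gender and city).

We will use this small dataset to explain each concept.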
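To confirm the structure described above, the dimensions and column types can be checked directly. This is a minimal sketch; `stringsAsFactors = FALSE` is set explicitly as an assumption so the result is the same on older R versions, and `col_types` is an illustrative name.

```r
# Rebuild the sample data so this block runs on its own
df <- data.frame(
  age = c(23, 45, 35, 62, 18),
  income = c(30000, 60000, 45000, 80000, 20000),
  gender = c("Male", "Female", "Female", "Male", "Male"),
  city = c("A", "B", "A", "C", "B"),
  stringsAsFactors = FALSE
)

# Number of rows and columns
dim(df)

# Class of each column: numeric for age/income, character for gender/city
col_types <- sapply(df, class)
col_types
```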

1. Handling Missing Values

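The sample dataset has no missing values, so we first set one income value to NA and then impute it with the column mean.

**Example (add NA for explanation):**

```r
# Introduce a missing value for demonstration
df$income[3] <- NA

# Replace missing incomes with the mean of the observed values
df$income[is.na(df$income)] <- mean(df$income, na.rm = TRUE)
df
```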

**Output:**

[Figure: Dataset After Handling Missing Values]

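**Explanation:**

is.na() locates the missing entries in income, and each one is replaced with the mean of the observed values. The argument na.rm = TRUE tells mean() to ignore NA when computing that mean.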
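Before imputing, it can help to count the missing values per column. The median is also a common alternative to the mean when a column contains outliers. The following is a minimal sketch, assuming a hypothetical data frame `df_na` built from the sample values with one income deliberately set to NA:

```r
# Hypothetical copy of the sample data with one income set to NA
df_na <- data.frame(
  age = c(23, 45, 35, 62, 18),
  income = c(30000, 60000, NA, 80000, 20000)
)

# Count missing values in each column
na_counts <- colSums(is.na(df_na))

# Impute with the median of the observed values instead of the mean
income_median <- median(df_na$income, na.rm = TRUE)
df_na$income[is.na(df_na$income)] <- income_median
df_na
```

Whether mean or median imputation is more appropriate depends on the distribution of the column, so treat this as one option rather than a rule.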

2. Encoding Categorical Variables

**Label Encoding (for binary categories: gender)**

```r
df$gender_num <- ifelse(df$gender == "Male", 1, 0)
df
```

**Output:**

[Figure: Dataset After Label Encoding]

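**Explanation:**

ifelse() maps "Male" to 1 and every other value (here, "Female") to 0, and stores the result in the new column gender_num.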
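An alternative way to label-encode is through a factor, which generalises to more than two categories. This sketch fixes the level order explicitly so the mapping is reproducible; the variable names are illustrative only.

```r
gender <- c("Male", "Female", "Female", "Male", "Male")

# Fix the level order explicitly: "Female" -> 0, "Male" -> 1
gender_factor <- factor(gender, levels = c("Female", "Male"))

# Factor codes start at 1, so subtract 1 to get 0/1 labels
gender_num <- as.integer(gender_factor) - 1L
gender_num
```

Without the `levels` argument, factor() orders levels alphabetically, which happens to give the same mapping here but is worth stating explicitly.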

**One-Hot Encoding (for multi-class: city)**

```r
ohe <- model.matrix(~ city - 1, data = df)
df <- cbind(df, ohe)
df
```

**Output:**

[Figure: Dataset After One-Hot Encoding]

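**Explanation:**

City A, B and C become separate columns: cityA, cityB and cityC.

Each row gets 1 in the column for its own city and 0 in the other two. The "- 1" in the formula removes the intercept so that every level gets its own column.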
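For linear models, the full set of one-hot columns is redundant because the columns always sum to 1. A common variant keeps only k - 1 columns and treats one level as the baseline. The sketch below contrasts the two, using a hypothetical data frame `df_city` that holds only the city column:

```r
df_city <- data.frame(city = c("A", "B", "A", "C", "B"))

# Full one-hot encoding: one column per level (cityA, cityB, cityC)
ohe_full <- model.matrix(~ city - 1, data = df_city)

# Reference coding: keep the intercept in the formula, then drop that column,
# leaving cityB and cityC; level "A" is the baseline (all zeros)
ohe_ref <- model.matrix(~ city, data = df_city)[, -1, drop = FALSE]

colnames(ohe_full)
colnames(ohe_ref)
```

Tree-based models are generally unaffected by the redundancy, so which form to use depends on the model being fitted.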

3. Feature Scaling

Scaling helps numeric values stay on similar ranges.

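**Using standard scaling (mean = 0, sd = 1)**

```r
# scale() returns a one-column matrix; as.numeric() stores it as a plain numeric column
df$age_scaled <- as.numeric(scale(df$age))
df$income_scaled <- as.numeric(scale(df$income))
df
```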

**Output:**

[Figure: Dataset After Standard Scaling]

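**Explanation:**

scale() subtracts the column mean from each value and divides by the column standard deviation, so age_scaled and income_scaled each have mean 0 and standard deviation 1.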
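Standard scaling can also be written out by hand, which makes the formula explicit. This is a small sketch on the age values; `age_z` and `same_as_scale` are illustrative names.

```r
age <- c(23, 45, 35, 62, 18)

# Standardise by hand: subtract the mean, divide by the sample standard deviation
age_z <- (age - mean(age)) / sd(age)

# scale() returns a one-column matrix; as.numeric() flattens it for comparison
same_as_scale <- isTRUE(all.equal(age_z, as.numeric(scale(age))))
same_as_scale
```

Note that sd() uses the sample standard deviation (dividing by n - 1), which is also what scale() uses by default.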

4. Binning (Feature Transformation)

Create age groups:

```r
df$age_group <- cut(
  df$age,
  breaks = c(0, 25, 50, 100),
  labels = c("Young", "Middle", "Senior")
)
df
```

**Output:**

[Figure: Dataset After Feature Transformation]

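**Explanation:**

cut() assigns each age to a right-closed interval: (0, 25] is Young, (25, 50] is Middle and (50, 100] is Senior. In the sample data, ages 18 and 23 are Young, 35 and 45 are Middle, and 62 is Senior.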
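The boundary behaviour of cut() is easy to get wrong, so it is worth checking how many rows land in each bin and where a value exactly on a break ends up. A minimal sketch using the same breaks and labels (the helper variable names are illustrative):

```r
age <- c(23, 45, 35, 62, 18)
breaks <- c(0, 25, 50, 100)
labels <- c("Young", "Middle", "Senior")

age_group <- cut(age, breaks = breaks, labels = labels)

# Count observations in each bin
group_counts <- table(age_group)
group_counts

# Intervals are right-closed by default, so exactly 25 falls in "Young"
boundary_group <- as.character(cut(25, breaks = breaks, labels = labels))
boundary_group
```

Passing `right = FALSE` to cut() would make the intervals left-closed instead, which moves a value of exactly 25 into the next bin.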

5. Feature Construction

**Create a new feature: income per year of age**

```r
df$income_per_age <- df$income / df$age
df
```

**Output:**

[Figure: Dataset After Feature Construction]
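Several constructed features can be added in one step with transform(). The sketch below recreates income_per_age and adds a binary flag; the `high_income` column and its 50000 threshold are hypothetical and chosen only for illustration.

```r
df_fc <- data.frame(
  age = c(23, 45, 35, 62, 18),
  income = c(30000, 60000, 45000, 80000, 20000)
)

df_fc <- transform(
  df_fc,
  # Ratio feature from the section above
  income_per_age = income / age,
  # Hypothetical flag: 1 when income exceeds an illustrative threshold
  high_income = as.integer(income > 50000)
)
df_fc
```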

6. Removing Skewness

Apply log transformation to reduce skew in income:

```r
df$income_log <- log(df$income + 1)
df
```

**Output:**

[Figure: Dataset After Removing Skewness]

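**Explanation:**

The logarithm compresses large values more than small ones, which reduces right skew. Adding 1 before taking the log keeps the transformation defined when a value is 0.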
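Base R provides log1p() for the log(x + 1) transformation and expm1() as its inverse, which is handy when predictions need to be mapped back to the original scale. A short sketch on the income values:

```r
income <- c(30000, 60000, 45000, 80000, 20000)

# log1p(x) computes log(1 + x) and stays accurate for x near zero
income_log <- log1p(income)

# expm1() inverts log1p(), recovering the original scale
income_back <- expm1(income_log)

all.equal(income_back, income)
```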

7. Final Cleaned Feature-Enhanced Dataset

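After all steps, the dataset contains the four original columns (age, income, gender, city) plus the engineered ones: gender_num, cityA, cityB, cityC, age_scaled, income_scaled, age_group, income_per_age and income_log.

This feature-rich dataset is now ready for modeling.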
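The individual steps can be collected into a single function so the same transformations are applied consistently. The following is a sketch only: `engineer_features()` is a hypothetical helper, not part of any package, and it assumes the input has the columns age, income, gender and city used throughout this article.

```r
# Hypothetical helper that applies the steps from this article in order
engineer_features <- function(df) {
  # 1. Mean-impute missing incomes
  df$income[is.na(df$income)] <- mean(df$income, na.rm = TRUE)

  # 2. Encode gender as 0/1 and one-hot encode city
  df$gender_num <- ifelse(df$gender == "Male", 1, 0)
  df <- cbind(df, model.matrix(~ city - 1, data = df))

  # 3. Standardise the numeric columns
  df$age_scaled <- as.numeric(scale(df$age))
  df$income_scaled <- as.numeric(scale(df$income))

  # 4. Bin age into groups
  df$age_group <- cut(
    df$age,
    breaks = c(0, 25, 50, 100),
    labels = c("Young", "Middle", "Senior")
  )

  # 5. Construct a ratio feature and 6. log-transform income
  df$income_per_age <- df$income / df$age
  df$income_log <- log(df$income + 1)

  df
}

raw <- data.frame(
  age = c(23, 45, 35, 62, 18),
  income = c(30000, 60000, 45000, 80000, 20000),
  gender = c("Male", "Female", "Female", "Male", "Male"),
  city = c("A", "B", "A", "C", "B")
)

features <- engineer_features(raw)
str(features)
```

In a real project the imputation mean and the scaling parameters would normally be computed on training data and reused on new data; this sketch recomputes them on whatever data frame it receives.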