How to Create Categorical Variables in R?

Last Updated : 19 Dec, 2021

In this article, we will learn how to create categorical variables in the R programming language.

In statistics, variables can be divided into two types: categorical variables and quantitative variables. Quantitative variables consist of numerical, measurable values. A categorical variable can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In R, categorical variables are represented as factors.
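As a minimal sketch of the distinction (the vectors heights and eye_colour are hypothetical example data, not from the article), a quantitative variable is stored as a numeric vector, while a categorical variable is stored as a factor with a fixed set of levels:

```r
# Quantitative variable: numeric measurements
heights <- c(160, 172, 181, 172)

# Categorical variable: a factor whose possible values are its levels
eye_colour <- factor(c("brown", "blue", "brown", "green"))

class(heights)       # "numeric"
class(eye_colour)    # "factor"
levels(eye_colour)   # "blue" "brown" "green" (sorted alphabetically by default)
nlevels(eye_colour)  # 3
```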

Method 1: Categorical Variable from Scratch

To create a categorical variable from scratch, i.e., by supplying a value manually for each row of data, we pass a vector of those values to the factor() function and assign the result to a new column. factor() converts the vector into a categorical variable (a factor) whose levels are the distinct values of the vector, so identical values are grouped into the same level.
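The grouping is easiest to see with a vector that contains repeated values. In this sketch (the ratings vector is made-up example data), table() counts the rows in each level, and the levels argument of factor() sets the level order explicitly instead of the default alphabetical order:

```r
ratings <- c("high", "low", "high", "medium", "low", "high")

# Identical values are grouped into the same level
f <- factor(ratings)
levels(f)   # "high" "low" "medium"
table(f)    # high: 3, low: 2, medium: 1

# Supplying levels fixes their order explicitly
f_ordered <- factor(ratings, levels = c("low", "medium", "high"))
levels(f_ordered)   # "low" "medium" "high"
```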

Syntax:

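df$categorical_variable <- factor(categorical_vector)

where categorical_vector is a vector holding one value for each row of df, and categorical_variable is the name of the new column.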

Example:

Here, a new column named group is added to a basic data frame as a categorical variable.

```r
# create sample data frame
df <- data.frame(x = c(10, 23, 13, 41, 15), y = c(71, 17, 28, 32, 12))

# create categorical vector
group_vector <- c('A', 'B', 'C', 'D', 'E')

# add categorical variable to the data frame
df$group <- factor(group_vector)

# print data frame
df
```

Output:

   x  y group
1 10 71     A
2 23 17     B
3 13 28     C
4 41 32     D
5 15 12     E
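To confirm that the new column really is a categorical variable rather than plain text, its class and levels can be inspected. A minimal check, rebuilding the same data frame as in the example above:

```r
df <- data.frame(x = c(10, 23, 13, 41, 15), y = c(71, 17, 28, 32, 12))
df$group <- factor(c('A', 'B', 'C', 'D', 'E'))

is.factor(df$group)   # TRUE
levels(df$group)      # "A" "B" "C" "D" "E"
str(df)               # lists group as a Factor with 5 levels
```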

Method 2: Categorical Variable from an Existing Column Using Two Values

To create a categorical variable from an existing column, we wrap a call to ifelse() in as.factor(). ifelse() is vectorized: for each row it returns one value if the condition is TRUE and another value otherwise, and as.factor() then converts the result into a factor.

Syntax:

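df$categorical_variable <- as.factor(ifelse(condition, val1, val2))

where condition is a logical test on an existing column, val1 is the value assigned to rows where the condition is TRUE, and val2 is the value assigned to rows where it is FALSE.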

Example:

Here, a new column named group is added to a basic data frame as a categorical variable based on an ifelse() condition.

```r
# create sample data frame
df <- data.frame(x = c(10, 23, 13, 41, 15), y = c(71, 17, 28, 32, 12))

# add categorical variable to the data frame
df$group <- as.factor(ifelse(df$x > 20, 'A', 'B'))

# print data frame
df
```

Output:

   x  y group
1 10 71     B
2 23 17     A
3 13 28     B
4 41 32     A
5 15 12     B
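The resulting groups can be counted with table(). One point worth knowing about ifelse(): when the condition is NA for a row, the result for that row is NA as well, so missing values stay missing instead of falling into the else group. A small sketch; the hypothetical df_na data frame and its NA exist only to illustrate this:

```r
df <- data.frame(x = c(10, 23, 13, 41, 15), y = c(71, 17, 28, 32, 12))
df$group <- as.factor(ifelse(df$x > 20, 'A', 'B'))

# Number of rows in each group
table(df$group)   # A: 2, B: 3

# A missing x value gives a missing group, not 'B'
df_na <- data.frame(x = c(10, NA, 41))
df_na$group <- as.factor(ifelse(df_na$x > 20, 'A', 'B'))
df_na$group       # B <NA> A, with levels A B
```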

Method 3: Categorical Variable from an Existing Column Using Multiple Values

To create a categorical variable with more than two categories from an existing column, we nest several ifelse() calls inside as.factor(). The conditions are checked in order, and each row gets the value of the first condition that is TRUE; if none of the conditions is TRUE, the row gets the final else value of the innermost call.
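Because the first TRUE condition wins, the order of the nested calls matters: with "less than" thresholds, the smallest threshold has to be tested first. A short sketch with made-up values:

```r
x <- c(10, 23, 41)

# Smallest threshold first: each value lands in its own range
good <- ifelse(x < 20, 'A', ifelse(x < 30, 'B', 'C'))
good   # "A" "B" "C"

# Largest threshold first: x < 30 is already TRUE for 10 and 23,
# so the result of the inner x < 20 test is never used for them
bad <- ifelse(x < 30, 'B', ifelse(x < 20, 'A', 'C'))
bad    # "B" "B" "C"
```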

Syntax:

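df$categorical_variable <- as.factor(ifelse(condition1, val1, ifelse(condition2, val2, ifelse(condition3, val3, ifelse(condition4, val4, val_else)))))

where condition1 to condition4 are logical tests checked in order, val1 to val4 are the values assigned when the corresponding test is the first one to be TRUE, and val_else is the value assigned when none of the tests is TRUE.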

Example:

Here, a new column named group is added to a basic data frame as a categorical variable based on multiple nested ifelse() conditions.

```r
# create sample data frame
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))

# add categorical variable to the data frame
df$group <- as.factor(ifelse(df$x < 20, 'A',
                      ifelse(df$x < 30, 'B',
                      ifelse(df$x < 50, 'C',
                      ifelse(df$x < 90, 'D', 'E')))))

# print data frame
df
```

Output:

    x  y group
1  10 71     A
2  23 17     B
3  13 28     A
4  41 32     C
5  15 12     A
6  11 13     A
7  23 41     B
8  45 15     C
9  95 11     E
10 23 23     B
11 75 34     D
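When the categories are consecutive numeric ranges, as here, base R's cut() function is an alternative to nesting ifelse() calls. This is a swapped-in technique, not the method shown above. The sketch assumes the same thresholds and uses right = FALSE so that each interval includes its lower bound and excludes its upper bound, matching the strict x < threshold tests:

```r
df <- data.frame(x = c(10, 23, 13, 41, 15, 11, 23, 45, 95, 23, 75),
                 y = c(71, 17, 28, 32, 12, 13, 41, 15, 11, 23, 34))

# Nested ifelse() version from the example above
df$group <- as.factor(ifelse(df$x < 20, 'A',
                      ifelse(df$x < 30, 'B',
                      ifelse(df$x < 50, 'C',
                      ifelse(df$x < 90, 'D', 'E')))))

# cut() version: intervals [-Inf, 20), [20, 30), [30, 50), [50, 90), [90, Inf)
df$group_cut <- cut(df$x,
                    breaks = c(-Inf, 20, 30, 50, 90, Inf),
                    labels = c('A', 'B', 'C', 'D', 'E'),
                    right = FALSE)

# Both approaches assign the same category to every row
all(as.character(df$group) == as.character(df$group_cut))  # TRUE
```

cut() returns a factor directly, and its levels follow the order of labels, so no as.factor() call is needed.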