Stratified Sampling in R

Last Updated : 25 Jul, 2025

Stratified sampling splits a population into groups (strata) that share a common characteristic and then randomly selects members from each group. This method is useful when we want to ensure that every subgroup is represented in the sample. In this article, we will implement stratified sampling in R in two ways: by taking a fixed number of rows per group and by taking a fixed fraction of rows per group.

Implementation of Stratified Sampling Using Number of Rows

We divide the population into groups and randomly select the same fixed number of members from each group to form the final sample.

1. Installing and Loading Required Packages

We install and load the dplyr package to manipulate data and perform group-wise sampling.

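# Install dplyr once; this line can be skipped if the package is already installed
install.packages("dplyr")
# Load dplyr for group_by(), sample_n(), sample_frac() and the %>% pipe
library(dplyr)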

2. Creating the Data Frame

We create a data frame containing 600 entries, with equal numbers of Teachers, Students, Workforce and Guests, each having a randomly generated GPA.

# 4 groups x 150 rows each = 600 rows; gpa is drawn from a normal distribution
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa = rnorm(600, mean = 90, sd = 3)
)
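Because rnorm() draws random values, the gpa column changes on every run. A minimal sketch of making the data frame reproducible is shown below; the seed value 42 is an arbitrary assumption and is not part of the original example.

```r
# Fix the random seed so rnorm() returns the same values on every run.
# The seed value 42 is an arbitrary choice for illustration.
set.seed(42)

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# Confirm the shape of the data and the size of each group.
dim(df)          # 600 rows, 2 columns
table(df$group)  # 150 rows in each of the four groups
```

The same set.seed() call also makes the later sampling step repeatable, as long as the steps are run in the same order.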

3. Obtaining Stratified Sample

We group the data by the group column and use sample_n() to randomly select 15 rows from each group.

# Draw 15 random rows within each group
strat_sample <- df %>%
  group_by(group) %>%
  sample_n(size = 15)
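In dplyr 1.0.0 and later, sample_n() is documented as superseded by slice_sample(). The sketch below shows what the same step might look like with slice_sample(n = ...), assuming a recent dplyr is installed; the seed and the trailing ungroup() are additions made here for illustration.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, assumed here only for reproducibility

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# slice_sample() draws n rows at random within each group of a grouped data frame.
strat_sample <- df %>%
  group_by(group) %>%
  slice_sample(n = 15) %>%
  ungroup()  # drop the grouping so later steps treat the sample as one table

nrow(strat_sample)  # 60 rows: 15 from each of the 4 groups
```

sample_n() still works, so switching is optional; slice_sample() is simply the form the dplyr documentation recommends for new code.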

4. Finding Frequency of Groups in the Sample

We check how many records are selected from each group in the final sample.

table(strat_sample$group)

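Output:

Each of the four groups (Guests, Students, Teachers, Workforce) contributes exactly 15 rows, so the sample contains 60 rows in total.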
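For readers who prefer to avoid a package dependency, the fixed-number approach can also be sketched in base R. This is one possible formulation, not the method used in this article; the names n_per_group, rows_by_group, picked and strat_sample_base are hypothetical.

```r
set.seed(42)  # arbitrary seed for reproducibility

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

n_per_group <- 15  # rows to draw from every stratum

# split() returns, for each group, the row indices that belong to it.
rows_by_group <- split(seq_len(nrow(df)), df$group)

# Within each group, pick n_per_group positions without replacement.
# sample.int() on the length avoids the pitfall of sample() on a single number.
picked <- unlist(
  lapply(rows_by_group, function(idx) idx[sample.int(length(idx), n_per_group)]),
  use.names = FALSE
)

strat_sample_base <- df[picked, ]

table(strat_sample_base$group)  # 15 rows in each of the four groups
```

Splitting row indices rather than the data frame itself keeps the original row numbers, which makes it easy to trace each sampled row back to the population.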

Implementation of Stratified Sampling Using Fraction of Rows

We divide the population into groups and randomly select the same fraction of members from each group, so larger groups contribute more rows to the final sample.

1. Installing and Loading Required Packages

We install and load the dplyr package to enable data manipulation and sampling functions.

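# Install dplyr once; this line can be skipped if the package is already installed
install.packages("dplyr")
# Load dplyr for group_by(), sample_frac() and the %>% pipe
library(dplyr)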

2. Creating the Data Frame

We create the same data frame with 600 rows and four groups, each having 150 entries and a GPA score.

# 4 groups x 150 rows each = 600 rows; gpa is drawn from a normal distribution
df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa = rnorm(600, mean = 90, sd = 3)
)

3. Obtaining Stratified Sample

We group the data by the group column and use sample_frac() to randomly select 20 percent of the rows from each group.

# Draw 20% of the rows within each group (30 of 150)
strat_sample <- df %>%
  group_by(group) %>%
  sample_frac(size = 0.20)
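As with sample_n(), the dplyr documentation lists sample_frac() as superseded by slice_sample() from version 1.0.0 onward. A hedged sketch of the equivalent call using the prop argument follows, assuming a recent dplyr is installed; the seed and ungroup() are illustrative additions.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, assumed here only for reproducibility

df <- data.frame(
  group = rep(c("Teachers", "Students", "Workforce", "Guests"), each = 150),
  gpa   = rnorm(600, mean = 90, sd = 3)
)

# prop = 0.20 asks for 20% of the rows in each group: 0.20 * 150 = 30 rows.
strat_sample <- df %>%
  group_by(group) %>%
  slice_sample(prop = 0.20) %>%
  ungroup()

nrow(strat_sample)  # 120 rows: 30 from each of the 4 groups
```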

4. Finding Frequency of Groups in the Sample

We check how many records were selected from each group after applying fraction-based sampling.

table(strat_sample$group)

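Output:

Each of the four groups (Guests, Students, Teachers, Workforce) contributes 30 rows, which is 20 percent of 150, so the sample contains 120 rows in total.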
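With equally sized groups, the fixed-number and fraction methods differ only in sample size. The difference becomes visible when group sizes are unequal. The base R sketch below uses a hypothetical data frame (df_unequal, with made-up group sizes of 50, 400, 100 and 50) to illustrate how a 20 percent fraction keeps each group's share of the sample equal to its share of the population.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical, unequal group sizes chosen only for illustration.
sizes <- c(Teachers = 50, Students = 400, Workforce = 100, Guests = 50)

df_unequal <- data.frame(
  group = rep(names(sizes), times = sizes),
  gpa   = rnorm(sum(sizes), mean = 90, sd = 3)
)

frac <- 0.20  # fraction of rows to draw from every stratum

# Row indices belonging to each group.
rows_by_group <- split(seq_len(nrow(df_unequal)), df_unequal$group)

# Draw round(frac * group size) rows without replacement inside each group.
picked <- unlist(
  lapply(rows_by_group, function(idx) {
    idx[sample.int(length(idx), round(frac * length(idx)))]
  }),
  use.names = FALSE
)

frac_sample <- df_unequal[picked, ]

# Guests 10, Students 80, Teachers 10, Workforce 20
table(frac_sample$group)
```

A fixed number per group would instead over-represent the small groups relative to the population, which may or may not be desirable depending on the analysis.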

We implemented stratified sampling in R using two methods: selecting a fixed number of rows from each group with sample_n() and selecting a fixed fraction of rows from each group with sample_frac().