Summarise multiple columns using dplyr in R

Last Updated : 24 Oct, 2021

In this article, we will discuss how to summarise multiple columns using the dplyr package in the R programming language.

Method 1: Using summarise_all() method

The summarise_all() method applies a function to every column of the data frame. The output data frame keeps all the columns of the input, each reduced to the result of applying the specified function to that column.

summarise_all(data, function)

Arguments :

data - The data frame whose columns are to be summarised
function - The function to apply to every column of the data frame.

library("dplyr")

# creating a data frame
df <- data.frame(col1 = sample(rep(c(1:5), each = 3)), col2 = 1:15)

print("original dataframe")
print(df)

# summarising the data
print("summarised dataframe")
summarise_all(df, mean)

Output

[1] "original dataframe"
   col1 col2
1     2    1
2     3    2
3     4    3
4     2    4
5     2    5
6     4    6
7     1    7
8     1    8
9     5    9
10    3   10
11    5   11
12    1   12
13    4   13
14    5   14
15    3   15
[1] "summarised dataframe"
  col1 col2
1    3    8

Explanation: The mean is calculated column-wise, that is, the values of col1 are summed and the sum is divided by the number of rows. The same calculation is carried out for col2. All the columns are returned in the final output. Because sample() shuffles col1 at random, the row order of col1 may differ between runs, but the means stay the same.
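The column-wise calculation described in the explanation can be checked by hand. The following is a minimal sketch, assuming a two-column data frame like the one in the example; the call to set.seed() and the names by_dplyr and by_hand are additions for illustration only.

```r
library(dplyr)

# set.seed() is an assumption added here so the shuffle is reproducible
set.seed(1)
df <- data.frame(col1 = sample(rep(1:5, each = 3)), col2 = 1:15)

# mean of every column via dplyr
by_dplyr <- summarise_all(df, mean)

# the same calculation by hand: column sum divided by the number of rows
by_hand <- data.frame(col1 = sum(df$col1) / nrow(df),
                      col2 = sum(df$col2) / nrow(df))

print(by_dplyr)
print(by_hand)
```

Both data frames should hold the same single row, since shuffling col1 changes the order of its values but not their sum.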

Method 2: Using summarise_at() method

The summarise_at() method applies a function only to the columns selected with a character vector or with vars(). The output data frame contains only the columns named in the summarise_at() call. If every column of the data frame is named, the result is the same as that of summarise_all().

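data %>% summarise_at(vars, function, ...)

Arguments :

data - The data frame whose columns are to be summarised
vars - The columns to summarise, given as a character vector or built with vars()
function - The function to apply to the selected columns
... - Additional arguments passed on to the function, for example na.rm=TRUE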
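The two selection styles described above, a character vector and vars(), can be sketched side by side. This is a minimal illustration on a small made-up data frame (its values and the result names res_vars, res_chr and res_all are assumptions, not part of the original example). It also checks the statement that naming every column gives the same result as summarise_all().

```r
library(dplyr)

# illustrative data frame with numeric columns only
df <- data.frame(col1 = c(2, 4, 6), col2 = c(1, 2, 3))

# columns selected with vars()
res_vars <- df %>% summarise_at(vars(col1, col2), mean, na.rm = TRUE)

# the same columns selected with a character vector
res_chr <- df %>% summarise_at(c("col1", "col2"), mean, na.rm = TRUE)

# every column is named above, so summarise_all() should agree
res_all <- df %>% summarise_all(mean, na.rm = TRUE)

print(res_vars)
print(res_chr)
print(res_all)
```

All three calls should print the same one-row data frame, with col1 equal to 4 and col2 equal to 2.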

library("dplyr")

# creating a data frame
df <- data.frame(col1 = sample(rep(c(1:5), each = 3)), col2 = 1:15, col3 = letters[1:15])

print("original dataframe")
print(df)

# summarising the data
print("summarised dataframe")
df %>% summarise_at(c("col1", "col2"), mean, na.rm = TRUE)

Output

[1] "original dataframe"
   col1 col2 col3
1     3    1    a
2     5    2    b
3     4    3    c
4     4    4    d
5     5    5    e
6     3    6    f
7     2    7    g
8     2    8    h
9     1    9    i
10    4   10    j
11    2   11    k
12    5   12    l
13    1   13    m
14    3   14    n
15    1   15    o
[1] "summarised dataframe"
  col1 col2
1    3    8
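In dplyr 1.0.0 and later, summarise_all() and summarise_at() still work but are marked as superseded in favour of across(). The sketch below, which assumes such a dplyr version and uses a small made-up data frame, shows how the two methods might be written with across(). The names res_at, res_num and res_old are illustrative only.

```r
library(dplyr)

# illustrative data frame: two numeric columns and one character column
df <- data.frame(col1 = c(2, 4, 6), col2 = c(1, 2, 3), col3 = c("a", "b", "c"))

# counterpart of summarise_at(c("col1", "col2"), mean, na.rm = TRUE)
res_at <- df %>% summarise(across(c(col1, col2), ~ mean(.x, na.rm = TRUE)))

# summarise every numeric column, skipping the character column col3
res_num <- df %>% summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

# the older scoped form, kept for comparison
res_old <- df %>% summarise_at(c("col1", "col2"), mean, na.rm = TRUE)

print(res_at)
print(res_num)
```

Selecting with where(is.numeric) avoids calling mean() on col3, which would otherwise produce NA with a warning.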