How to combine Groupby and Multiple Aggregate Functions in Pandas? (original) (raw)

Last Updated : 04 Apr, 2025

Pandas is an open-source Python library built on top of **NumPy. It allows data structures and functions to manipulate and analyze numerical data and time series efficiently. It is widely used in data analysis for tasks like data manipulation, cleaning and exploration. One of its key feature is to **group data and apply **multiple aggregate functions for more comprehensive and insightful analyses.

1. Groupby()

dataframe.groupby() function in **Pandas allows you to group the data in a DataFrame based on a particular column or multiple columns. This function splits the data into subsets, apply the function to each subset independently and then combines the results back into a DataFrame or Series. This is commonly used in **data aggregation, **summarizing and **transforming operations.

For example, we will group drinking data by continent and for this we will import the dataset and group the data by the continent column.

import pandas as pd df = pd.read_csv("https://bit.ly/drinksbycountry") grouped_data = df.groupby('continent')[['beer_servings', 'spirit_servings', 'wine_servings', 'total_litres_of_pure_alcohol']].mean() print(grouped_data)

`

**Output:

2. Aggregate()

dataframe.agg() function in Pandas allows you to apply multiple aggregation functions at once. It’s useful when you want to calculate different statistics like sum, mean, min, max, etc for a particular column. This method provides flexibility by allowing you to perform multiple operations on the same column or different columns in a single line of code, streamlining the analysis process.

For example, let’s now explore Aggregation of Beer Servings to calculate multiple statistics, such as the sum, minimum and maximum values.

beer_aggregates = df['beer_servings'].agg(['sum', 'min', 'max'])

print(beer_aggregates)

`

**Output:

Combining groupby() with Multiple Aggregation Functions

By combining the groupby() function with agg() you can calculate multiple aggregations for different columns grouped by a particular column. This allows for more granular and insightful data analysis.

Let’s now combine groupby() with agg() to calculate multiple statistics for the beerservings, spiritservings and wineservings columns, grouped by the continent column.

aggregated_data = df.groupby('continent').agg( min=('beer_servings', 'min'), max=('beer_servings', 'max'), sum=('beer_servings', 'sum'), count=('beer_servings', 'count'), mean=('beer_servings', 'mean') )

print(aggregated_data)

`

**Output:

groupbypic41

Groupby() with Multiple Aggregate Function

Here we can see we get a detailed analysis of data by applying different aggregation functions like sum, mean, count, min and max. It helps in doing complex analysis and making informed decision which was not possible earlier.