pandas.core.groupby.DataFrameGroupBy.agg - pandas 2.2.3 documentation

DataFrameGroupBy.agg(func=None, *args, engine=None, engine_kwargs=None, **kwargs)

Aggregate using one or more operations over the specified axis.

Parameters:

func : function, str, list, dict or None

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

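Accepted combinations are:

- function
- string function name
- list of functions and/or function names, e.g. [np.sum, 'mean']
- dict of axis labels -> functions, function names or list of such
- None, in which case **kwargs are used with Named Aggregation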

*args

Positional arguments to pass to func.
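As a rough illustration of how extra positional arguments reach func, the sketch below forwards an offset through agg. The helper sum_plus_offset and the sample frame are hypothetical, and the helper is written so that it gives the same numbers whether it receives a single column or a whole group frame.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})


def sum_plus_offset(values, offset):
    # Hypothetical helper: sum the group's values, then add `offset`.
    # `offset` arrives through the positional *args of agg.
    return values.sum() + offset


# The 10 after the function is forwarded to sum_plus_offset as `offset`.
result = df.groupby("A").agg(sum_plus_offset, 10)
print(result)
```

Group 1 sums to 3 and group 2 to 7, so the aggregated values are 13 and 17.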

engine : str, default None

engine_kwargs : dict, default None

**kwargs

Returns:

DataFrame

See also

DataFrame.groupby.apply

Apply function func group-wise and combine the results together.

DataFrame.groupby.transform

Transforms the Series on each group based on the given function.

DataFrame.aggregate

Aggregate using one or more operations over the specified axis.
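To make the difference between the related methods concrete, here is a small sketch contrasting agg with transform on a hypothetical frame. It only compares the shapes of the two results.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})
grouped = df.groupby("A")

# agg reduces each group to a single row, indexed by the group key.
reduced = grouped.agg("max")

# transform broadcasts the per-group result back onto the original rows.
broadcast = grouped.transform("max")

print(reduced.shape)
print(broadcast.shape)
```

The aggregated result has one row per group, while the transformed result keeps one row per input row.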

Notes

When using engine='numba', there will be no “fall back” behavior internally. The group data and group index will be passed as numpy arrays to the JITed user defined function, and no alternative execution attempts will be tried.

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
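One way to stay clear of mutation is to build the aggregate from operations that return new objects. The sketch below assumes a hypothetical helper, positive_sum, and a small sample frame; it is an illustration, not a prescribed pattern.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "C": [0.5, -1.0, 2.0, -3.0]})
before = df.copy()


def positive_sum(col):
    # Series.clip returns a new Series, so `col` itself is left untouched.
    return col.clip(lower=0).sum()


result = df.groupby("A").agg(positive_sum)
print(result)

# The input frame is unchanged after the aggregation.
print(df.equals(before))
```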

Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func, see the examples below.

Examples

>>> data = {"A": [1, 1, 2, 2],
...         "B": [1, 2, 3, 4],
...         "C": [0.362838, 0.227877, 1.267767, -0.562860]}
>>> df = pd.DataFrame(data)
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

The aggregation is for each column.

>>> df.groupby('A').agg('min')
   B         C
A
1  1  0.227877
2  3 -0.562860

Multiple aggregations

>>> df.groupby('A').agg(['min', 'max'])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a column for aggregation

>>> df.groupby('A').B.agg(['min', 'max'])
   min  max
A
1    1    2
2    3    4

User-defined function for aggregation

>>> df.groupby('A').agg(lambda x: sum(x) + 2)
   B         C
A
1  5  2.590715
2  9  2.704907
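A named function can be used in place of a lambda, and it can be mixed with string names in a list. In the sketch below, value_range is a hypothetical helper; the resulting column is labeled with the function's name.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})


def value_range(col):
    # Hypothetical helper: spread between the largest and smallest value.
    return col.max() - col.min()


result = df.groupby("A").agg(["min", value_range])
print(result)
```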

Different aggregations per column

>>> df.groupby('A').agg({'B': ['min', 'max'], 'C': 'sum'})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907
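With a dict of per-column aggregations, the result carries two-level column labels. The following sketch, on a hypothetical frame, shows how a single aggregated column might be selected by its (column, aggregation) label.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.5, 1.5, 2.5, 3.5]}
)
result = df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})

# Column labels are (column, aggregation) tuples.
print(list(result.columns))

# Select one aggregated column by its tuple label.
b_max = result[("B", "max")]
print(b_max)
```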

To control the output names with different aggregations per column, pandas supports “named aggregation”:

>>> df.groupby("A").agg(
...     b_min=pd.NamedAgg(column="B", aggfunc="min"),
...     c_sum=pd.NamedAgg(column="C", aggfunc="sum")
... )
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907

See Named aggregation for more.
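Named aggregation also accepts plain (column, aggfunc) tuples as a shorthand for pd.NamedAgg. The sketch below, using a hypothetical frame, checks that both spellings give the same result.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 1, 2, 2], "B": [1, 2, 3, 4], "C": [0.5, 1.5, 2.5, 3.5]}
)

# Explicit NamedAgg objects.
with_namedagg = df.groupby("A").agg(
    b_min=pd.NamedAgg(column="B", aggfunc="min"),
    c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
)

# Equivalent tuple shorthand: keyword = (column, aggfunc).
with_tuples = df.groupby("A").agg(b_min=("B", "min"), c_sum=("C", "sum"))

print(with_namedagg.equals(with_tuples))
```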

Changed in version 1.3.0: The resulting dtype will reflect the return value of the aggregating function.

>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
      B
A
1   1.0
2   3.0
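The dtype behavior can be checked directly by comparing a built-in aggregation with a function that returns floats. This is a minimal sketch on a hypothetical integer column.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})

# The built-in "min" keeps the integer dtype of column B.
as_int = df.groupby("A")[["B"]].agg("min")

# The function returns floats, so the result column is float.
as_float = df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())

print(as_int.dtypes["B"], as_float.dtypes["B"])
```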