pandas.core.groupby.DataFrameGroupBy.agg — pandas 3.0.0.dev0+2095.g2e141aaf99 documentation

DataFrameGroupBy.agg(func=None, *args, engine=None, engine_kwargs=None, **kwargs)

Aggregate using one or more operations.

The aggregate function allows the application of one or more aggregation operations on groups of data within a DataFrameGroupBy object. It supports various aggregation methods, including user-defined functions and predefined functions such as ‘sum’, ‘mean’, etc.

Parameters:

func : function, str, list, dict or None

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

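Accepted combinations are: a function; a string function name such as "min"; a list of functions and/or function names; a dict mapping column labels to functions, function names, or lists of such; or None, in which case **kwargs are used for named aggregation.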

*args

Positional arguments to pass to func.

engine : str, default None

engine_kwargs : dict, default None

**kwargs

Returns:

DataFrame

Aggregated DataFrame based on the grouping and the applied aggregation functions.

See also

DataFrame.groupby.apply

Apply function func group-wise and combine the results together.

DataFrame.groupby.transform

Transforms the Series on each group based on the given function.

DataFrame.aggregate

Aggregate using one or more operations.

Notes

When using engine='numba', there will be no “fall back” behavior internally. The group data and group index will be passed as numpy arrays to the JITed user defined function, and no alternative execution attempts will be tried.
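The signature expected by the numba engine can be illustrated with a small sketch. The function name value_range and the sample data are hypothetical, chosen only for illustration; the sketch assumes the UDF takes values and index as its first two arguments and that numba is an optional dependency, so the JIT path is skipped when it is not installed.

```python
import numpy as np
import pandas as pd


def value_range(values, index):
    """Hypothetical UDF: gets one group's values and index as NumPy arrays."""
    return values.max() - values.min()


# The function is plain Python, so it can be exercised directly with arrays.
direct = value_range(np.array([1.0, 4.0, 2.5]), np.array([0, 1, 2]))

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 5.0]})

try:
    # With engine="numba" there is no fallback if the UDF cannot be compiled.
    numba_result = df.groupby("A").agg(value_range, engine="numba")
except ImportError:
    # Assumption: numba is not installed in this environment.
    numba_result = None
```

Because nothing falls back to a slower path, a UDF written this way should rely only on array operations that numba can compile.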

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.

Changed in version 1.3.0: The resulting dtype will reflect the return value of the passed func, see the examples below.

Examples

>>> data = {
...     "A": [1, 1, 2, 2],
...     "B": [1, 2, 3, 4],
...     "C": [0.362838, 0.227877, 1.267767, -0.562860],
... }
>>> df = pd.DataFrame(data)
>>> df
   A  B         C
0  1  1  0.362838
1  1  2  0.227877
2  2  3  1.267767
3  2  4 -0.562860

The aggregation is for each column.

>>> df.groupby("A").agg("min")
   B         C
A
1  1  0.227877
2  3 -0.562860

Multiple aggregations

>>> df.groupby("A").agg(["min", "max"])
    B             C
  min max       min       max
A
1   1   2  0.227877  0.362838
2   3   4 -0.562860  1.267767

Select a column for aggregation

>>> df.groupby("A").B.agg(["min", "max"])
   min  max
A
1    1    2
2    3    4

User-defined function for aggregation

>>> df.groupby("A").agg(lambda x: sum(x) + 2)
   B         C
A
1  5  2.590715
2  9  2.704907
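Extra positional and keyword arguments given to agg are forwarded to a user-defined function, as the *args and **kwargs parameters describe. A minimal sketch, assuming a hypothetical helper named sum_plus (not part of pandas) and a single selected column to keep the result small:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)


def sum_plus(x, offset, scale=1):
    """Hypothetical helper: sum a group's values, then shift and scale."""
    return (x.sum() + offset) * scale


# 2 is forwarded positionally as ``offset``; ``scale`` is forwarded by keyword.
result = df.groupby("A")[["B"]].agg(sum_plus, 2, scale=10)
print(result)
```

The helper uses only x.sum(), so it does not depend on the exact type of the object that represents each group.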

Different aggregations per column

>>> df.groupby("A").agg({"B": ["min", "max"], "C": "sum"})
    B             C
  min max       sum
A
1   1   2  0.590715
2   3   4  0.704907

To control the output names with different aggregations per column, pandas supports “named aggregation”:

>>> df.groupby("A").agg(
...     b_min=pd.NamedAgg(column="B", aggfunc="min"),
...     c_sum=pd.NamedAgg(column="C", aggfunc="sum"),
... )
   b_min     c_sum
A
1      1  0.590715
2      3  0.704907

See Named aggregation for more.
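Named aggregation also accepts plain (column, aggfunc) tuples in place of pd.NamedAgg. The sketch below reuses the example data; the output names b_min, b_max and c_sum are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": [1, 1, 2, 2],
        "B": [1, 2, 3, 4],
        "C": [0.362838, 0.227877, 1.267767, -0.562860],
    }
)

# Each keyword is an output column name; each value is (column, aggfunc).
result = df.groupby("A").agg(
    b_min=("B", "min"),
    b_max=("B", "max"),
    c_sum=("C", "sum"),
)
print(result)
```

The output columns follow the keyword order and are flat, rather than the two-level columns produced by passing a dict of lists.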

Changed in version 1.3.0: The resulting dtype will reflect the return value of the aggregating function.

>>> df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())
     B
A
1  1.0
2  3.0
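One way to see the dtype behavior described in the version note is to compare the result dtypes of two user-defined functions on the same integer column. This is a small sketch assuming pandas 1.3.0 or later; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1, 2, 3, 4]})

# The function returns an integer for each group.
as_int = df.groupby("A")[["B"]].agg(lambda x: x.min())

# The function returns a float for each group.
as_float = df.groupby("A")[["B"]].agg(lambda x: x.astype(float).min())

print(as_int.dtypes)
print(as_float.dtypes)
```

The input column is the same in both calls; only the value returned by the function differs.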