pandas.DataFrame.transform — pandas 2.2.3 documentation (original) (raw)

DataFrame.transform(func, axis=0, *args, **kwargs)[source]#

Call func on self producing a DataFrame with the same axis shape as self.

Parameters:

funcfunction, str, list-like or dict-like

Function to use for transforming the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If func is both list-like and dict-like, dict-like behavior takes precedence.

Accepted combinations are:

axis{0 or ‘index’, 1 or ‘columns’}, default 0

If 0 or ‘index’: apply function to each column. If 1 or ‘columns’: apply function to each row.

*args

Positional arguments to pass to func.

**kwargs

Keyword arguments to pass to func.

Returns:

DataFrame

A DataFrame that must have the same length as self.

Raises:

ValueErrorIf the returned DataFrame has a different length than self.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methodsfor more details.

Examples

df = pd.DataFrame({'A': range(3), 'B': range(1, 4)}) df A B 0 0 1 1 1 2 2 2 3 df.transform(lambda x: x + 1) A B 0 1 2 1 2 3 2 3 4

Even though the resulting DataFrame must have the same length as the input DataFrame, it is possible to provide several input functions:

s = pd.Series(range(3)) s 0 0 1 1 2 2 dtype: int64 s.transform([np.sqrt, np.exp]) sqrt exp 0 0.000000 1.000000 1 1.000000 2.718282 2 1.414214 7.389056

You can call transform on a GroupBy object:

df = pd.DataFrame({ ... "Date": [ ... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05", ... "2015-05-08", "2015-05-07", "2015-05-06", "2015-05-05"], ... "Data": [5, 8, 6, 1, 50, 100, 60, 120], ... }) df Date Data 0 2015-05-08 5 1 2015-05-07 8 2 2015-05-06 6 3 2015-05-05 1 4 2015-05-08 50 5 2015-05-07 100 6 2015-05-06 60 7 2015-05-05 120 df.groupby('Date')['Data'].transform('sum') 0 55 1 108 2 66 3 121 4 55 5 108 6 66 7 121 Name: Data, dtype: int64

df = pd.DataFrame({ ... "c": [1, 1, 1, 2, 2, 2, 2], ... "type": ["m", "n", "o", "m", "m", "n", "n"] ... }) df c type 0 1 m 1 1 n 2 1 o 3 2 m 4 2 m 5 2 n 6 2 n df['size'] = df.groupby('c')['type'].transform(len) df c type size 0 1 m 3 1 1 n 3 2 1 o 3 3 2 m 4 4 2 m 4 5 2 n 4 6 2 n 4