pandas.DataFrame.apply — pandas 3.0.0.dev0+2177.g8a1d5a06f9 documentation

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine=None, engine_kwargs=None, **kwargs)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument. The return type of the applied function is inferred based on the first computed result obtained after applying the function to a Series object.
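To make the axis convention concrete, here is a minimal sketch that reports the labels of each object handed to the function. The frame and the helper name `describe_labels` are illustrative assumptions, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])


def describe_labels(s):
    # Join the index labels of the Series that apply() passes in.
    return ",".join(s.index)


# axis=0: one call per column; each Series is indexed by the row labels.
per_column = df.apply(describe_labels, axis=0)

# axis=1: one call per row; each Series is indexed by the column labels.
per_row = df.apply(describe_labels, axis=1)
```

In this sketch `per_column` is indexed by the column names and `per_row` by the row labels, mirroring the description above.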

Parameters:

func : function

Function to apply to each column or row.

axis : {0 or ‘index’, 1 or ‘columns’}, default 0

Axis along which the function is applied:

0 or ‘index’: apply function to each column.
1 or ‘columns’: apply function to each row.

raw : bool, default False

Determines if row or column is passed as a Series or ndarray object:

False : passes each row or column as a Series to the function.
True : the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance.

result_type : {‘expand’, ‘reduce’, ‘broadcast’, None}, default None

These only act when axis=1 (columns):

‘expand’ : list-like results will be turned into columns.
‘reduce’ : returns a Series if possible rather than expanding list-like results. This is the opposite of ‘expand’.
‘broadcast’ : results will be broadcast to the original shape of the DataFrame, the original index and columns will be retained.

The default behaviour (None) depends on the return value of the applied function: list-like results will be returned as a Series of those. However, if the applied function returns a Series, these are expanded to columns.

args : tuple

Positional arguments to pass to func in addition to the array/series.

by_row : False or “compat”, default “compat”

Only has an effect when func is a list-like or dict-like of funcs and the func isn’t a string. If “compat”, pandas will first try to translate the func into pandas methods where possible (e.g. Series().apply(np.sum) will be translated to Series().sum()). If that doesn’t work, it will try calling apply again with by_row=True, and if that fails, it will call apply again with by_row=False (backward compatible). If False, the funcs will be passed the whole Series at once.

Added in version 2.1.0.

engine : decorator or {‘python’, ‘numba’}, optional

Choose the execution engine to use. If not provided the function will be executed by the regular Python interpreter.

Other options include JIT compilers such as Numba and Bodo, which in some cases can speed up the execution. To use an executor you can provide the decorators numba.jit, numba.njit or bodo.jit. You can also provide the decorator with parameters, like numba.jit(nogil=True).

Not all functions can be executed with all execution engines. In general, JIT compilers require type stability in the function (no variable should change data type during the execution), and not all pandas and NumPy APIs are supported. Check the engine documentation [1] and [2] for limitations.

Warning

String parameters will stop being supported in a future pandas version.

Added in version 2.2.0.

engine_kwargs : dict

Pass keyword arguments to the engine. This is currently only used by the numba engine; see the documentation for the engine argument for more information.

**kwargs

Additional keyword arguments to pass to func.
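As a rough illustration of how args and **kwargs reach func, the sketch below forwards one positional and one keyword argument. The function `shift_and_scale` and its parameters `offset` and `scale` are hypothetical names chosen for this example.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def shift_and_scale(col, offset, scale=1):
    # col is the Series supplied by apply(); offset arrives through
    # args, and scale through **kwargs.
    return (col + offset) * scale


result = df.apply(shift_and_scale, args=(1,), scale=10)
```

Here every value in column A becomes (4 + 1) * 10 and every value in column B becomes (9 + 1) * 10. Keyword names that collide with apply's own parameters (such as axis or raw) would be consumed by apply rather than forwarded.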

Returns:

Series or DataFrame

Result of applying func along the given axis of the DataFrame.

Notes

Functions that mutate the passed object can produce unexpected behavior or errors and are not supported. See Mutating with User Defined Function (UDF) methods for more details.
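One way to stay clear of mutation, sketched here under the assumption that the goal is to add a derived value to each row, is to build and return a new Series instead of assigning into the object that was passed in. The helper name `with_total` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})


def with_total(row):
    # Build a fresh Series rather than assigning into `row` in place.
    return pd.Series(
        {"A": row["A"], "B": row["B"], "total": row["A"] + row["B"]}
    )


result = df.apply(with_total, axis=1)
```

The original `df` keeps its two columns, while `result` carries the extra `total` column.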

References

[2] Bodo documentation https://docs.bodo.ai/latest//

Examples

>>> df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])
>>> df
   A  B
0  4  9
1  4  9
2  4  9

Using a NumPy universal function (in this case the same as np.sqrt(df)):

>>> df.apply(np.sqrt)
     A    B
0  2.0  3.0
1  2.0  3.0
2  2.0  3.0

Using a reducing function on either axis

>>> df.apply(np.sum, axis=0)
A    12
B    27
dtype: int64

>>> df.apply(np.sum, axis=1)
0    13
1    13
2    13
dtype: int64
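The raw parameter can be combined with a reducing function like the ones above. The following sketch, using the same small frame, checks what the function receives when raw=True and confirms that the reduction yields the same totals; it makes no claim about the size of any speed-up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])

# With raw=True the function receives a plain ndarray for each column.
got_ndarray = df.apply(lambda arr: isinstance(arr, np.ndarray), raw=True)

# A NumPy reduction gives the same totals as the Series-based call.
raw_sums = df.apply(np.sum, axis=0, raw=True)
```

Because the function sees an ndarray, Series-only attributes such as .index or .name are not available inside it when raw=True.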

Returning a list-like will result in a Series

>>> df.apply(lambda x: [1, 2], axis=1)
0    [1, 2]
1    [1, 2]
2    [1, 2]
dtype: object

Passing result_type='expand' will expand list-like results to columns of a DataFrame

>>> df.apply(lambda x: [1, 2], axis=1, result_type="expand")
   0  1
0  1  2
1  1  2
2  1  2

Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.

>>> df.apply(lambda x: pd.Series([1, 2], index=["foo", "bar"]), axis=1)
   foo  bar
0    1    2
1    1    2
2    1    2

Passing result_type='broadcast' will ensure the same shape result, whether list-like or scalar is returned by the function, and broadcast it along the axis. The resulting column names will be the originals.

>>> df.apply(lambda x: [1, 2], axis=1, result_type="broadcast")
   A  B
0  1  2
1  1  2
2  1  2

Advanced users can speed up their code by using a just-in-time (JIT) compiler with apply. The main JIT compilers available for pandas are Numba and Bodo. In general, JIT compilation is only possible when the function passed to apply has type stability (variables in the function do not change their type during the execution).

>>> import bodo
>>> df.apply(lambda x: x.A + x.B, axis=1, engine=bodo.jit)

Note that JIT compilation is only recommended for functions that take a significant amount of time to run. Fast functions are unlikely to run faster with JIT compilation.
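Type stability can be illustrated without installing any JIT compiler. In the sketch below, both functions run under the default Python engine; the names `unstable` and `stable` are hypothetical, and whether a given compiler accepts or rejects a particular function is an assumption to verify against that engine's documentation.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def unstable(row):
    # `label` is an int on one path and a str on the other, the kind of
    # type change that JIT compilers typically cannot handle.
    label = 0
    if row["A"] + row["B"] > 10:
        label = "large"
    return label


def stable(row):
    # `total` is an integer on every path.
    total = row["A"] + row["B"]
    return total if total > 10 else 0


# Both calls use the regular Python interpreter (no engine argument).
unstable_result = df.apply(unstable, axis=1)
stable_result = df.apply(stable, axis=1)
```

A function shaped like `stable` is the more plausible candidate for passing to a JIT engine, since every variable keeps a single type throughout.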