BUG: timeseries.groupby(...).transform('mean') wrong when aggregating over pd.NaT

Code Sample, a copy-pastable example

The problem is best shown in the following example:

import pandas as pd

s = pd.Series(['1 day', '3 days', pd.NaT], dtype='timedelta64[ns]')

print(f'{s.mean()=}\n')
# as expected: result is '2 days'

print(f"{s.apply('mean')=}\n")
# as expected: result is '2 days'

print("transform('mean')")
print(s.groupby([1, 1, 1]).transform('mean'))
# whoops: (a series of 3x) -35583 days +08:04:14.381741568
# expectation was: (a series of 3x) '2 days'

# Workaround: pass the function object instead of its name
print("\ntransform(pd.Series.mean)")
print(s.groupby([1, 1, 1]).transform(pd.Series.mean))
# (a series of 3x) '2 days', sanity restored :-)

Problem description

When a function name is passed to transform as a string, pd.NaT is not handled correctly. This happens not only for 'mean' but also for other functions such as 'sum'.
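A minimal sketch to check several reductions at once. `compare_string_vs_callable` is a hypothetical helper, not part of pandas: it runs the same groupby transform once with the string name and once with the matching `pd.Series` method, then reports whether the two agree. Per the output above, 'mean' disagrees on pandas 1.3.2; the result on other pandas versions may differ.

```python
import pandas as pd

s = pd.Series(['1 day', '3 days', pd.NaT], dtype='timedelta64[ns]')
groups = s.groupby([1, 1, 1])


def compare_string_vs_callable(name):
    """Hypothetical helper: return (string-path result, callable-path result)."""
    by_name = groups.transform(name)                        # e.g. transform('mean')
    by_callable = groups.transform(getattr(pd.Series, name))  # e.g. transform(pd.Series.mean)
    return by_name, by_callable


for name in ['mean', 'sum']:
    by_name, by_callable = compare_string_vs_callable(name)
    agree = by_name.equals(by_callable)
    print(f"{name!r}: string={by_name.iloc[0]}  callable={by_callable.iloc[0]}  agree={agree}")
```

The callable path skips NaT (`skipna=True` is the default for `pd.Series.mean` and `pd.Series.sum`), so it gives '2 days' for 'mean' and '4 days' for 'sum' regardless of version.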

I don't know how transform looks up the function when it is passed a string (this is not really documented), but it evidently does not dispatch to pd.Series.mean; it uses some other mean implementation that is not NaT-aware.

This is surprising since NaT is handled correctly when calling apply('mean').
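A hedged illustration of what "not NaT-aware" could mean here. In a timedelta64[ns] array, NaT is stored as the smallest int64 value. Assuming (this is a guess consistent with the output, not a reading of pandas internals) that the string path averages that sentinel as an ordinary number, plain arithmetic reproduces the odd value reported above exactly.

```python
import numpy as np
import pandas as pd

# NaT in a timedelta64[ns] array is stored as the smallest int64 value.
nat_as_int = np.timedelta64('NaT', 'ns').astype(np.int64)
assert nat_as_int == np.iinfo(np.int64).min

one_day_ns = 86_400 * 10**9
values_ns = [1 * one_day_ns, 3 * one_day_ns, int(nat_as_int)]

# Assumed faulty path: keep the sentinel as a regular value, sum exactly,
# then divide in float64 by the count of all three entries.
naive_mean = pd.Timedelta(int(float(sum(values_ns)) / len(values_ns)))

# NaT-aware path: drop the sentinel before averaging, as s.mean() does.
valid = [v for v in values_ns if v != nat_as_int]
nat_aware_mean = pd.Timedelta(sum(valid) // len(valid))

print(naive_mean)      # -35583 days +08:04:14.381741568
print(nat_aware_mean)  # 2 days 00:00:00
```

The match down to the nanosecond suggests the NaT sentinel is being included in the average rather than skipped.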

Tested with pandas 1.3.2.