Inconsistent result with applying function between dataframe/groupby apply and 0.12/0.13/master · Issue #6715 · pandas-dev/pandas
I stumbled upon an inconsistent behaviour in (groupby/dataframe) apply when updating some code from 0.12 to 0.13.
Say you have the following custom functions:
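import datetime as dt
import numpy as np
import pandas as pd

def P1(a):
    # 1st percentile of the non-null values; NaN if it cannot be computed
    try:
        return np.percentile(a.dropna(), q=1)
    except:
        return np.nan

def P1_withouttry(a):
    return np.percentile(a.dropna(), q=1)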
When you apply this function to a dataframe:
In [3]: df = pd.DataFrame({'col1':[1,2,3,4],'col2':[10,25,26,31],
   ...:                    'date':[dt.date(2013,2,10),dt.date(2013,2,10),dt.date(2013,2,11),dt.date(2013,2,11)]})
In [4]: df
Out[4]:
   col1  col2        date
0     1    10  2013-02-10
1     2    25  2013-02-10
2     3    26  2013-02-11
3     4    31  2013-02-11
In [136]: df.apply(P1)
Out[136]:
col1     1.03
col2    10.45
date      NaN
dtype: float64
In [138]: df.apply(P1_withouttry)
Traceback (most recent call last):
...
TypeError: ("unsupported operand type(s) for *: 'datetime.date' and 'float'", u'occurred at index date')
This works with P1, but not with P1_withouttry. So I wrote my original function with a try/except so that it can be applied to a dataframe that also contains non-numeric columns.
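For comparison, here is a minimal sketch of an alternative to the try/except: restricting the frame to its numeric columns before applying the function. It assumes a pandas version that provides DataFrame.select_dtypes, which may not be available in the versions discussed in this issue, and it drops the date column from the result instead of reporting NaN for it.

```python
import datetime as dt

import numpy as np
import pandas as pd


def P1_withouttry(a):
    # 1st percentile of the non-null values, with no error handling
    return np.percentile(a.dropna(), q=1)


df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [10, 25, 26, 31],
    'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
             dt.date(2013, 2, 11), dt.date(2013, 2, 11)],
})

# Assumption: select_dtypes is available; it keeps only col1 and col2 here,
# so the function never sees the datetime.date column.
numeric = df.select_dtypes(include=[np.number])
result = numeric.apply(P1_withouttry)
print(result)
```

This gives the same values for col1 and col2 as df.apply(P1) above, but the non-numeric column is left out rather than mapped to NaN.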
However, when applying P1 on a groupby, it no longer works:
In [6]: g = df.groupby('date')
In [7]: g.apply(P1)
Out[7]:
date
2013-02-10   NaN
2013-02-11   NaN
dtype: float64
In [8]: g.apply(P1_withouttry)
Traceback (most recent call last):
...
TypeError: can't compare datetime.date to long
In [8]: g.agg(P1)
Out[8]:
            col1  col2
date
2013-02-10   NaN   NaN
2013-02-11   NaN   NaN
In [143]: g.agg(P1_withouttry)
Out[143]:
            col1   col2
date
2013-02-10  1.01  10.15
2013-02-11  3.01  26.05
So, with apply it does not work. With aggregate it does, but only for P1_withouttry, the version that didn't work with df.apply().
When using g.agg([P1]) this does work again on master, but not with 0.13.1 (where it gives the same result as g.agg(P1)), although it did work in 0.12:
In [11]: g.agg([P1])
Out[11]:
            col1   col2
              P1     P1
date
2013-02-10  1.01  10.15
2013-02-11  3.01  26.05
It was this last pattern that I was using in my code in 0.12 and that does not work anymore in 0.13.1 (I had something like g.agg([P1, P5, P10, P25, np.median, np.mean])).
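To make that pattern concrete, here is a rough, self-contained sketch of how such a list of percentile functions could be built and passed to agg. The make_percentile helper is hypothetical (it is not part of pandas or of my real code), the numeric columns are selected explicitly so the functions never see the date column, and the strings 'median' and 'mean' stand in for np.median and np.mean.

```python
import datetime as dt

import numpy as np
import pandas as pd


def make_percentile(q):
    # Hypothetical helper: returns a function computing the q-th percentile
    # of the non-null values, named 'P<q>' so agg can label its column.
    def percentile(a):
        return np.percentile(a.dropna(), q=q)
    percentile.__name__ = 'P%d' % q
    return percentile


P1, P5 = make_percentile(1), make_percentile(5)

df = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [10, 25, 26, 31],
    'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
             dt.date(2013, 2, 11), dt.date(2013, 2, 11)],
})

g = df.groupby('date')

# Selecting the numeric columns explicitly is an assumption of this sketch;
# each function is then called once per column per group.
result = g[['col1', 'col2']].agg([P1, P5, 'median', 'mean'])
print(result)
```

The P1 columns of the result should match the g.agg([P1]) output shown above.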