Inconsistent result with applying function between dataframe/groupby apply and 0.12/0.13/master · Issue #6715 · pandas-dev/pandas

I stumbled upon inconsistent behaviour in DataFrame and groupby apply when updating some code from 0.12 to 0.13.

Say you have the following custom functions:

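import datetime as dt
import numpy as np
import pandas as pd

def P1(a):
    # 1st percentile of the non-null values, or NaN if it cannot be computed
    try:
        return np.percentile(a.dropna(), q=1)
    except Exception:
        return np.nan

def P1_withouttry(a):
    # the same calculation, but any error propagates to the caller
    return np.percentile(a.dropna(), q=1)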

When you apply these functions to a DataFrame:

In [3]: df = pd.DataFrame({'col1':[1,2,3,4],'col2':[10,25,26,31],                   
     ...:                    'date':[dt.date(2013,2,10),dt.date(2013,2,10),dt.date(2013,2,11),dt.date(2013,2,11)]})

In [4]: df
Out[4]:
   col1  col2        date
0     1    10  2013-02-10
1     2    25  2013-02-10
2     3    26  2013-02-11
3     4    31  2013-02-11

In [136]: df.apply(P1)
Out[136]: 
col1     1.03
col2    10.45
date      NaN
dtype: float64

In [138]: df.apply(P1_withouttry)
Traceback (most recent call last):
  ...
TypeError: ("unsupported operand type(s) for *: 'datetime.date' and 'float'", u'occurred at index date')

This works with P1 but not with P1_withouttry. That is why I wrote my original function with a try/except: so that it can be applied to a DataFrame that also contains non-numeric columns.
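With the default axis=0, DataFrame.apply calls the function once per column, passing each column as a Series, so only the date column hands non-numeric data to np.percentile. As an alternative to relying on the try/except, the numeric columns can be selected up front with select_dtypes. The sketch below assumes a recent pandas/NumPy rather than 0.13, and the select_dtypes approach is a suggested workaround, not something from the original report.

```python
import datetime as dt

import numpy as np
import pandas as pd


def P1_withouttry(a):
    # 1st percentile of the non-null values; any error propagates
    return np.percentile(a.dropna(), q=1)


df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 25, 26, 31],
                   'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
                            dt.date(2013, 2, 11), dt.date(2013, 2, 11)]})

# Keep only numeric columns, so the date column never reaches np.percentile
numeric = df.select_dtypes(include='number')

# With the default axis=0, apply calls the function once per column Series...
via_apply = numeric.apply(P1_withouttry)

# ...which amounts to this explicit loop over the columns
via_loop = pd.Series({col: P1_withouttry(numeric[col]) for col in numeric.columns})

print(via_apply)  # col1 1.03, col2 10.45, matching Out[136] for these columns
```

This keeps the strict function strict: a genuine error in a numeric column still surfaces instead of silently becoming NaN.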

However, when applying it to a groupby object, it no longer works:

In [6]: g = df.groupby('date')

In [7]: g.apply(P1)
Out[7]:
date
2013-02-10   NaN
2013-02-11   NaN
dtype: float64

In [8]: g.apply(P1_withouttry)
Traceback (most recent call last):
   ...
TypeError: can't compare datetime.date to long
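One plausible reading of these results (an interpretation, not something stated in the report): groupby apply passes each group to the function as a whole sub-DataFrame, date column included, rather than column by column. np.percentile then flattens that mixed frame into a single object array, and comparing a datetime.date with an integer raises a TypeError, which is consistent with the "can't compare datetime.date to long" error above; P1 swallows it and returns NaN. The sketch iterates over the groups by hand to show the difference, assuming a recent pandas/NumPy.

```python
import datetime as dt

import numpy as np
import pandas as pd


def P1(a):
    # 1st percentile of the non-null values, or NaN if it cannot be computed
    try:
        return np.percentile(a.dropna(), q=1)
    except Exception:
        return np.nan


def P1_withouttry(a):
    return np.percentile(a.dropna(), q=1)


df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 25, 26, 31],
                   'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
                            dt.date(2013, 2, 11), dt.date(2013, 2, 11)]})

whole_group = {}
columns = {}
for key, group in df.groupby('date'):
    # group is a DataFrame holding every column, including 'date';
    # np.percentile on it mixes ints and dates, fails, and P1 returns NaN
    whole_group[key] = P1(group)
    # calling the function column by column on the numeric columns instead
    columns[key] = group[['col1', 'col2']].apply(P1_withouttry)

per_column = pd.DataFrame(columns).T  # one row per date, one column per value column
print(whole_group)
print(per_column)
```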


In [8]: g.agg(P1)
Out[8]:
            col1  col2
date
2013-02-10   NaN   NaN
2013-02-11   NaN   NaN

In [143]: g.agg(P1_withouttry)
Out[143]: 
            col1   col2
date                   
2013-02-10  1.01  10.15
2013-02-11  3.01  26.05
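agg is meant to call the function on one column of one group at a time, which is why P1_withouttry produces the expected table here. A sketch, assuming a recent pandas: selecting the value columns explicitly before aggregating keeps the date key out of the function entirely. The column selection is a suggested workaround, not part of the original report.

```python
import datetime as dt

import numpy as np
import pandas as pd


def P1_withouttry(a):
    # a is the Series for one column within one group
    return np.percentile(a.dropna(), q=1)


df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 25, 26, 31],
                   'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
                            dt.date(2013, 2, 11), dt.date(2013, 2, 11)]})

g = df.groupby('date')
# Aggregate only the numeric value columns
result = g[['col1', 'col2']].agg(P1_withouttry)
print(result)  # same numbers as Out[143]
```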

So with apply neither function gives the expected result, while agg does work, but only with P1_withouttry, the very function that failed with df.apply().
Using g.agg([P1]) works again on master, but not in 0.13.1 (there it gives the same result as g.agg(P1)), although it did work
in 0.12:

In [11]: g.agg([P1])
Out[11]:
            col1   col2
              P1     P1
date
2013-02-10  1.01  10.15
2013-02-11  3.01  26.05

This last pattern is what my code used in 0.12, and it no longer works in 0.13.1 (I had something like g.agg([P1, P5, P10, P25, np.median, np.mean])).
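P5, P10 and P25 are not shown in the report. Assuming they follow the same pattern as P1, a hypothetical factory, make_percentile, can generate them and give each one the __name__ that agg uses as its column label. A sketch of the multi-function aggregation against a recent pandas, restricted to the numeric columns:

```python
import datetime as dt

import numpy as np
import pandas as pd


def make_percentile(q):
    """Hypothetical helper: build a P<q> function following the P1 pattern."""
    def func(a):
        try:
            return np.percentile(a.dropna(), q=q)
        except Exception:
            return np.nan
    func.__name__ = f'P{q}'  # agg uses this name as the column label
    return func


P1, P5, P10, P25 = (make_percentile(q) for q in (1, 5, 10, 25))

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 25, 26, 31],
                   'date': [dt.date(2013, 2, 10), dt.date(2013, 2, 10),
                            dt.date(2013, 2, 11), dt.date(2013, 2, 11)]})

g = df.groupby('date')
# Columns come out as a MultiIndex: (column name, function name)
result = g[['col1', 'col2']].agg([P1, P5, P10, P25])
print(result)
```

Setting __name__ explicitly matters because agg labels each output column by the function's name; closures returned from a factory would otherwise all share the same name.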