REGR: cumsum regression with groupby call to agg

Code Sample, a copy-pastable example if possible

I want to define a custom function to pass to the `agg` method. The function calls `cumsum`, which appears to have regressed recently.


import pandas as pd

def max_test(s):
    return s.cumsum().max()
    #return s.max()

dummy_data = pd.DataFrame(
    {'AIRLINE': {0: 'WN', 1: 'UA', 2: 'MQ', 3: 'AA', 4: 'WN'},
     'ORG_AIR': {0: 'LAX', 1: 'DEN', 2: 'DFW', 3: 'DFW', 4: 'LAX'},
     'DIST': {0: 590, 1: 1452, 2: 641, 3: 1192, 4: 1363}})

gb = dummy_data.groupby(['AIRLINE', 'ORG_AIR'])

result = gb.agg(
    #'max'
    max_test
)

print(result)

Problem description

Prior to Pandas 1.0rc this worked. It now raises an exception:

$ python /tmp/regpandas.py
Traceback (most recent call last):
  File "/tmp/regpandas.py", line 16, in <module>
    max_test
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/groupby/generic.py", line 948, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 936, in _python_agg_general
    result, counts = self.grouper.agg_series(obj, f)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 641, in agg_series
    return self._aggregate_series_fast(obj, func)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/groupby/ops.py", line 666, in _aggregate_series_fast
    result, counts = grouper.get_result()
  File "pandas/_libs/reduction.pyx", line 376, in pandas._libs.reduction.SeriesGrouper.get_result
  File "pandas/_libs/reduction.pyx", line 193, in pandas._libs.reduction._BaseGrouper._apply_to_group
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 913, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/tmp/regpandas.py", line 4, in max_test
    return s.cumsum().max()
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/generic.py", line 11331, in cum_func
    result = self._data.apply(na_accum_func)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 440, in apply
    applied = b.apply(f, **kwargs)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 403, in apply
    result = self.make_block(values=_block_shape(result, ndim=self.ndim))
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 273, in make_block
    return make_block(values, placement=placement, ndim=self.ndim)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3041, in make_block
    return klass(values, ndim=ndim, placement=placement)
  File "/Users/matt/.env/pandas1/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 125, in __init__
    f"Wrong number of items passed {len(self.values)}, "
ValueError: Wrong number of items passed 2, placement implies 1

Expected Output

$ python /tmp/regpandas.py
                 DIST
AIRLINE ORG_AIR
AA      DFW      1192
MQ      DFW       641
UA      DEN      1452
WN      LAX      1953
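
For reference, here is a possible workaround sketch. It is an assumption on my part and has not been verified against 1.0rc. It computes the per-group running sum with the built-in groupby `cumsum` first and then takes the per-group maximum, so no custom function that calls `cumsum` is passed to `agg`. The names `keys`, `running`, and `workaround` are illustrative only.

```python
import pandas as pd

dummy_data = pd.DataFrame({
    'AIRLINE': ['WN', 'UA', 'MQ', 'AA', 'WN'],
    'ORG_AIR': ['LAX', 'DEN', 'DFW', 'DFW', 'LAX'],
    'DIST': [590, 1452, 641, 1192, 1363],
})

keys = ['AIRLINE', 'ORG_AIR']

# Running total of DIST within each group, aligned to the original rows.
running = dummy_data.groupby(keys)['DIST'].cumsum()

# Replace DIST with its running total, then take the built-in max per group.
workaround = dummy_data.assign(DIST=running).groupby(keys).max()

print(workaround)
```

This should produce the same table as the expected output above. It sidesteps the failing call rather than fixing the regression itself.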

Output of `pd.show_versions()`