groupby().apply() returns a different result depending on whether the first result is None · Issue #12824 · pandas-dev/pandas

The apply documentation says:

apply can act as a reducer, transformer, or filter function, depending on exactly what is passed to apply. So depending on the path taken, and exactly what you are grouping. Thus the grouped columns(s) may be included in the output as well as set the indices.

So I expected that apply would discard a group when the callback function returns None. Here are two examples, one that works and one that does not:

import pandas as pd
import numpy as np

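# Case 1: the first group (B == 1) has three rows, so the first result is a DataFrame.
df = pd.DataFrame({"A": np.arange(10), "B": [1, 1, 1, 2, 3, 3, 3, 4, 4, 4]})
print(df.groupby("B").apply(lambda g: None if g.shape[0] <= 2 else g.iloc[[0, -1]]))

# Case 2: the first group (B == 1) has one row, so the first result is None.
df = pd.DataFrame({"A": np.arange(10), "B": [1, 2, 2, 2, 3, 3, 3, 4, 4, 4]})
print(df.groupby("B").apply(lambda g: None if g.shape[0] <= 2 else g.iloc[[0, -1]]))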

The output is as follows. The first call returns a DataFrame with the small group dropped; the second returns a Series with DataFrames inside:

     A  B
B        
1 0  0  1
  2  2  1
3 4  4  3
  6  6  3
4 7  7  4
  9  9  4


B
1         A    B
1  NaN  NaN
3  NaN  NaN
2                   A  B
1  1  2
3  3  2
3                   A  B
4  4  3
6  6  3
4                   A  B
7  7  4
9  9  4
dtype: object

The problem is that _wrap_applied_output() in groupby.py checks only the first element to decide how to combine the results:

    if isinstance(values[0], DataFrame):
        return self._concat_objects(keys, values,
                                    not_indexed_same=not_indexed_same)
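
To illustrate why a check on values[0] alone is order-sensitive, here is a toy model in plain Python. It is not the pandas implementation: the function names pick_by_first and pick_by_first_non_none are hypothetical, and a plain dict stands in for a DataFrame result. It only shows that probing the first non-None element would give the same answer in both cases above.

```python
def pick_by_first(values):
    """Mimic the check described above: look only at values[0]."""
    return "concat" if isinstance(values[0], dict) else "series"


def pick_by_first_non_none(values):
    """Hypothetical alternative: probe the first element that is not None."""
    probe = next((v for v in values if v is not None), None)
    return "concat" if isinstance(probe, dict) else "series"


# A plain dict stands in for a DataFrame returned by the callback (assumption).
frame = {"A": [0, 2]}

# Same set of results, different order: only the position of None changes.
print(pick_by_first([frame, None, frame]))           # concat
print(pick_by_first([None, frame, frame]))           # series
print(pick_by_first_non_none([None, frame, frame]))  # concat
```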

My question: does groupby().apply() support discarding groups by returning None?
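
In the meantime, one possible workaround is to skip apply() and build the result by hand, so the outcome does not depend on which group comes first. This is only a sketch under the assumption that iterating over the groupby object and calling pd.concat on a dict is acceptable for the data size; the variable names pieces and result are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(10), "B": [1, 2, 2, 2, 3, 3, 3, 4, 4, 4]})

# Keep only groups with more than two rows, taking their first and last row.
# Groups that fail the test are simply never added, so no None is produced.
pieces = {
    key: group.iloc[[0, -1]]
    for key, group in df.groupby("B")
    if group.shape[0] > 2
}

# Concatenating a dict uses its keys as the outer index level, named "B" here.
result = pd.concat(pieces, names=["B"])
print(result)
```

With the second example's data, group 1 has a single row and is left out, and the remaining groups are combined into one DataFrame indexed by B and the original row label.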