groupby filtering is missing some groups · Issue #7870 · pandas-dev/pandas

I brought this up on the mailing list. @cpcloud condensed my example into a very concise sample showing the expected and the actual output:

In [20]: df = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

In [21]: pd.concat(v[v.a == 'best'] for _, v in df.groupby('c'))
Out[21]:
      a  b  c
0  best  a  x
2  best  c  x
3  best  d  y # <--- missing from the next statement
6  best  d  z

In [22]: df.groupby('c').filter(lambda g: g.a == 'best')
Out[22]:
      a  b  c
0  best  a  x
2  best  c  x
6  best  d  z
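
For comparison, here is a minimal sketch of two ways to state the intent unambiguously. It is not a fix for the behaviour reported above. It assumes that `filter` is meant to receive a function returning a single True or False per group, as its documentation describes, and that row-level selection is better written as a plain boolean mask. The variable names `rows`, `all_best` and `any_best` are mine, for illustration only.

```python
import pandas as pd

df = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

# Row-level selection: keep each row whose 'a' is 'best', whatever its group.
# This gives the same rows as the pd.concat(...) expression in In [21].
rows = df[df.a == 'best']

# Group-level selection: the callable returns one scalar bool per group,
# so a group is kept or dropped as a whole.
all_best = df.groupby('c').filter(lambda g: (g.a == 'best').all())
any_best = df.groupby('c').filter(lambda g: (g.a == 'best').any())

print(rows)
print(all_best)
print(any_best)
```

With this data, `rows` has index 0, 2, 3, 6 (the expected output above), `all_best` has index 0, 2, 6 because group `y` also contains 'worst' rows, and `any_best` keeps all seven rows because every group has at least one 'best' row.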