groupby filtering is missing some groups · Issue #7870 · pandas-dev/pandas
I brought this up on the mailing list. @cpcloud reduced my example to this concise sample, which shows the expected output and the actual output:
```
In [20]: df = pd.DataFrame([
    ['best', 'a', 'x'],
    ['worst', 'b', 'y'],
    ['best', 'c', 'x'],
    ['best', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['worst', 'd', 'y'],
    ['best', 'd', 'z'],
], columns=['a', 'b', 'c'])

In [21]: pd.concat(v[v.a == 'best'] for _, v in df.groupby('c'))
Out[21]:
      a  b  c
0  best  a  x
2  best  c  x
3  best  d  y   # <--- missing from the next statement
6  best  d  z

In [22]: df.groupby('c').filter(lambda g: g.a == 'best')
Out[22]:
      a  b  c
0  best  a  x
2  best  c  x
6  best  d  z
```
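For comparison, here is a minimal sketch that separates the two kinds of selection in the session above. It assumes a reasonably recent pandas. The variable names `rows`, `any_best` and `all_best`, and the choice of `.any()` / `.all()` as reductions, are illustrative choices and not part of the original report. `DataFrameGroupBy.filter` is documented as keeping or dropping *whole groups* according to a predicate that returns one boolean per group. The concat in `In [21]`, by contrast, selects individual rows, which a plain boolean mask also does without any `groupby`. Recent pandas releases appear to reject a predicate that returns a per-row Series, like the one in `In [22]`, with a `TypeError`. That is an assumption worth checking against the installed version.

```python
import pandas as pd

# Same data as In [20] above.
df = pd.DataFrame(
    [
        ["best", "a", "x"],
        ["worst", "b", "y"],
        ["best", "c", "x"],
        ["best", "d", "y"],
        ["worst", "d", "y"],
        ["worst", "d", "y"],
        ["best", "d", "z"],
    ],
    columns=["a", "b", "c"],
)

# Row-level selection: keep every row whose "a" is "best".
# No groupby is needed; this gives the same rows as the concat in In [21].
rows = df[df.a == "best"]

# Group-level selection: filter() expects one scalar bool per group,
# so the per-row comparison is reduced with .any() (an illustrative choice).
# Every group contains at least one "best" row, so every row is kept.
any_best = df.groupby("c").filter(lambda g: (g.a == "best").any())

# A stricter predicate keeps only groups made up entirely of "best" rows
# (groups "x" and "z"); group "y" is dropped as a whole.
all_best = df.groupby("c").filter(lambda g: (g.a == "best").all())

print(rows.index.tolist())      # [0, 2, 3, 6]
print(any_best.index.tolist())  # [0, 1, 2, 3, 4, 5, 6]
print(all_best.index.tolist())  # [0, 2, 6]
```

The point of the sketch is that `filter` never removes individual rows from a group it keeps. Row-level selection needs a mask on the rows, as in `In [21]`.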