pandas-dev/pandas
At the moment, groupby filter returns the surviving rows ordered by group rather than in their original row order.
In [1]: data = pd.DataFrame(
   ...:     {'pid' : [1,1,1,2,2,3,3,3],
   ...:      'tag' : [23,45,62,24,45,34,25,62],
   ...:     })
In [2]: g = data.groupby('tag')
In [3]: g.filter(lambda x: len(x) > 1)
Out[3]:
   pid  tag
1    1   45
4    2   45
2    1   62
7    3   62
If there is a way to keep the original order efficiently, that would be ideal, I think. Failing that, sort back afterwards (while being wary of sorting with duplicate or unordered indexes).
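For illustration, here is a minimal sketch of one possible order-preserving workaround. It is not necessarily how filter itself should be fixed. It assumes the predicate depends only on a per-group value (here the group size), so that value can be broadcast back to the rows with transform and used as a boolean mask. The names group_size and kept are made up for this example.

```python
import pandas as pd

data = pd.DataFrame({
    'pid': [1, 1, 1, 2, 2, 3, 3, 3],
    'tag': [23, 45, 62, 24, 45, 34, 25, 62],
})

# Broadcast each group's size back onto the rows of the original frame.
# The result has one entry per row, in the original row order.
group_size = data.groupby('tag')['tag'].transform('size')

# Convert the mask to a plain NumPy array so selection is purely positional;
# no sorting or index alignment is involved, so duplicate or unordered
# indexes should not matter here.
kept = data[(group_size > 1).to_numpy()]
print(kept)
```

Because nothing is sorted, the rows come back in their original order (index 1, 2, 4, 7 for the data above), which sidesteps the sort-back concern with duplicate or unordered indexes.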