pandas-dev/pandas (original) (raw)

At the moment filter reorders wrt the groups.

In [1]: data = pd.DataFrame(
    {'pid' : [1,1,1,2,2,3,3,3],
     'tag' : [23,45,62,24,45,34,25,62],
     })

In [2]: g = data.groupby('tag')

In [3]: g.filter(lambda x: len(x) > 1)
Out[3]: 
   pid  tag
1    1   45
4    2   45
2    1   62
7    3   62

If there is a way to efficiently keep the order that would be ideal I think, failing that sort back afterwards (but being wary of sorting with dupe/unordered indexes).