BUG: sort=False ignored when grouping with a categorical column
Hello everyone,
I stumbled upon the following behavior of groupby with a categorical column, which seems at least inconsistent with the way groupby usually operates.
When grouping on a string type column with sort=False, the order of the groups is the order in which the keys first appear in the column.
However, when grouping on a categorical column, the groups always seem to be ordered by the categories, even when sort=False.
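
To make the ordering I have in mind concrete, here is a minimal plain-Python sketch (no pandas involved) of grouping in order of first appearance. The helper name `mean_by_first_appearance` is hypothetical and only serves as an illustration; it is not how pandas implements groupby.

```python
def mean_by_first_appearance(keys, values):
    """Average values per key, keeping groups in the order keys first appear."""
    groups = {}  # plain dicts preserve insertion order (Python 3.7+)
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


# Same 'baz' and 'foo' columns as in the example of this report.
baz = ['d', 'c', 'e', 'a', 'a', 'd', 'c']
foo = [10, 8, 5, 6, 4, 1, 7]

means = mean_by_first_appearance(baz, foo)
print(list(means))  # ['d', 'c', 'e', 'a']
```

This is the group order that sort=False gives for a string column, and the one I would also expect for a categorical column.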

    import numpy as np
    import pandas as pd

    d = {'foo': [10, 8, 5, 6, 4, 1, 7],
         'bar': [10, 20, 30, 40, 50, 60, 70],
         'baz': ['d', 'c', 'e', 'a', 'a', 'd', 'c']}
    df = pd.DataFrame(d)

    # Bin 'foo' into four intervals; the result is a categorical column.
    cat = pd.cut(df['foo'], np.linspace(0, 10, 5))
    df['range'] = cat

    # Expected behaviour: with sort=True the groups are sorted.
    groups = df.groupby('range', sort=True)
    result = groups.agg('mean')

    # Why are the categories still sorted in this case?
    groups2 = df.groupby('range', sort=False)
    result2 = groups2.agg('mean')

    # I would expect an output like this one: keep the order in which
    # the groups are first encountered.
    groups3 = df.groupby('baz', sort=False)
    result3 = groups3.agg('mean')


result (grouping on 'range', sort=True):

| range | bar | foo |
|---|---|---|
| (0, 2.5] | 60 | 1.0 |
| (2.5, 5] | 40 | 4.5 |
| (5, 7.5] | 55 | 6.5 |
| (7.5, 10] | 15 | 9.0 |

result2 (grouping on 'range', sort=False), still sorted by category:

| range | bar | foo |
|---|---|---|
| (0, 2.5] | 60 | 1.0 |
| (2.5, 5] | 40 | 4.5 |
| (5, 7.5] | 55 | 6.5 |
| (7.5, 10] | 15 | 9.0 |

result3 (grouping on 'baz', sort=False), in order of first appearance:

| baz | bar | foo |
|---|---|---|
| d | 35 | 5.5 |
| c | 45 | 7.5 |
| e | 30 | 5.0 |
| a | 45 | 5.0 |

    pd.__version__
    Out[110]: '0.15.1'

Setting as_index=False does not change the behavior described above.
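
A possible workaround, as a minimal sketch: group on the string form of the categories rather than on the categorical itself, so that groupby treats the key like an ordinary string column. This rests on the assumption that `astype(str)` on the categorical yields plain string labels in the pandas version at hand. The names `labels` and `workaround` are only illustrative, and the numeric columns are selected explicitly so that the string column 'baz' is left out of the mean.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'foo': [10, 8, 5, 6, 4, 1, 7],
                   'bar': [10, 20, 30, 40, 50, 60, 70]})
df['range'] = pd.cut(df['foo'], np.linspace(0, 10, 5))

# Assumption: converting the categorical to strings drops the category
# ordering, so sort=False falls back to order of first appearance.
labels = df['range'].astype(str)

# Group on the string labels and average only the numeric columns.
workaround = df.groupby(labels, sort=False)[['foo', 'bar']].mean()
print(workaround)
```

The drawback is that the index of the result holds strings instead of categories, so it would have to be converted back if the categorical dtype is needed afterwards.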