Categorical: don't sort the categoricals if Categorical(..., ordered=False) by jankatins · Pull Request #9347 · pandas-dev/pandas

In [33]: pd.Categorical(["a", "c", "b", "a"], categories=['a','c','b'], ordered=None)
Out[33]: 
[a, c, b, a]
Categories (3, object): [a < c < b]

In [34]: pd.Categorical(["a", "c", "b", "a"], categories=['a','c','b'], ordered=True)
Out[34]: 
[a, c, b, a]
Categories (3, object): [a < c < b]

In [35]: pd.Categorical(["a", "c", "b", "a"], categories=['a','c','b'], ordered=False)
Out[35]: 
[a, c, b, a]
Categories (3, object): [a, c, b]

In [36]: pd.Categorical(["a", "c", "b", "a"], ordered=None)
Out[36]: 
[a, c, b, a]
Categories (3, object): [a < b < c]

In [37]: pd.Categorical(["a", "c", "b", "a"], ordered=True)
Out[37]: 
[a, c, b, a]
Categories (3, object): [a < b < c]

In [38]: pd.Categorical(["a", "c", "b", "a"], ordered=False)
Out[38]: 
[a, c, b, a]
Categories (3, object): [a, b, c]

So I think the intent is to have the 2nd section (where no category ordering is specified explicitly) behave like the first section by default (for all cases). They will still be ordered/unordered as a Categorical type, as indicated by the ordered attribute.

This will effectively change the default ordering from lexicographic to discovery order (which is how .factorize() works).
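To make the two orderings concrete, here is a minimal pure-Python sketch of the difference. The helper `factorize_sketch` and its `sort` flag are hypothetical names used for illustration only; this is not how pandas implements `.factorize()`.

```python
def factorize_sketch(values, sort=False):
    """Return (codes, categories) for `values`.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    # dict.fromkeys preserves first-seen order, i.e. discovery order.
    categories = list(dict.fromkeys(values))
    if sort:
        # Lexicographic order, the default discussed in this thread.
        categories = sorted(categories)
    # Map each category to its integer position, then encode the values.
    position = {cat: i for i, cat in enumerate(categories)}
    codes = [position[v] for v in values]
    return codes, categories


values = ["a", "c", "b", "a"]
discovery = factorize_sketch(values)                 # categories: a, c, b
lexicographic = factorize_sketch(values, sort=True)  # categories: a, b, c
```

The values are the same in both cases; only the category order, and therefore the integer codes, differ.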

A couple of options I see:

  1. leave this alone
  2. make the default as I describe above, e.g. to the order of appearance
  3. require categories to be specified when ordered=True (e.g. force the user to say the actual ordering)
  4. leave this alone, but make ordered=None -> ordered=False, i.e. if you don't specify an ordering then your category order is lexicographic but you are not considered ordered (this is the same as 1, except that we don't default to ordered)

Aside from this, I think we need the following for #9190:

def set_order(self, ordered, inplace=False):
    """
    Parameters
    ----------
    ordered : boolean
        set the ordered attribute for this Categorical to be the passed ordered
    inplace : boolean, default False
        modify the categorical inplace
    """

and then raise on cat.ordered = value (or could deprecate and suggest set_order)
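A rough sketch of how that proposal could behave, using a toy class rather than the real Categorical. `MiniCategorical` is a hypothetical stand-in, and returning None when `inplace=True` is an assumption borrowed from common pandas convention, not something settled in this thread.

```python
class MiniCategorical:
    """Toy stand-in for Categorical; hypothetical, for illustration only."""

    def __init__(self, values, categories, ordered=False):
        self.values = list(values)
        self.categories = list(categories)
        self._ordered = bool(ordered)

    @property
    def ordered(self):
        return self._ordered

    @ordered.setter
    def ordered(self, value):
        # Direct assignment is rejected; callers are pointed at set_order.
        raise AttributeError("cannot set 'ordered' directly; use set_order()")

    def set_order(self, ordered, inplace=False):
        # Work on self when inplace, otherwise on a copy.
        if inplace:
            target = self
        else:
            target = MiniCategorical(self.values, self.categories, self._ordered)
        target._ordered = bool(ordered)
        # Assumption: inplace calls return None, otherwise the new object.
        return None if inplace else target


cat = MiniCategorical(["a", "c", "b", "a"], categories=["a", "c", "b"])
as_ordered = cat.set_order(True)                  # new object, cat unchanged
inplace_result = cat.set_order(True, inplace=True)  # modifies cat itself

try:
    cat.ordered = False
    assignment_raised = False
except AttributeError:
    assignment_raised = True
```

The deprecation variant would emit a warning in the setter instead of raising.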

I think 4) is the most logical here: basically making default Categoricals unordered. The category order still remains lexicographic, unless otherwise overridden.