BUG/API: Unordered Categorical should ignore order in comparisons? · Issue #16014 · pandas-dev/pandas

I think that when two unordered categorical-dtyped Series are compared and their categories differ only in order, they should compare equal.

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: c1 = pd.Series(pd.Categorical(['a', 'b']))

In [3]: c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))

In [4]: c1
Out[4]:
0    a
1    b
dtype: category
Categories (2, object): [a, b]

In [5]: c2
Out[5]:
0    a
1    b
dtype: category
Categories (2, object): [b, a]

In [6]: c1 == c2


TypeError                                 Traceback (most recent call last)
in ()
----> 1 c1 == c2

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
    811             msg = 'Can only compare identically-labeled Series objects'
    812             raise ValueError(msg)
--> 813         return self._constructor(na_op(self.values, other.values),
    814                                  index=self.index, name=name)
    815     elif isinstance(other, pd.DataFrame):  # pragma: no cover

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/ops.py in na_op(x, y)
    752         # in either operand
    753         if is_categorical_dtype(x):
--> 754             return op(x, y)
    755         elif is_categorical_dtype(y) and not isscalar(y):
    756             return op(y, x)

/Users/taugspurger/Envs/statsmodels-dev/lib/python3.6/site-packages/pandas/core/categorical.py in f(self, other)
     55         if ((len(self.categories) != len(other.categories)) or
     56                 not ((self.categories == other.categories).all())):
---> 57             raise TypeError("Categoricals can only be compared if "
     58                             "'categories' are the same")
     59         if not (self.ordered == other.ordered):

TypeError: Categoricals can only be compared if 'categories' are the same
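For context, a sketch of why the check above trips and two ways to get the elementwise result without relying on the proposed change. It assumes only long-standing pandas methods (Series.cat.codes, Series.cat.reorder_categories, Series.astype); these are workarounds, not the proposed fix, and the direct c1 == c2 comparison is deliberately not used since its behaviour is exactly what this issue is about.

```python
import pandas as pd

c1 = pd.Series(pd.Categorical(['a', 'b']))
c2 = pd.Series(pd.Categorical(['a', 'b'], categories=['b', 'a']))

# Same values, but each Series stores integer codes into its own category
# list, so the codes disagree: 'a' is code 0 in c1 but code 1 in c2.
codes1 = c1.cat.codes.tolist()
codes2 = c2.cat.codes.tolist()
print(codes1, codes2)  # [0, 1] [1, 0]

# Workaround 1: put c2's categories in c1's order, then compare.
# reorder_categories raises if the two category sets are not the same.
aligned_eq = (c1 == c2.cat.reorder_categories(c1.cat.categories)).tolist()
print(aligned_eq)  # [True, True]

# Workaround 2: drop the categorical dtype and compare the raw values.
object_eq = (c1.astype(object) == c2.astype(object)).tolist()
print(object_eq)  # [True, True]
```

Workaround 1 keeps the categorical dtype and still refuses mismatched category sets, which is the same guarantee the proposal wants to keep; workaround 2 gives up that check entirely.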

Expected Output

I think c1 == c2 should return True elementwise (a boolean Series of [True, True]). Unordered categories shouldn't care about the order :)
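The proposed rule can be sketched with a toy model in plain Python. The Cat class and its eq method are hypothetical, illustrative names, not pandas API, and this is not how pandas implements comparisons. A categorical is modelled as integer codes into a category list; when both sides are unordered and hold the same set of categories, the other side's codes are re-expressed in this side's category positions before comparing. Missing values are not modelled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cat:
    """Hypothetical toy categorical: integer codes indexing into `categories`.

    Missing values are not modelled; every value must be one of the categories.
    """

    codes: List[int]
    categories: List[str]
    ordered: bool = False

    @classmethod
    def from_values(
        cls, values: Sequence[str], categories: Sequence[str], ordered: bool = False
    ) -> "Cat":
        # Encode each value as its position in `categories` (KeyError if absent).
        lookup = {cat: i for i, cat in enumerate(categories)}
        return cls([lookup[v] for v in values], list(categories), ordered)

    def eq(self, other: "Cat") -> List[bool]:
        """Elementwise equality under the proposed rule."""
        if len(self.codes) != len(other.codes):
            raise ValueError("Can only compare categoricals of the same length")
        if self.ordered != other.ordered:
            raise TypeError("Categoricals can only be compared if 'ordered' is the same")
        if self.categories == other.categories:
            # Identical categories in identical order: codes are directly comparable.
            other_codes = other.codes
        elif not self.ordered and set(self.categories) == set(other.categories):
            # Proposed rule: for unordered categoricals the category order is
            # irrelevant, so translate other's codes into self's positions.
            position = {cat: i for i, cat in enumerate(self.categories)}
            other_codes = [position[other.categories[c]] for c in other.codes]
        else:
            # Different category sets, or ordered with a different order:
            # refuse, as in the TypeError shown in the traceback above.
            raise TypeError("Categoricals can only be compared if 'categories' are the same")
        return [a == b for a, b in zip(self.codes, other_codes)]


c1 = Cat.from_values(['a', 'b'], categories=['a', 'b'])
c2 = Cat.from_values(['a', 'b'], categories=['b', 'a'])
print(c1.codes, c2.codes)  # [0, 1] [1, 0]
print(c1.eq(c2))  # [True, True]
```

In this model, ordered categoricals whose categories differ in order still raise, so only the unordered case is relaxed.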

cc @JanSchulz, thoughts?