BUG: Categorical.take and Series([Categorical]).take is inconsistent with other dtypes · Issue #20664 · pandas-dev/pandas
`Categorical.take` does not do any bounds checking, so it produces wrong results:
In [9]: cat = pd.Categorical(['a', 'b', 'a'])
In [12]: cat.take([3])
Out[12]:
[0] # <--------- nonsensical result (produces an invalid Categorical), should error instead
Categories (2, object): [a, b]
In [10]: cat.take([-1])
Out[10]:
[NaN] # <--------- this is correct
Categories (2, object): [a, b]
In [15]: cat[:0].take([0])
Out[15]:
[a] # <--------- should also error
Categories (2, object): [a, b]
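The kind of bounds check that is missing can be sketched in plain Python. This is an illustrative sketch only, not pandas' implementation: `checked_take` is a hypothetical helper, and it assumes NumPy-style positional semantics, where valid indices lie in `[-n, n)` and anything else raises `IndexError`.

```python
def checked_take(values, indices):
    """Hypothetical helper: positional take with explicit bounds checking.

    Assumes NumPy-style semantics: valid indices lie in [-n, n), and
    negative indices count from the end.
    """
    n = len(values)
    result = []
    for i in indices:
        if not -n <= i < n:
            raise IndexError(f"index {i} is out of bounds for size {n}")
        result.append(values[i])
    return result


values = ["a", "b", "a"]
print(checked_take(values, [0, 2]))  # in-bounds indices are looked up normally

# Both failing cases from the report: index 3 on three elements,
# and index 0 on an empty slice.
for bad_values, bad_indices in [(values, [3]), (values[:0], [0])]:
    try:
        checked_take(bad_values, bad_indices)
    except IndexError as exc:
        print("IndexError:", exc)
```

With a check like this, both `cat.take([3])` and `cat[:0].take([0])` would raise instead of returning an invalid result.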
`Series.take` with categorical data does not follow the NumPy behaviour of `-1` (counting from the end):
In [22]: s = pd.Series(cat)
In [24]: s.take([-1])
Out[24]:
2 NaN # <--------- wrongly NaN, because it dispatches to `Categorical.take` which returns this
dtype: category
Categories (2, object): [a, b]
In [25]: s.astype(object).take([-1])
Out[25]:
2 a
dtype: object
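The two results differ because `-1` can mean either "last element" (positional, as in NumPy) or "missing value" (a fill marker). Below is a minimal sketch of separating the two conventions with an explicit flag. The names `take_with_mode`, `allow_fill`, and `fill_value` are chosen for illustration; this is not a description of how pandas resolves the inconsistency.

```python
def take_with_mode(values, indices, allow_fill=False, fill_value=None):
    """Hypothetical helper contrasting two meanings of -1.

    allow_fill=False: negative indices count from the end (NumPy-style).
    allow_fill=True:  -1 marks a missing value and yields fill_value;
                      other negative indices are rejected.
    """
    n = len(values)
    result = []
    for i in indices:
        if allow_fill and i == -1:
            result.append(fill_value)
            continue
        if allow_fill and i < -1:
            raise ValueError(f"invalid index {i} when allow_fill=True")
        if not -n <= i < n:
            raise IndexError(f"index {i} is out of bounds for size {n}")
        result.append(values[i])
    return result


values = ["a", "b", "a"]
print(take_with_mode(values, [-1]))                   # ['a']
print(take_with_mode(values, [-1], allow_fill=True))  # [None]
```

Under this sketch, the user-facing `take` would default to the positional meaning, and the fill meaning would only apply when requested explicitly.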