BUG: Categorical.take and Series([Categorical]).take are inconsistent with other dtypes · Issue #20664 · pandas-dev/pandas

Related to #20582 and #20640

Categorical.take does not do any bounds checking, so it produces wrong results for out-of-bounds indices:

In [9]: cat = pd.Categorical(['a', 'b', 'a'])

In [12]: cat.take([3])
Out[12]: 
[0]                       # <--------- nonsensical result (produces an invalid Categorical), should raise an error instead
Categories (2, object): [a, b]

In [10]: cat.take([-1])
Out[10]: 
[NaN]                      # <--------- this is correct
Categories (2, object): [a, b]

In [15]: cat[:0].take([0])
Out[15]: 
[a]                             # <--------- should also error
Categories (2, object): [a, b]
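
As a rough illustration of the bounds checking requested above, here is a minimal sketch that operates on plain integer category codes with NumPy. The helper name `take_codes_with_fill` is hypothetical and is not part of pandas. It assumes the fill-style semantics shown in the examples, where -1 marks a missing value and any other position outside the array is an error.

```python
import numpy as np


def take_codes_with_fill(codes, indexer):
    """Hypothetical helper (not pandas API): take from integer category codes.

    A position of -1 yields a missing code (-1); any other position outside
    [0, len(codes)) raises IndexError instead of reading arbitrary memory.
    """
    codes = np.asarray(codes, dtype=np.intp)
    indexer = np.asarray(indexer, dtype=np.intp)
    n = len(codes)

    # Reject everything outside [-1, n) before touching the data.
    if indexer.size and (indexer.min() < -1 or indexer.max() >= n):
        raise IndexError("indexer is out of bounds for length {}".format(n))

    # Start with "missing" everywhere, then fill in the valid positions.
    result = np.full(indexer.shape, -1, dtype=np.intp)
    valid = indexer >= 0
    result[valid] = codes[indexer[valid]]
    return result


# Codes for ['a', 'b', 'a'] with categories ['a', 'b'].
codes = np.array([0, 1, 0])
print(take_codes_with_fill(codes, [0, -1]))  # second entry is the missing marker

for bad_codes, bad_indexer in [(codes, [3]), (codes[:0], [0])]:
    try:
        take_codes_with_fill(bad_codes, bad_indexer)
    except IndexError as exc:
        print("IndexError:", exc)
```

With a check like this, both `take([3])` on a length-3 array and `take([0])` on an empty array would raise rather than return a value.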

Series.take with categorical data does not follow the NumPy behaviour of treating -1 as a position counted from the end:

In [22]: s = pd.Series(cat)

In [24]: s.take([-1])
Out[24]: 
2    NaN            # <--------- wrongly NaN, because it dispatches to `Categorical.take` which returns this
dtype: category
Categories (2, object): [a, b]

In [25]: s.astype(object).take([-1])
Out[25]: 
2    a
dtype: object
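
For comparison, the NumPy behaviour referred to above can be checked directly on an object array. This is only a sketch of the reference semantics, using `ndarray.take` with its default `mode='raise'`: negative positions count from the end, and out-of-bounds positions raise.

```python
import numpy as np

values = np.array(["a", "b", "a"], dtype=object)

# -1 refers to the last element, matching the object-dtype Series result.
last = values.take([-1])
print(last)

# Position 3 does not exist in a length-3 array, so NumPy raises.
try:
    values.take([3])
except IndexError as exc:
    print("IndexError:", exc)
```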