BUG: remove_unused_categories dtype coerces to int64 by sinhrks · Pull Request #13261 · pandas-dev/pandas

c = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
c.codes
# array([0, 1], dtype=int8)

# Wrong: codes must stay int8, but come back as platform int (int64 here)
c = c.remove_unused_categories()
c.codes
# array([0, 1])

This happens because np.unique returns the inverse indices (return_inverse=True) as platform int, not in the dtype of the input:

np.unique(np.array([0, 3, 2, 3], dtype=np.int8), return_inverse=True)
# (array([0, 2, 3], dtype=int8), array([0, 2, 1, 2]))
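One way to keep the dtype is to cast the inverse indices back to the dtype of the original codes. The snippet below is a minimal NumPy-only sketch of that idea, not necessarily the exact change made in this PR; the variable names are illustrative, and it ignores the -1 code that Categorical uses for missing values.

```python
import numpy as np

# Codes as they might look before recoding (int8, with code 1 unused).
codes = np.array([0, 3, 2, 3], dtype=np.int8)

# np.unique keeps the input dtype for the unique values, but the
# inverse indices come back as platform int (np.intp).
uniques, inverse = np.unique(codes, return_inverse=True)

# Cast the inverse back to the original codes dtype. This is safe here
# because the number of unique values never exceeds the original range.
new_codes = inverse.astype(codes.dtype)

print(inverse.dtype == np.intp)  # True
print(new_codes.dtype)           # int8
```

Casting after the fact keeps the recoding logic unchanged and only restores the compact storage type.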