BUG: remove_unused_categories dtype coerces to int64 by sinhrks · Pull Request #13261 · pandas-dev/pandas
- tests added / passed
- passes git diff upstream/master | flake8 --diff
- whatsnew entry
c = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
c.codes
# array([0, 1], dtype=int8)
# NG: codes must keep int8 dtype, but the result below is coerced to int64
c = c.remove_unused_categories()
c.codes
# array([0, 1])
This is because np.unique uses the platform int for unique_inverse (the index array returned with return_inverse=True), regardless of the input dtype:
np.unique(np.array([0, 3, 2, 3], dtype=np.int8), return_inverse=True)
# (array([0, 2, 3], dtype=int8), array([0, 2, 1, 2]))
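
One way to avoid the coercion is to cast the inverse indices back to the dtype of the original codes. The following is a minimal NumPy-only sketch of that idea, not necessarily the change made in this PR. `remap_codes_preserving_dtype` is a hypothetical helper name (it is not a pandas function), and the sketch assumes a 1-D codes array that contains no -1 (missing value) sentinel.

```python
import numpy as np


def remap_codes_preserving_dtype(codes):
    """Hypothetical helper: renumber the code values that are actually used,
    keeping the integer dtype of the input.

    Assumes a 1-D array with no -1 (missing value) sentinel.
    """
    codes = np.asarray(codes)
    used, inverse = np.unique(codes, return_inverse=True)
    # `used` keeps the input dtype, but `inverse` comes back as the
    # platform int, so cast it back explicitly.
    inverse = inverse.astype(codes.dtype)
    return used, inverse


codes = np.array([0, 3, 2, 3], dtype=np.int8)
used, remapped = remap_codes_preserving_dtype(codes)
print(used.dtype, remapped.dtype)  # int8 int8
```

The cast is safe here because the number of distinct values can never exceed the number of values the original dtype can represent, so every new code fits in that dtype.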