PERF: change impl for Categorical to use smaller dtype arrays · Issue #8453 · pandas-dev/pandas

@jreback

So it seems that by using a full Int64 array for the codes, plus the categories, we are actually using MORE memory to store a Categorical than the equivalent object array. Each code is the same size as a pointer in an object array, and we have to store the categories on top of that.

So we need to change the codes storage to use a smaller integer dtype. Maybe switch this to a plain ndarray and use dtype=uint8. That would provide a lot of benefit.
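
To put rough numbers on this, here is a minimal sketch using plain NumPy arrays rather than the actual Categorical internals. The data (three categories, one million values) and all variable names are made up for illustration. It only compares the buffer sizes of the codes and of the pointer array, not the string objects themselves.

```python
import numpy as np

# Hypothetical data: one million values drawn from three categories.
categories = np.array(["a", "b", "c"], dtype=object)
n = 1_000_000
rng = np.random.default_rng(0)

# Codes are positions into `categories`, stored here as int64.
codes_int64 = rng.integers(0, len(categories), size=n, dtype=np.int64)

# The same codes stored in the smaller dtype suggested above.
codes_uint8 = codes_int64.astype(np.uint8)

# The equivalent object array: one pointer per element.
values_object = categories[codes_int64]

pointer_bytes = values_object.nbytes  # pointer size * n
int64_bytes = codes_int64.nbytes      # 8 bytes * n
uint8_bytes = codes_uint8.nbytes      # 1 byte * n

print("object array pointers:", pointer_bytes)
print("int64 codes:          ", int64_bytes)
print("uint8 codes:          ", uint8_bytes)

# The smaller codes still reconstruct the original values.
roundtrip_ok = np.array_equal(categories[codes_uint8], values_object)
```

On a 64-bit build the pointer array and the int64 codes should come out the same size, while the uint8 codes take one eighth of that. Note that uint8 can only index up to 256 categories, so a wider dtype would be needed beyond that.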