API: Expand DataFrame.astype to allow Categorical(categories, ordered) · Issue #14676 · pandas-dev/pandas
A small, complete example of the issue
This is a proposal to allow something like:
df.astype({'A': pd.CategoricalDtype(['a', 'b', 'c', 'd'], ordered=True)})
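For reference, the proposed call can be written out as a small self-contained script. This is a sketch that assumes a pandas version in which pd.CategoricalDtype is available and DataFrame.astype accepts a column-to-dtype dict. The data is the same toy frame used below, and the final comparison only illustrates what ordered=True enables.

```python
import pandas as pd

# Toy frame; only column 'A' is converted.
df = pd.DataFrame({"A": list("abc"), "B": list("def")})

# Categories and orderedness are fixed up front; 'd' is a valid
# category even though no row uses it yet.
ordered_abcd = pd.CategoricalDtype(["a", "b", "c", "d"], ordered=True)
result = df.astype({"A": ordered_abcd})

print(result["A"].cat.categories.tolist())  # ['a', 'b', 'c', 'd']
print(result["A"].cat.ordered)              # True
print(result["B"].dtype)                    # object

# Because the dtype is ordered, comparisons against a category work.
print((result["A"] < "c").tolist())         # [True, True, False]
```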
Currently, it can be awkward to convert many columns in a DataFrame to a Categorical with control over the categories and orderedness. If you just want to use the defaults, it's not so bad with .astype:
In [5]: df = pd.DataFrame({"A": list('abc'), 'B': list('def')})

In [6]: df
Out[6]:
   A  B
0  a  d
1  b  e
2  c  f

In [8]: df.astype({"A": 'category', 'B': 'category'}).dtypes
Out[8]:
A    category
B    category
dtype: object
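To make "the defaults" concrete, the following sketch inspects what the 'category' string alias produces for the same frame: the categories are inferred from the observed values and the result is unordered.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abc"), "B": list("def")})

# The 'category' alias uses default settings: categories are inferred
# from the values present, and ordered is False.
converted = df.astype({"A": "category", "B": "category"})

for name in ["A", "B"]:
    col = converted[name]
    print(name, col.cat.categories.tolist(), col.cat.ordered)
# A ['a', 'b', 'c'] False
# B ['d', 'e', 'f'] False
```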
If you need to control categories or ordered, you're best off with
In [20]: mapping = {'A': lambda x: x.A.astype('category').cat.set_categories(['a', 'b'], ordered=True),
    ...:            'B': lambda x: x.B.astype('category').cat.set_categories(['d', 'f', 'e'], ordered=False)}

In [21]: df.assign(**mapping)
Out[21]:
     A  B
0    a  d
1    b  e
2  NaN  f
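As a side-by-side check, the sketch below runs the lambda-based workaround and the proposed dict-of-dtypes form on the same frame and confirms they give identical results. It assumes a pandas version where pd.CategoricalDtype can be passed to astype; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": list("abc"), "B": list("def")})

# Workaround: convert to the default categorical first, then reset the
# categories (and orderedness) per column via .cat.set_categories.
mapping = {
    "A": lambda x: x.A.astype("category").cat.set_categories(["a", "b"], ordered=True),
    "B": lambda x: x.B.astype("category").cat.set_categories(["d", "f", "e"], ordered=False),
}
via_assign = df.assign(**mapping)

# Proposed form: the same target dtypes passed directly to astype.
via_astype = df.astype({
    "A": pd.CategoricalDtype(["a", "b"], ordered=True),
    "B": pd.CategoricalDtype(["d", "f", "e"], ordered=False),
})

# 'c' is not among A's categories, so it becomes NaN in both versions.
print(via_assign["A"].isna().tolist())  # [False, False, True]

# Raises AssertionError if the two frames differ in values or dtypes.
pd.testing.assert_frame_equal(via_assign, via_astype)
```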
By expanding astype to accept instances of CategoricalDtype, you remove the need for the lambdas, and you can do conversions of other types at the same time.
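A minimal sketch of "conversions of other types at the same time": one astype call mixing a constrained categorical, a default categorical, and an ordinary numeric cast. Column C is a hypothetical addition to the example frame, and the sketch assumes a pandas version where this dict form is accepted.

```python
import pandas as pd

# Example frame plus a hypothetical numeric column 'C'.
df = pd.DataFrame({"A": list("abc"), "B": list("def"), "C": [1, 2, 3]})

# A single astype call handles all three conversions.
out = df.astype({
    "A": pd.CategoricalDtype(["a", "b", "c", "d"], ordered=True),
    "B": "category",
    "C": "float64",
})
print(out.dtypes)
```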
This would mirror the semantics in #14503.
Updated to change pd.Categorical(...) to a new/modified pd.CategoricalDtype(...) based on the discussion below.
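One practical consequence of using a dtype rather than a Categorical: a pd.CategoricalDtype carries only categories and ordered, with no data, so a single instance can be shared by several columns. The sketch below illustrates this with hypothetical "shirt" and "hat" columns; the comparison at the end relies on both columns having the identical ordered dtype.

```python
import pandas as pd

# The dtype holds just the category vocabulary and the ordered flag.
size = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
print(size.categories.tolist(), size.ordered)  # ['small', 'medium', 'large'] True

# Hypothetical columns that share the same ordered vocabulary.
df = pd.DataFrame({"shirt": ["small", "large"], "hat": ["medium", "small"]})
sized = df.astype({"shirt": size, "hat": size})

# Identical ordered dtypes make element-wise comparison between columns valid.
print((sized["shirt"] > sized["hat"]).tolist())  # [False, True]
```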