API: Expand DataFrame.astype to allow Categorical(categories, ordered) · Issue #14676 · pandas-dev/pandas

A small, complete example of the issue

This is a proposal to allow something like

df.astype({'A': pd.CategoricalDtype(['a', 'b', 'c', 'd'], ordered=True})

Currently, it can be awkward to convert many columns in a DataFrame to a Categorical with control over the categories and orderedness. If you just want to use the defaults, it's not so bad with .astype:

In [5]: df = pd.DataFrame({"A": list('abc'), 'B': list('def')})

In [6]: df
Out[6]:
   A  B
0  a  d
1  b  e
2  c  f

In [8]: df.astype({"A": 'category', 'B': 'category'}).dtypes
Out[8]:
A    category
B    category
dtype: object

If you need to control categories or ordered, you're best off with

In [20]: mapping = {'A': lambda x: x.A.astype('category').cat.set_categories(['a', 'b'], ordered=True),
    ...:            'B': lambda x: x.B.astype('category').cat.set_categories(['d', 'f', 'e'], ordered=False)}

In [21]: df.assign(**mapping)
Out[21]:
     A  B
0    a  d
1    b  e
2  NaN  f

By expanding astype to accept instances of CategoricalDtype, you remove the need for the lambdas and can convert other columns to other types in the same call.
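As a rough sketch of what the proposed call could look like: this assumes a pandas version in which pd.CategoricalDtype accepts categories and ordered, and in which astype accepts such instances per column. The extra column C and the names dtypes and result are added here purely for illustration and are not part of the original example.

```python
import pandas as pd

# Same frame as above, plus a hypothetical integer column "C" to show
# that a non-categorical conversion can ride along in the same call.
df = pd.DataFrame({"A": list("abc"), "B": list("def"), "C": [1, 2, 3]})

# One dtype per column; no lambdas needed.
# Assumes pd.CategoricalDtype(categories, ordered=...) is available.
dtypes = {
    "A": pd.CategoricalDtype(["a", "b"], ordered=True),
    "B": pd.CategoricalDtype(["d", "f", "e"], ordered=False),
    "C": "float64",
}

result = df.astype(dtypes)

print(result)
print(result.dtypes)
```

As with the set_categories version, values that are not among the listed categories (here 'c' in column A) are expected to come out as NaN.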

This would mirror the semantics in #14503

Updated to change pd.Categorical(...) to a new/modified pd.CategoricalDtype(...) based on the discussion below.