BUG: stack_multi_columns extension array downcast · Issue #45740 · pandas-dev/pandas (original) (raw)
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
T = pd.core.arrays.integer.Int8Dtype()
df = pd.DataFrame({ ('a', 'x'): pd.Series(range(5), dtype=T), ('b', 'x'): pd.Series(range(5)), })
dtypes = df.stack().dtypes
assert dtypes['a'] == T, dtypes['a']
Issue Description
Extension dtypes (including categoricals) are down-cast to object if they are part of a multi-column data frame with more than one label on the non-stacked level(s).
The function _stack_multi_columns contains a branch for a homogenous extension dtype frame. (https://github.com/pandas-dev/pandas/blob/4f0e22c1c6d9cebed26f45a9a5b873ebbe67cd17/pandas/core/reshape/reshape.py#L744,L745)
The task of this branch is to ensure that extension dtypes are preserved by the stacking.
There’s a bug that causes this branch to not be executed.
Expected Behavior
Expectation: Extension Dtypes should be preserved under stacking.
Installed Versions
Pandas 1.4