Why is pd.BooleanDtype() cast to float64 by groupby/last? · Issue #33071 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'a': ['x', 'x', 'y', 'y'], 'b': ['x', 'x', 'y', 'y'], 'c': [False, False, True, False]})
>>> df['d'] = df.c.astype(pd.BooleanDtype())
>>>
>>> df.dtypes
a object
b object
c bool
d boolean
dtype: object
>>>
>>> df.groupby(['a', 'b']).c.last()
a b
x x False
y y False
Name: c, dtype: bool
>>>
>>> df.groupby(['a', 'b']).d.last()
a b
x x 0.0
y y 0.0
Name: d, dtype: float64
>>>
Problem description
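df.groupby(['a', 'b']).c.last() returns a Series of dtype bool containing False, but df.groupby(['a', 'b']).d.last() returns a Series of dtype float64 containing 0.0, even though column d holds the same values as c and differs only in using the nullable boolean dtype.
Why do the two results differ?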
Expected Output
I expect both results to contain False, with column d keeping its boolean dtype instead of being converted to float64.
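Until the cause is clear, one possible workaround is to cast the aggregated result back to the nullable boolean dtype. The snippet below is a minimal sketch that reuses the frame from the code sample; it assumes the float result only ever contains 0.0 and 1.0 (or missing values), which is what lets `astype("boolean")` succeed. It does not explain why the conversion happens in the first place.

```python
import pandas as pd

# Same frame as in the code sample above.
df = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": ["x", "x", "y", "y"],
    "c": [False, False, True, False],
})
df["d"] = df["c"].astype(pd.BooleanDtype())

# Aggregate, then cast back to the nullable boolean dtype.
# Assumption: any float values produced by the aggregation are exactly
# 0.0 or 1.0, so the cast is lossless.
result = df.groupby(["a", "b"])["d"].last().astype("boolean")

print(result.dtype)
print(result.tolist())
```

If the aggregation already returns a boolean Series, the extra cast should be a no-op, so the same line can stay in place regardless of which behavior is in effect.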
Output of pd.show_versions()
python : 3.7.4.final.0
pandas : 1.0.3