Why is pd.BooleanDtype() cast to float64 by groupby/last? · Issue #33071 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>>
>>> df = pd.DataFrame({'a': ['x', 'x', 'y', 'y'], 'b': ['x', 'x', 'y', 'y'], 'c': [False, False, True, False]})
>>> df['d'] = df.c.astype(pd.BooleanDtype())
>>>
>>> df.dtypes
a     object
b     object
c       bool
d    boolean
dtype: object
>>>
>>> df.groupby(['a', 'b']).c.last()
a  b
x  x    False
y  y    False
Name: c, dtype: bool
>>>
>>> df.groupby(['a', 'b']).d.last()
a  b
x  x    0.0
y  y    0.0
Name: d, dtype: float64

Problem description

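df.groupby(['a', 'b']).c.last() returns a bool Series (False for each group), but df.groupby(['a', 'b']).d.last() returns a float64 Series (0.0 for each group), even though column d holds the same values with the nullable boolean dtype.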
Why the difference?

Expected Output

I expect both calls to return False for each group, with column d keeping its boolean dtype instead of being cast to float64.
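
For reference, the expected result can be reproduced today by casting the aggregated Series back to the nullable boolean dtype. This is only a workaround sketch, not a fix: it assumes the float64 result contains nothing but 0.0, 1.0, or NaN, and the variable names `result` and `restored` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["x", "x", "y", "y"],
    "b": ["x", "x", "y", "y"],
    "c": [False, False, True, False],
})
df["d"] = df["c"].astype(pd.BooleanDtype())

# On the affected version this comes back as float64 (0.0 / 1.0).
result = df.groupby(["a", "b"])["d"].last()

# Assumption: the values are only 0.0, 1.0, or NaN, so the cast is lossless.
# If the dtype is already "boolean", this cast changes nothing.
restored = result.astype("boolean")

print(restored.dtype)
print(restored.tolist())
```

The cast has to be repeated after every aggregation, which is why preserving the dtype inside groupby would be preferable.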

Output of pd.show_versions()

python : 3.7.4.final.0
pandas : 1.0.3