[BUG] Sum of grouped bool has inconsistent dtype by rhshadrach · Pull Request #32894 · pandas-dev/pandas (original) (raw)

Would appreciate any feedback on this attempt.

The strategy is to modify the dtype after the aggregation is computed in certain cases when casting. In order for this to work, the cast functions need to be made aware of how the data was aggregated. I've added an optional "how" argument to maybe_downcast_numeric and _try_cast. Because this dtype change is needed in two places, I've added the function groupby_result_dtype to dtypes/common.py to handle the logic.

I wasn't sure where the mapping information needed by groupby_result_dtype should be stored. Currently it is in the function itself, but maybe there is a better place for it.

If this is a good approach, it could potentially be expanded for other aggregations and datatypes. One thought is that perhaps groupby(-).mean() should always return a float for numeric types.