ENH: Support mask in GroupBy.cumsum by phofl · Pull Request #48070 · pandas-dev/pandas

There are two things here:

First, the bug fix: with int8 as the dtype, cumsum can easily overflow because the dtype is not widened before accumulating. For example, a group containing [111, 111] overflows, since 111 + 111 = 222 exceeds the int8 maximum of 127. Casting to int64 beforehand avoids this.
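A minimal sketch of the overflow using plain NumPy arithmetic, not pandas' actual group_cumsum kernel. NumPy's cumsum would by default upcast small integer inputs to the platform integer, so the int8 accumulator is requested explicitly with dtype= to reproduce the wrap-around.

```python
import numpy as np

# The example group from the comment: [111, 111] stored as int8.
values = np.array([111, 111], dtype=np.int8)

# Accumulating in int8 wraps around: 111 + 111 = 222 > 127, and 222 - 256 = -34.
wrapped = np.cumsum(values, dtype=np.int8)

# Casting to int64 before accumulating gives the correct running sum.
upcast = np.cumsum(values.astype(np.int64))

print(wrapped.tolist())  # [111, -34]
print(upcast.tolist())   # [111, 222]
```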
Second, extension array (EA) dtypes like Int64 are currently cast to float before calling group_cumsum, which loses precision for large integers, since float64 cannot represent every integer beyond 2**53 exactly. Additionally, using the mask improves performance for EA dtypes.
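To illustrate the second point, here is a hypothetical pure-NumPy sketch of a mask-aware grouped cumsum. The function name masked_group_cumsum and its signature are invented for illustration and are not pandas' actual group_cumsum. It keeps the data as int64 and tracks missing values in a separate boolean mask, instead of converting to float64 and using NaN.

```python
import numpy as np


def masked_group_cumsum(values, mask, labels, ngroups):
    """Hypothetical mask-aware grouped cumulative sum (illustration only).

    values: int64 data (entries where mask is True are ignored)
    mask:   bool array, True where the value is missing (NA)
    labels: group index for each row, in range(ngroups)
    Missing rows stay missing in the output and are skipped by the running sum.
    """
    accum = np.zeros(ngroups, dtype=np.int64)  # one int64 running total per group
    result = np.zeros(len(values), dtype=np.int64)
    result_mask = np.zeros(len(values), dtype=bool)
    for i in range(len(values)):
        lab = labels[i]
        if mask[i]:
            result_mask[i] = True  # propagate NA to the output mask
            continue
        accum[lab] += values[i]
        result[i] = accum[lab]
    return result, result_mask


big = 2**53 + 1  # smallest positive integer float64 cannot represent exactly
values = np.array([big, 2, 5], dtype=np.int64)
mask = np.array([False, False, True])  # third row is missing (NA)
labels = np.array([0, 0, 1])           # rows 0 and 1 in group 0, row 2 in group 1

result, result_mask = masked_group_cumsum(values, mask, labels, ngroups=2)
print(result.tolist(), result_mask.tolist())
# [9007199254740993, 9007199254740995, 0] [False, False, True]

# Casting to float64 first rounds big down to 2**53, so the running sum is off by one.
as_float = values.astype(np.float64)
print(int(as_float[0] + as_float[1]))  # 9007199254740994
```

The value under the mask (0 in row 2) is a placeholder; only result_mask says whether a row is valid.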

I don't want to create function-specific dtypes. I want to add mask support for each function in a separate PR. As more of them get merged, I will be able to combine the types, and then we can reuse them for more functions.