ENH: Support mask in GroupBy.cumsum by phofl · Pull Request #48070 · pandas-dev/pandas
There are two things here:
- When using `int8` as a dtype, we can easily get overflows because the dtype is not adjusted. For example, a group with `[111, 111]` would overflow, so casting to `int64` beforehand avoids this. This is the bugfix.
- Secondly, EA dtypes like `Int64` are currently cast to float before calling `group_cumsum`, which loses precision for large integers that do not fit into `float64`. Additionally, using the mask improves performance for extension array dtypes.
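Both points can be reproduced outside pandas. The following is a minimal NumPy sketch, not the code in this PR: `masked_cumsum` is a hypothetical helper that only illustrates the idea of carrying a boolean mask next to the integer values instead of casting to float, and the sample values are chosen for illustration.

```python
import numpy as np

# Point 1: accumulating in int8 wraps around silently.
values = np.array([111, 111], dtype=np.int8)
wrapped = np.cumsum(values, dtype=np.int8)    # accumulator forced to stay int8
widened = np.cumsum(values.astype(np.int64))  # cast to int64 before accumulating

# Point 2: float64 cannot represent every int64, so a round trip
# through float loses precision for large integers.
big = np.array([2**53 + 1, 0, 1], dtype=np.int64)
via_float = big.astype(np.float64).cumsum().astype(np.int64)


def masked_cumsum(values, mask):
    """Hypothetical helper (not pandas API): cumulative sum that skips
    entries where mask is True and keeps them marked as missing."""
    filled = np.where(mask, 0, values)  # missing entries contribute nothing
    return filled.cumsum(), mask.copy()


mask = np.array([False, True, False])  # the middle entry is missing
via_mask, out_mask = masked_cumsum(big, mask)

print(wrapped.tolist())    # [111, -34]
print(widened.tolist())    # [111, 222]
print(via_float.tolist())  # [9007199254740992, 9007199254740992, 9007199254740992]
print(via_mask.tolist())   # [9007199254740993, 9007199254740993, 9007199254740994]
```

The explicit `dtype=np.int8` is needed because `np.cumsum` otherwise widens small integer inputs to the platform integer on its own. In the masked variant, the value at a position where `out_mask` is `True` is a placeholder and should be read as missing.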
I don't want to create specific dtypes per function. I want to keep the mask support for every function in separate PRs. As more of them get merged, I will be able to combine the types, and then we can use these types for more functions.