Refactor groupby helper from tempita to fused types by noamher · Pull Request #24954 · pandas-dev/pandas (original) (raw)
I can't. Because then it will be found and used instead of the typed version.
Which will cause problems for a rather odd case where int64 is the type and the function is supposed to not be found at first, then found after the ndarray is cast into float64.not sure I understand we do this for all other functions.
@jreback I already have the next PR ready for the rest of the functions (group_prod, group_var, group_mean, group_ohlc).
I had to do the same thing (e.g. creating the group_prod_float64) for them as well for the same reason.
Description of the issue:
The issue is that the calling mechanism, first looks up by funcname, then by funcname_{{type}}.
The code the calls the calling mechanism has a fallback so that if an group_add is looked up with type int64, then the fallback is to cast into the input ndarray into float64, then lookup group_add with type float64.
So group_add is eventually called with an ndarray of type float64 and returns a result which is an ndarray of float64. Now, before returning the result, the fallback code casts the result back into int64. Then returns the result.
Implications:
Creating a group_add function that will be looked-up directly (not using type) means that when the lookup type is int64, the method will still be found (without the type in the name anyway). Calling it will succeed because of an implicit cast from int64 to float64.
Because the fallback code was never needed, there will be no cast back into int64.
This will fail tests and change the behavior of whatever's calling it in the first place.
To avoid this implications, group_add_{{type}} are defined the the function group_add is renamed into something that will fail the first lookup (e.g. _group_add).
I know that's not the prettiest solution but changing the calling mechanism is something I really tried to avoid because it seems a bit sensitive code to change without breaking anything.