Performance of pandas.algos.groupby_int64 · Issue #14293 · pandas-dev/pandas

For dask.dataframe shuffle operations (groupby.apply, merge) running with multiple threads per process, I sometimes find my computations dominated by pandas.algos.groupby_int64. Looking at the source code, it appears to build dynamic pure-Python objects from within Cython, which means the GIL has to be held while it runs. I'm curious whether there are ways to accelerate this function, particularly in multi-threaded situations (by releasing the GIL).

One solution that comes to mind would be to do a single pass over labels to pre-compute the length of each members list in the result, then pre-allocate these as arrays and fill them in a second pass. This might allow better GIL-releasing behavior.
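
To make the idea concrete, here is a rough sketch in plain Python/NumPy, not the actual pandas implementation. The function name `groupby_preallocated` is hypothetical, labels are assumed to be a 1-D array with no nulls, and `np.unique` stands in for whatever counting pass would really be used. The point is that the fill loop only touches typed arrays, so in Cython it could plausibly run inside a `nogil` block.

```python
import numpy as np


def groupby_preallocated(index, labels):
    """Group `index` values by `labels` using pre-allocated arrays.

    Hypothetical sketch: assumes 1-D inputs of equal length and no
    null labels (null handling is omitted here).
    """
    index = np.asarray(index, dtype=np.int64)
    labels = np.asarray(labels)

    # Pass 1: find the distinct keys, an integer code per row, and
    # the number of members each key will have.
    keys, codes, counts = np.unique(
        labels, return_inverse=True, return_counts=True
    )
    codes = codes.ravel()

    # Pre-allocate one flat output buffer; offsets[j]:offsets[j + 1]
    # is the slot range reserved for group j.
    offsets = np.zeros(len(keys) + 1, dtype=np.int64)
    np.cumsum(counts, out=offsets[1:])
    out = np.empty(len(index), dtype=np.int64)

    # Pass 2: scatter each index value into its group's next free
    # slot. Only typed arrays are touched here, no Python lists or
    # dicts, which is what would matter for releasing the GIL.
    cursor = offsets[:-1].copy()
    for i in range(len(index)):
        c = codes[i]
        out[cursor[c]] = index[i]
        cursor[c] += 1

    # Build the dict of results once, at the end, as array slices.
    return {
        key: out[offsets[j]:offsets[j + 1]]
        for j, key in enumerate(keys.tolist())
    }


result = groupby_preallocated(
    np.array([10, 11, 12, 13, 14]), np.array([1, 0, 1, 2, 0])
)
```

In this sketch the values are NumPy array slices rather than Python lists, so callers that expect lists would need adjusting. Only the final dict construction needs Python objects, and it costs one step per group rather than one per row.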

Thoughts?