PERF: cythonize groupby-rank · Issue #15779 · pandas-dev/pandas

@jreback

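`groupby(...).rank()` currently dispatches to each group individually, calling `rank` once per group. It would be better to have a combined `group_rank` that ranks all groups in a single pass. This is a fair amount of code, and ideally it would share some of its logic with the existing rank algorithms.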
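The following is a minimal pure-Python sketch of what a combined group rank could look like. The function name `group_rank_average` is hypothetical, not a pandas API. The sketch assumes the default `method='average'`, ascending order, sortable keys and values, and no missing values. Instead of calling `rank` once per group, it sorts all rows once by `(key, value)` and then makes a single linear pass. During that pass it resets the rank counter at each group boundary and gives tied values the mean of the ranks they span. That inner loop is the part a Cython kernel would type and compile.

```python
from typing import Hashable, List, Sequence


def group_rank_average(keys: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Hypothetical single-pass group rank (method='average', ascending).

    Sketch only: no NaN handling; keys and values must be mutually sortable.
    Returns 1-based ranks, aligned with the input rows.
    """
    n = len(keys)
    # One sort over all rows by (group key, value) replaces per-group dispatch.
    order = sorted(range(n), key=lambda i: (keys[i], values[i]))
    ranks = [0.0] * n
    group_start = 0  # position in `order` where the current group begins
    i = 0
    while i < n:
        row = order[i]
        # A change of key relative to the previous sorted row starts a new group.
        if i == 0 or keys[order[i - 1]] != keys[row]:
            group_start = i
        # Extend j over the run of tied values within the same group.
        j = i
        while (j + 1 < n
               and keys[order[j + 1]] == keys[row]
               and values[order[j + 1]] == values[row]):
            j += 1
        # Tied rows share the mean of the 1-based in-group ranks they span.
        avg = ((i - group_start + 1) + (j - group_start + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks
```

For example, `group_rank_average(['a', 'b', 'a', 'a', 'b'], [3, 1, 3, 1, 2])` returns `[2.5, 1.0, 2.5, 1.0, 2.0]`. That is the same result the default per-group `rank()` gives for this data. The single sort costs O(n log n) overall, and the rest is one linear scan. There is no per-group Python overhead.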

In [7]: ngroups = 1000

In [8]: N = 100000

In [9]: np.random.seed(1234)

In [10]: df = DataFrame({'key': np.random.randint(0, ngroups, size=N), 'value': np.arange(N)})

In [11]: %timeit df.groupby('key').rank()
1 loop, best of 3: 392 ms per loop

# comparison with group_shift_indexer, a transforming operator
In [13]: %timeit df.groupby('key').shift()
100 loops, best of 3: 3.15 ms per loop