PERF: groupby-diff · Issue #16706 · pandas-dev/pandas

groupby-diff could be written as a single function that operates on the entire frame at once. Currently it is essentially a Python loop over the groups. xref #11296 for groupby-fillna.

In [1]: import numpy as np, pandas as pd

In [2]: ngroups=100

In [3]: N=10000

In [6]: df = pd.DataFrame({'time': pd.date_range('20170101', periods=100).take(np.random.randint(0, 100, size=N)), 'id': np.random.randint(0, ngroups, size=N)})

By unwrapping the diff into a subtraction of the group-wise shift, we can get a large perf boost (roughly 8x in the timings below):

In [7]: %timeit _ = df.groupby('id').time.diff() / pd.Timedelta('1 day')
22.9 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [9]: %timeit _ = (df.time - df.groupby('id').time.shift()) / pd.Timedelta('1 day')
2.8 ms ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
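
As a sanity check that the two expressions above agree, here is a minimal sketch. The helper name `groupby_diff_via_shift` and the fixed random seed are assumptions made for illustration; they are not part of pandas or of the original timings.

```python
import numpy as np
import pandas as pd


def groupby_diff_via_shift(df, key, col):
    """Hypothetical helper: per-group first difference, computed as the
    column minus its group-wise shift instead of calling groupby(...).diff()."""
    return df[col] - df.groupby(key)[col].shift()


# Same shape of data as in the issue, but seeded so the check is reproducible.
rng = np.random.default_rng(0)
ngroups, N = 100, 10_000
dates = pd.date_range("20170101", periods=100)
df = pd.DataFrame({
    "time": dates.take(rng.integers(0, 100, size=N)),
    "id": rng.integers(0, ngroups, size=N),
})

one_day = pd.Timedelta("1 day")
slow = df.groupby("id")["time"].diff() / one_day
fast = groupby_diff_via_shift(df, "id", "time") / one_day

# Raises AssertionError if the two results differ anywhere (NaN positions included).
pd.testing.assert_series_equal(slow, fast, check_names=False)
```

The first row of each group has no predecessor, so both versions produce NaN there; everything else is the difference to the previous row within the same `id`.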