PERF: groupby-diff · Issue #16706 · pandas-dev/pandas
groupby-diff could be written as a single function that operates on the entire frame at once; currently it is essentially a Python loop over the groups. xref #11296 for groupby-fillna.
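As an illustration of "one function over the entire frame", here is a minimal NumPy sketch of a per-group first difference computed in a single vectorized pass. It is not how pandas implements `diff`. The helper name `grouped_diff` is hypothetical, and the sketch assumes 1-D float values and group keys with no missing labels (pandas drops NaN keys by default, which this sketch does not handle). It stable-sorts rows by key, takes one whole-array difference, masks positions where the key changes, and scatters the results back to the original row order.

```python
import numpy as np


def grouped_diff(values: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-group first difference in one vectorized pass.

    Each element minus the previous element of the same group (in original
    row order); NaN for the first row of every group. Assumes float values
    and keys without missing labels.
    """
    values = np.asarray(values, dtype=float)
    keys = np.asarray(keys)
    # A stable sort keeps the original row order within each group.
    order = np.argsort(keys, kind="stable")
    v = values[order]
    k = keys[order]
    out_sorted = np.full(v.shape, np.nan)
    if v.size > 1:
        # A difference is only valid when both rows belong to the same group.
        same_group = k[1:] == k[:-1]
        out_sorted[1:] = np.where(same_group, v[1:] - v[:-1], np.nan)
    # Scatter the results back to the original row positions.
    out = np.empty_like(out_sorted)
    out[order] = out_sorted
    return out
```

For example, `grouped_diff(np.array([1., 5., 2., 10., 4.]), np.array([0, 1, 0, 1, 0]))` gives `[nan, nan, 1., 5., 2.]`, matching `pd.Series([1., 5., 2., 10., 4.]).groupby([0, 1, 0, 1, 0]).diff()`. The sort costs O(n log n). A compiled implementation that already has the group codes could instead keep a "last value seen" slot per group and do a single O(n) scan.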
In [1]: import numpy as np; import pandas as pd
In [2]: ngroups = 100
In [3]: N = 10000
In [6]: df = pd.DataFrame({'time': pd.date_range('20170101', periods=100).take(np.random.randint(0, 100, size=N)), 'id': np.random.randint(0, ngroups, size=N)})
By unwrapping the operation (a groupwise `shift` followed by a single column-wide subtraction) we get a large perf boost, roughly 8x here:
In [7]: %timeit _ = df.groupby('id').time.diff() / pd.Timedelta('1 day')
22.9 ms ± 443 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [9]: %timeit _ = (df.time - df.groupby('id').time.shift()) / pd.Timedelta('1 day')
2.8 ms ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
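To check that the unwrapped expression returns the same values as `groupby(...).diff()`, and to rerun the comparison outside IPython, a sketch like the following could be used. The fixed seed, the use of `np.random.default_rng`, and the function names `make_frame`, `diff_days_groupby` and `diff_days_unwrapped` are assumptions added for illustration. The data has the same shape as the session above, and timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n: int = 10_000, ngroups: int = 100, seed: int = 0) -> pd.DataFrame:
    # Same kind of data as in the issue; the seed is added for reproducibility.
    rng = np.random.default_rng(seed)
    times = pd.date_range("20170101", periods=100).take(rng.integers(0, 100, size=n))
    return pd.DataFrame({"time": times, "id": rng.integers(0, ngroups, size=n)})


def diff_days_groupby(df: pd.DataFrame) -> pd.Series:
    # Original form: per-group diff, converted to fractional days.
    return df.groupby("id")["time"].diff() / pd.Timedelta("1 day")


def diff_days_unwrapped(df: pd.DataFrame) -> pd.Series:
    # Unwrapped form: shift within each group, then subtract over the whole column.
    return (df["time"] - df.groupby("id")["time"].shift()) / pd.Timedelta("1 day")


if __name__ == "__main__":
    df = make_frame()
    # Both forms should agree exactly, including NaN for each group's first row.
    pd.testing.assert_series_equal(
        diff_days_groupby(df), diff_days_unwrapped(df), check_names=False
    )
    for fn in (diff_days_groupby, diff_days_unwrapped):
        best = min(timeit.repeat(lambda: fn(df), number=10, repeat=5)) / 10
        print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

The equality check matters because the speedup is only useful if the unwrapped form is a drop-in replacement. Both forms subtract each row's predecessor within its group, so the first row of every group comes out as NaN in each.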