PERF: GroupBy.quantile · Issue #51385 · pandas-dev/pandas
Description
import numpy as np
import pandas as pd

nrows = 10**7
ncols = 10
ngroups = 6

qs = [0.5, 0.75]
arr = np.random.randn(nrows, ncols)
df = pd.DataFrame(arr)
df["A"] = np.random.randint(ngroups, size=nrows)

gb = df.groupby("A")

%timeit v1 = gb.quantile(qs)
39.6 s ± 1.74 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
3.37 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

alt = pd.concat(v2).drop("A", axis=1)
alt.index.names = ["A", None]
assert alt.equals(v1)
Is GroupBy.quantile doing dramatically too much work?
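
For anyone who wants to check the comparison outside IPython, here is a scaled-down sketch as a plain script. The smaller `nrows` and `ncols`, the seeded generator, and the helper names `direct_quantile` and `per_group_quantile` are assumptions made for illustration; they are not part of the original report, and timings at this size may not show the same gap as the 10**7-row case. The per-group path iterates over the groupby object rather than calling `get_group`, which should be equivalent for this purpose.

```python
import timeit

import numpy as np
import pandas as pd

# Scaled-down sizes (assumption: chosen so the script finishes quickly).
nrows, ncols, ngroups = 10_000, 3, 6
qs = [0.5, 0.75]

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((nrows, ncols)))
df["A"] = rng.integers(ngroups, size=nrows)


def direct_quantile(frame):
    """Hypothetical helper: the built-in grouped quantile."""
    return frame.groupby("A").quantile(qs)


def per_group_quantile(frame):
    """Hypothetical helper: DataFrame.quantile on each group, then concat."""
    pieces = {
        key: sub.drop(columns="A").quantile(qs)
        for key, sub in frame.groupby("A")
    }
    out = pd.concat(pieces)
    out.index.names = ["A", None]
    return out


direct = direct_quantile(df)
alt = per_group_quantile(df)

# Both paths should agree up to floating-point tolerance.
assert np.allclose(direct.to_numpy(), alt.to_numpy())

# Rough timings; absolute numbers depend on machine and pandas version.
t_direct = timeit.timeit(lambda: direct_quantile(df), number=3)
t_alt = timeit.timeit(lambda: per_group_quantile(df), number=3)
print(f"GroupBy.quantile:   {t_direct:.4f} s for 3 runs")
print(f"per-group quantile: {t_alt:.4f} s for 3 runs")
```

Raising `nrows` and `ncols` toward the values in the report should make any difference between the two paths easier to see.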