PERF: GroupBy.quantile · Issue #51385 · pandas-dev/pandas

@jbrockmendel

Description

import numpy as np
import pandas as pd

nrows = 10**7
ncols = 10
ngroups = 6

qs = [0.5, 0.75]
arr = np.random.randn(nrows, ncols)
df = pd.DataFrame(arr)
df["A"] = np.random.randint(ngroups, size=nrows)

gb = df.groupby("A")

%timeit v1 = gb.quantile(qs)
# 39.6 s ± 1.74 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
# 3.37 s ± 316 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

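# %timeit does not keep names assigned inside the timed statement,
# so compute v1 and v2 once more before comparing them.
v1 = gb.quantile(qs)
v2 = {key: gb.get_group(key).quantile(qs) for key in gb.groups}
alt = pd.concat(v2).drop("A", axis=1)
alt.index.names = ["A", None]
assert alt.equals(v1)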

Is GroupBy.quantile doing dramatically too much work?
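
For anyone who wants to reproduce the comparison outside IPython, here is a minimal sketch of the same two code paths at a smaller scale. The helper name `compare_quantile_paths`, the default sizes, and the seed are hypothetical choices made for this illustration and are not part of the report above. It also compares the results with `np.allclose` rather than `equals`, on the assumption that the two paths could differ in the last floating-point digit.

```python
import time

import numpy as np
import pandas as pd


def compare_quantile_paths(nrows=10**5, ncols=10, ngroups=6, qs=(0.5, 0.75), seed=0):
    """Time GroupBy.quantile against a per-group DataFrame.quantile loop.

    Hypothetical helper: sizes default to far fewer rows than the 10**7
    used in the report, so it finishes quickly.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame(rng.standard_normal((nrows, ncols)))
    df["A"] = rng.integers(ngroups, size=nrows)
    gb = df.groupby("A")
    qs = list(qs)

    # Path 1: the built-in grouped quantile.
    start = time.perf_counter()
    v1 = gb.quantile(qs)
    t_groupby = time.perf_counter() - start

    # Path 2: pull out each group and call DataFrame.quantile on it.
    start = time.perf_counter()
    v2 = {key: gb.get_group(key).quantile(qs) for key in sorted(gb.groups)}
    t_loop = time.perf_counter() - start

    # Stack the per-group results and drop the grouping column if present.
    alt = pd.concat(v2).drop(columns="A", errors="ignore")
    alt.index.names = ["A", None]
    return v1, alt, t_groupby, t_loop


if __name__ == "__main__":
    v1, alt, t_groupby, t_loop = compare_quantile_paths()
    print(f"gb.quantile: {t_groupby:.3f} s, per-group loop: {t_loop:.3f} s")
    print("results match:", np.allclose(v1.to_numpy(), alt.to_numpy()))
```

Timings from a single `perf_counter` measurement are noisy, so the numbers this prints are only a rough indication; the gap reported above was measured with `%timeit` at `nrows = 10**7`.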