Sparse get dummies perf by TomAugspurger · Pull Request #21997 · pandas-dev/pandas
Previously, we did a scalar `elem == -1` check for every element in the ndarray. This replaces that check with a vectorized `array == -1`.
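For illustration, the difference between the two checks can be sketched in plain NumPy. This is not the code touched by this PR: the function names `mask_scalar` and `mask_vectorized` and the sample `codes` array are hypothetical, chosen only to show the pattern.

```python
import numpy as np


def mask_scalar(codes):
    """Per-element check: one Python-level comparison per value (hypothetical helper)."""
    out = np.empty(len(codes), dtype=bool)
    for i, elem in enumerate(codes):
        out[i] = elem == -1
    return out


def mask_vectorized(codes):
    """Single vectorized comparison over the whole array (hypothetical helper)."""
    return codes == -1


# Sample integer codes where -1 marks a missing entry (made-up data).
codes = np.array([0, -1, 2, -1, 1])

# Both approaches produce the same boolean mask; only the cost differs.
assert np.array_equal(mask_scalar(codes), mask_vectorized(codes))
```

The vectorized form moves the loop into NumPy's compiled code, which is typically where the savings come from on large arrays.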
Running the ASV benchmarks now. In the meantime, here's a simple timeit on the same problem:
HEAD
In [3]: %timeit pd.get_dummies(s, sparse=True)
561 ms ± 4.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Master
In [3]: %timeit pd.get_dummies(s, sparse=True)
2.18 s ± 273 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)