Sparse get dummies perf by TomAugspurger · Pull Request #21997 · pandas-dev/pandas

Previously, we did a scalar elem == -1 comparison for every element in the ndarray, one element at a time.

This replaces that per-element check with a single vectorized array == -1 comparison.
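As a rough illustration of the difference (not the actual pandas code from this PR), here is a minimal sketch of the two approaches. The names `mask_scalar`, `mask_vectorized`, and `codes` are hypothetical, and the sketch assumes -1 is a sentinel marking entries to skip.

```python
import numpy as np


def mask_scalar(codes):
    """Compare each element to -1 one at a time in a Python loop."""
    out = np.empty(len(codes), dtype=bool)
    for i, elem in enumerate(codes):
        out[i] = elem == -1  # one scalar comparison per element
    return out


def mask_vectorized(codes):
    """Compare the whole array to -1 in a single NumPy operation."""
    return codes == -1


# Hypothetical input: integer codes where -1 is assumed to be a sentinel.
codes = np.array([0, 2, -1, 1, -1, 0], dtype=np.int64)

# Both approaches produce the same boolean mask.
assert np.array_equal(mask_scalar(codes), mask_vectorized(codes))
```

Both functions return the same mask; the vectorized form moves the loop out of the Python interpreter and into NumPy's compiled code, which is where a speedup of this kind would be expected to come from.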

Running the ASV benchmarks now. In the meantime, here's a simple timeit on the same problem:

HEAD

In [3]: %timeit pd.get_dummies(s, sparse=True)
561 ms ± 4.96 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Master

In [3]: %timeit pd.get_dummies(s, sparse=True)
2.18 s ± 273 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)