PERF: Sparse IntIndex.make_union / Numeric ops by sinhrks · Pull Request #13036 · pandas-dev/pandas
- tests added / passed
- passes `git diff upstream/master | flake8 --diff`
- whatsnew entry
Replace the repeated `list.append` calls in `IntIndex.make_union` with `np.union1d`. `make_union` is used in numeric ops.
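As a rough illustration of the change (a sketch only, not the actual pandas code, which lives in Cython), the two approaches can be compared on plain sorted index arrays. The function names `union_by_append` and `union_vectorized` are hypothetical and chosen here for illustration.

```python
import numpy as np


def union_by_append(x, y):
    """Merge two sorted, duplicate-free index arrays one element at a time."""
    out = []
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            out.append(x[i])
            i += 1
            j += 1
        elif x[i] < y[j]:
            out.append(x[i])
            i += 1
        else:
            out.append(y[j])
            j += 1
    # One of the inputs is exhausted; copy whatever remains of the other.
    out.extend(x[i:])
    out.extend(y[j:])
    return np.array(out, dtype=np.int32)


def union_vectorized(x, y):
    """Same result via np.union1d, which returns the sorted unique union."""
    return np.union1d(x, y).astype(np.int32)


xi = np.array([0, 2, 5, 9], dtype=np.int32)
yi = np.array([1, 2, 7, 9, 10], dtype=np.int32)

by_append = union_by_append(xi, yi)
vectorized = union_vectorized(xi, yi)
print(by_append, vectorized)
```

Both functions return the same indices; the vectorized version avoids the per-element Python-level appends.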
NOTE: `IntIndex.intersect` could likewise be changed to use `np.intersect1d`, but that doesn't improve performance, because the resulting index is shorter and so needs fewer appends.
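The size argument in the note can be seen on a small example. This is a sketch on plain NumPy arrays, not on `IntIndex` itself: an intersection can never be longer than the shorter input, while a union is at least as long as the longer one.

```python
import numpy as np

xi = np.array([0, 2, 5, 9], dtype=np.int32)
yi = np.array([1, 2, 7, 9, 10], dtype=np.int32)

# Both inputs are already sorted and duplicate-free, so assume_unique is safe.
common = np.intersect1d(xi, yi, assume_unique=True)
merged = np.union1d(xi, yi)

print(len(common), len(merged))
```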
The microbenchmark below assumes that about 90% of each array is sparse (NaN).
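import numpy as np
import pandas as pd

np.random.seed(1)
N = 1000000

# Start from two all-NaN arrays.
a = np.array([np.nan] * N)
b = np.array([np.nan] * N)

# Pick roughly 10% of the positions to fill.
# Integer division keeps the size an int on Python 3.
indexer_a = np.unique(np.random.randint(0, N, N // 10))
indexer_b = np.unique(np.random.randint(0, N, N // 10))
a[indexer_a] = np.random.randint(0, 100, len(indexer_a))
b[indexer_b] = np.random.randint(0, 100, len(indexer_b))

# Newer pandas versions expose this class as pd.arrays.SparseArray.
sa = pd.SparseArray(a)
sb = pd.SparseArray(b)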
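A quick way to check the sparsity assumption is to measure the filled fraction directly. The sketch below uses a smaller, hypothetical `n` than the benchmark so that it runs quickly; because `np.unique` drops repeated draws, the filled fraction lands slightly under 10%.

```python
import numpy as np

np.random.seed(1)
n = 100000  # smaller than the benchmark's N, for a fast check

values = np.full(n, np.nan)
# Drawing n // 10 positions with replacement, then deduplicating.
positions = np.unique(np.random.randint(0, n, n // 10))
values[positions] = np.random.randint(0, 100, len(positions))

filled_fraction = np.count_nonzero(~np.isnan(values)) / n
print(filled_fraction)
```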
On current master:
%timeit sa.sp_index.make_union(sb.sp_index)
10 loops, best of 3: 52.7 ms per loop
%timeit sa + sb
10 loops, best of 3: 47.8 ms per loop
After this PR:
%timeit sa.sp_index.make_union(sb.sp_index)
100 loops, best of 3: 11.6 ms per loop
%timeit sa + sb
100 loops, best of 3: 15.3 ms per loop
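For reference, the speedups implied by the timings above can be computed directly. The numbers are copied from the `%timeit` output; the dictionary layout is only for illustration. They work out to roughly 4.5x for `make_union` and roughly 3.1x for `sa + sb`.

```python
# (before, after) timings in milliseconds, taken from the %timeit output.
timings_ms = {
    "make_union": (52.7, 11.6),
    "sa + sb": (47.8, 15.3),
}

speedups = {name: before / after for name, (before, after) in timings_ms.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x faster")
```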