PERF: Merge empty frame by lukemanley · Pull Request #45838 · pandas-dev/pandas
- Tests added and passed if fixing a bug or adding a new feature
- All code checks passed.
- Added an entry in the latest `doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.
Faster `merge` when either the left or the right frame is empty.
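An empty side allows a shortcut because the shape of the result is known without matching any keys. The toy frames below are illustrative only (they are not from this PR). They show how many rows `merge` returns for each `how` when the right frame is empty:

```python
import pandas as pd

# Illustrative toy frames: a small left frame and an empty right frame
left = pd.DataFrame({"A": [1, 2, 2], "B": [10, 20, 30]})
right_empty = pd.DataFrame(columns=["A", "C"], dtype="int64")

# Result row count for each join type when the right side has no rows
row_counts = {
    how: len(left.merge(right_empty, how=how, on="A"))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 0, 'left': 3, 'right': 0, 'outer': 3}
```

In each case, either every row drops out or every left row appears exactly once, with the right-only column `C` entirely missing. That is why, in principle, the result can be built without hashing or factorizing the join keys.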
One ASV benchmark added:
```
       before           after         ratio
     [7651c082]       [4f451520]
     <main>           <merge-empty-frame>
-      12.0±0.3ms      1.42±0.09ms     0.12  join_merge.Merge.time_merge_dataframe_empty(False)
-        24.3±2ms      1.29±0.01ms     0.05  join_merge.Merge.time_merge_dataframe_empty(True)
```
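ASV reports benchmarks as `module.Class.method(param)`. Here that means a file `join_merge.py`, a class `Merge` and a method `time_merge_dataframe_empty`, run once per parameter value. The class below is a hypothetical sketch of that structure, not the benchmark added in this PR. The class name `MergeEmptySketch`, the frame sizes, the join type, and the reading of `True`/`False` as the `sort` argument are all assumptions.

```python
import numpy as np
import pandas as pd


class MergeEmptySketch:
    """Hypothetical ASV-style benchmark; names, sizes and the join type are assumptions."""

    # ASV calls setup() and each time_* method once per value in `params`
    params = [True, False]
    param_names = ["sort"]  # assumption: the True/False in the ASV output is `sort`

    def setup(self, sort):
        # setup() is not timed; build a non-empty left frame and an empty right frame
        n = 100_000
        rng = np.random.default_rng(0)
        self.left = pd.DataFrame(
            {"key": rng.integers(0, 1_000, n), "value": rng.random(n)}
        )
        self.right_empty = pd.DataFrame(
            {"key": pd.Series(dtype="int64"), "other": pd.Series(dtype="float64")}
        )

    def time_merge_dataframe_empty(self, sort):
        # Only the body of a time_* method is timed by ASV
        pd.merge(self.left, self.right_empty, how="left", sort=sort)
```

ASV discovers such classes by the `time_` prefix. Because the empty frame is built in `setup`, the timed body contains only the merge itself.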
Some additional examples, each merging a 10-million-row frame with an empty one:
```python
# IPython session (%timeit is an IPython magic)
import numpy as np
import pandas as pd

N = 10_000_000
df1 = pd.DataFrame(
    np.random.randint(0, 100, (N, 2)),
    columns=['A', 'B'],
)
df2 = df1.set_index('A')
df_empty = pd.DataFrame(columns=['A', 'C'], dtype='int64')

%timeit df1.merge(df_empty, how="right")
# 242 ms ± 5.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        <-- main
# 1.12 ms ± 50.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <-- PR

%timeit df2.merge(df_empty, how="left", left_index=True, right_on='A')
# 1.07 s ± 47.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        <-- main
# 241 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        <-- PR
```
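The examples above rely on IPython's `%timeit`. To rerun them from a plain Python script, one option is the standard-library `timeit` module. The helper `run_examples` below is a hypothetical sketch that rebuilds the same frames and reports mean and standard deviation per merge, loosely like `%timeit`. One difference: it always times a single call per run (`number=1`) instead of auto-scaling the loop count, so very fast merges are measured less precisely.

```python
import statistics
import timeit

import numpy as np
import pandas as pd


def run_examples(n=10_000_000, repeat=7):
    """Hypothetical helper: time the two example merges without IPython.

    Returns {case: (mean_seconds, stdev_seconds)}; `repeat` must be >= 2 for the stdev.
    """
    df1 = pd.DataFrame(np.random.randint(0, 100, (n, 2)), columns=["A", "B"])
    df2 = df1.set_index("A")
    df_empty = pd.DataFrame(columns=["A", "C"], dtype="int64")

    cases = {
        "right": lambda: df1.merge(df_empty, how="right"),
        "left_index": lambda: df2.merge(
            df_empty, how="left", left_index=True, right_on="A"
        ),
    }
    results = {}
    for name, fn in cases.items():
        # number=1: each of the `repeat` runs executes the merge exactly once
        runs = timeit.repeat(fn, number=1, repeat=repeat)
        results[name] = (statistics.mean(runs), statistics.stdev(runs))
    return results
```

Calling `run_examples()` with its defaults uses the same frame size as the examples. Passing a small `n` (for example `run_examples(n=1_000, repeat=2)`) gives a quick smoke test.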