PERF: Merge empty frame by lukemanley · Pull Request #45838 · pandas-dev/pandas

Speeds up merge when the left or right frame is empty.
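For context, a minimal sketch of what merging with an empty frame returns. The frame names (`left`, `empty`) and the sample values are made up for illustration and are not taken from this PR; the sketch shows only the expected results, not the fast path itself.

```python
import pandas as pd

# Small illustrative frames (hypothetical data, not from the PR).
left = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
empty = pd.DataFrame(columns=["A", "C"], dtype="int64")

# how="right": the keys come from the empty right frame, so no rows survive.
right_result = left.merge(empty, how="right")

# how="left": every left row is kept; "C" finds no match and is all-NaN.
left_result = left.merge(empty, how="left")

print(right_result.shape)
print(left_result)
```

In both cases the result can be determined without matching any keys, which is why an empty input is a natural candidate for a shortcut.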

One ASV benchmark added:

       before           after         ratio
     [7651c082]       [4f451520]
     <main>           <merge-empty-frame>
-      12.0±0.3ms      1.42±0.09ms     0.12  join_merge.Merge.time_merge_dataframe_empty(False)
-        24.3±2ms      1.29±0.01ms     0.05  join_merge.Merge.time_merge_dataframe_empty(True)

Some additional examples:

import numpy as np
import pandas as pd

N = 10_000_000

df1 = pd.DataFrame(
    np.random.randint(0, 100, (N, 2)),
    columns=['A', 'B'],
)

df2 = df1.set_index('A')

df_empty = pd.DataFrame(columns=['A', 'C'], dtype='int64')
%timeit df1.merge(df_empty, how="right")  
242 ms ± 5.32 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)        <-- main
1.12 ms ± 50.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)  <-- PR
%timeit df2.merge(df_empty, how="left", left_index=True, right_on='A')    
1.07 s ± 47.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- main
241 ms ± 21.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <-- PR
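
The `%timeit` lines above need IPython. A rough plain-Python equivalent of the first example, using the standard-library `timeit` module, might look like the sketch below. It assumes a much smaller `N` than the 10_000_000 used above to keep the run short, and it reports the best repeat rather than mean ± std. dev., so its numbers are not directly comparable to those shown.

```python
import timeit

import numpy as np
import pandas as pd

N = 100_000  # assumption: reduced from 10_000_000 so the script finishes quickly

df1 = pd.DataFrame(np.random.randint(0, 100, (N, 2)), columns=["A", "B"])
df_empty = pd.DataFrame(columns=["A", "C"], dtype="int64")

CALLS = 10    # merge calls per repeat
REPEATS = 5   # number of repeats; the fastest one is reported

times = timeit.repeat(
    lambda: df1.merge(df_empty, how="right"),
    number=CALLS,
    repeat=REPEATS,
)
per_call = min(times) / CALLS
print(f"{per_call * 1e3:.3f} ms per call (best of {REPEATS})")
```

Running the same script on main and on the PR branch gives a before/after comparison outside IPython.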