PERF: pd.concat with EA-backed indexes by lukemanley · Pull Request #49128 · pandas-dev/pandas

Improves the performance of pd.concat when the objects being concatenated have EA-backed (ExtensionArray-backed) indexes. The bottleneck was EA.tolist. Concatenation is still relatively slow compared with non-EA indexes, but this is an improvement.
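For readers who want to see the affected code path, here is a minimal sketch of the kind of call involved. It is not the asv benchmark itself: the helper name make_series, the sizes, and the shifted index layout are assumptions chosen for illustration, and it assumes a pandas version in which Index can hold an ExtensionArray such as Int64.

```python
import pandas as pd


def make_series(n_series=5, size=1_000):
    """Hypothetical helper: Series with Int64 (EA-backed) indexes that only partly overlap."""
    series_list = []
    for i in range(n_series):
        # Shift each index by i so the indexes differ and concat has to union them.
        labels = pd.array(list(range(i, size + i)), dtype="Int64")
        index = pd.Index(labels)
        series_list.append(pd.Series(range(size), index=index, name=f"s{i}"))
    return series_list


series_list = make_series()

# axis=1 aligns the Series on the union of their indexes, which is where
# the index-combining work for EA-backed indexes happens.
result = pd.concat(series_list, axis=1, sort=True)

print(result.index.dtype)
print(result.shape)
```

Wrapping the pd.concat call in timeit, once on main and once on this branch, gives a rough local comparison, although the asv numbers below are the measurements to rely on.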

$ asv continuous -f 1.1 upstream/main ea-tolist -b join_merge.ConcatIndexDtype

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      19.4±0.2ms      12.0±0.07ms     0.62  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, False)
-      14.6±0.2ms       6.78±0.2ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, True)
-      14.2±0.2ms       6.57±0.1ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, False)
-      14.1±0.2ms      6.47±0.09ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, True)
-      23.9±0.1ms       10.3±0.1ms     0.43  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, False)
-      19.7±0.3ms      5.53±0.05ms     0.28  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, True)
-      19.0±0.2ms      5.16±0.03ms     0.27  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, True)
-      18.5±0.2ms      4.82±0.03ms     0.26  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, False)
-         108±1ms       24.5±0.4ms     0.23  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, False)
-      98.4±0.6ms       17.0±0.3ms     0.17  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, True)
-      97.6±0.8ms       15.9±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, True)
-      99.2±0.5ms       16.0±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, False)
$ asv continuous -f 1.1 upstream/main ea-tolist -b array.ArrowStringArray

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      16.5±0.2ms          399±6μs     0.02  array.ArrowStringArray.time_tolist(True)
-      17.1±0.2ms          404±6μs     0.02  array.ArrowStringArray.time_tolist(False)
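The tolist numbers can be spot-checked outside asv with a rough sketch like the one below. It is an assumption-laden illustration, not the benchmark from this PR: the helper name time_tolist and the array size are made up, it uses "string[python]" so that pyarrow is not required (pass "string[pyarrow]" if pyarrow is installed), and it assumes a pandas version that provides tolist on extension arrays.

```python
import timeit

import pandas as pd


def time_tolist(dtype="string[python]", size=100_000, repeat=5):
    """Hypothetical helper: return the list from arr.tolist() and the best wall time in seconds."""
    arr = pd.array([str(i) for i in range(size)], dtype=dtype)
    # Take the minimum over several single runs to reduce timing noise.
    seconds = min(timeit.repeat(arr.tolist, number=1, repeat=repeat))
    return arr.tolist(), seconds


values, seconds = time_tolist()
print(f"tolist on {len(values)} elements: {seconds * 1e3:.3f} ms")
```

Absolute timings depend on the machine and the pandas and pyarrow versions, so only the before/after ratio on the same setup is meaningful.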