PERF: pd.concat with EA-backed indexes by lukemanley · Pull Request #49128 · pandas-dev/pandas
Perf improvement for `pd.concat` when objects contain EA-backed (ExtensionArray-backed) indexes. The bottleneck was `EA.tolist`. Still relatively slow compared with non-EA indexes, but an improvement.
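For context, the kind of call affected here can be sketched roughly as follows. This is a minimal illustration and not the actual asv benchmark code: the helper name `make_series_list`, the sizes, and the overlapping-slice layout are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd


def make_series_list(dtype="Int64", n=1_000, n_series=5):
    """Hypothetical helper: build Series whose indexes are EA-backed."""
    # An Index with a nullable dtype such as "Int64" is backed by an ExtensionArray.
    idx = pd.Index(np.arange(n), dtype=dtype)
    # Each Series uses a slightly shorter slice of the index, so the indexes
    # differ and concat has to combine them rather than reuse one as-is.
    return [pd.Series(i, index=idx[: n - i]) for i in range(n_series)]


series_list = make_series_list()

# axis=1 aligns the Series on the union of their indexes.
result = pd.concat(series_list, axis=1, sort=True)

print(result.shape)
print(result.index.dtype)
```

Swapping `dtype` for `"string[python]"` or `"string[pyarrow]"` would need string labels instead of `np.arange`, and the latter also requires pyarrow to be installed.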
$ asv continuous -f 1.1 upstream/main ea-tolist -b join_merge.ConcatIndexDtype

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      19.4±0.2ms      12.0±0.07ms     0.62  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, False)
-      14.6±0.2ms       6.78±0.2ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, True, True)
-      14.2±0.2ms       6.57±0.1ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, False)
-      14.1±0.2ms      6.47±0.09ms     0.46  join_merge.ConcatIndexDtype.time_concat_series('Int64', 1, False, True)
-      23.9±0.1ms       10.3±0.1ms     0.43  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, False)
-      19.7±0.3ms      5.53±0.05ms     0.28  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, True, True)
-      19.0±0.2ms      5.16±0.03ms     0.27  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, True)
-      18.5±0.2ms      4.82±0.03ms     0.26  join_merge.ConcatIndexDtype.time_concat_series('string[python]', 1, False, False)
-         108±1ms       24.5±0.4ms     0.23  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, False)
-      98.4±0.6ms       17.0±0.3ms     0.17  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, True, True)
-      97.6±0.8ms       15.9±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, True)
-      99.2±0.5ms       16.0±0.3ms     0.16  join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 1, False, False)

$ asv continuous -f 1.1 upstream/main ea-tolist -b array.ArrowStringArray

       before           after         ratio
     [90b4add7]       [77a21f8d]
     <main>           <ea-tolist>
-      16.5±0.2ms         399±6μs     0.02  array.ArrowStringArray.time_tolist(True)
-      17.1±0.2ms         404±6μs     0.02  array.ArrowStringArray.time_tolist(False)
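
To get a feel for why `tolist` can dominate, the sketch below compares two ways of turning an extension array into a Python list: element-by-element iteration versus a bulk conversion through NumPy. It is only an illustration under assumptions; the function names `via_iteration` and `via_numpy` are hypothetical, and nothing here is taken from the diff in this PR. It uses `string[python]` so that pyarrow is not required, and no timings are quoted because they depend on the machine and the pandas version.

```python
import timeit

import pandas as pd

# A string extension array without missing values (size chosen arbitrarily).
arr = pd.array([str(i) for i in range(10_000)], dtype="string[python]")


def via_iteration(a):
    # Goes through the array's __iter__, producing one element at a time.
    return list(a)


def via_numpy(a):
    # Converts to an object ndarray in one step, then to a list.
    return a.to_numpy(dtype=object).tolist()


# Both routes should give the same list of Python strings.
assert via_iteration(arr) == via_numpy(arr)

t_iter = timeit.timeit(lambda: via_iteration(arr), number=5)
t_numpy = timeit.timeit(lambda: via_numpy(arr), number=5)
print(f"iteration: {t_iter:.4f}s  via numpy: {t_numpy:.4f}s")
```

The same comparison can be run against `string[pyarrow]` or `Int64` arrays to see how much the gap varies by dtype.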