BUG/PERF: MultiIndex setops with sort=None by lukemanley · Pull Request #49010 · pandas-dev/pandas

Simplify algos.safe_sort and improve its performance, which in turn speeds up MultiIndex set operations when sort=None.
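For context, a minimal sketch of the kind of call affected, using only the public API. The index values and variable names are made up for illustration; the internal call into algos.safe_sort is not shown.

```python
import pandas as pd

# Illustrative data only: two small, non-monotonic MultiIndexes.
left = pd.MultiIndex.from_tuples([(3, "c"), (1, "a"), (2, "b")])
right = pd.MultiIndex.from_tuples([(2, "b"), (3, "c"), (4, "d")])

# sort=None asks pandas to sort the result when the values are comparable.
sorted_result = left.intersection(right, sort=None)

# sort=False skips the sorting step.
unsorted_result = left.intersection(right, sort=False)

print(list(sorted_result))  # [(2, 'b'), (3, 'c')]
print(list(unsorted_result))
```

Only the sort=None path goes through the sorting step, so that is where the timings below change.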

           before           after         ratio
     [55dc3243]       [f00149ce]
     <main>           <safe-sort-multiindex>
-        73.3±2ms       20.2±0.4ms     0.28  multiindex_object.SetOperations.time_operation('monotonic', 'ea_int', 'intersection', None)
-        71.0±2ms       19.0±0.6ms     0.27  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'intersection', None)
-       104±0.7ms         24.0±3ms     0.23  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'intersection', None)
-        98.7±2ms       21.4±0.3ms     0.22  multiindex_object.SetOperations.time_operation('monotonic', 'string', 'intersection', None)
-       114±0.5ms       24.7±0.4ms     0.22  multiindex_object.SetOperations.time_operation('non_monotonic', 'ea_int', 'intersection', None)
-       109±0.7ms         21.3±2ms     0.20  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'intersection', None)
-        99.3±1ms         18.5±2ms     0.19  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection', None)
-         157±6ms         21.4±2ms     0.14  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection', None)
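For a rough local check outside asv, something like the following times the same kind of call. The sizes, level values, and variable names are assumptions for illustration and do not reproduce the multiindex_object.SetOperations setup, so absolute numbers will differ from the table above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes; the asv benchmark uses its own setup.
n_outer, n_inner = 1_000, 10

monotonic = pd.MultiIndex.from_product([np.arange(n_outer), np.arange(n_inner)])
non_monotonic = monotonic[::-1]  # reversed, so no longer increasing
other = monotonic[:-1]           # overlaps everything except the last entry

for label, left in [("monotonic", monotonic), ("non_monotonic", non_monotonic)]:
    # Time a handful of repetitions of intersection with sort=None.
    seconds = timeit.timeit(lambda: left.intersection(other, sort=None), number=5)
    print(f"{label}: {seconds / 5 * 1e3:.2f} ms per call")

# Sanity check on the result itself rather than on the timing.
result = non_monotonic.intersection(other, sort=None)
```

Running this on the commits before and after the change should show the same direction of improvement, though not necessarily the same ratios.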

I updated the expected value in test_union_nan_got_duplicated, which was added in #38977.

MultiIndex.union sorts by default (sort=None), but the expected value in that test was not sorted.
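The default sorting behaviour can be seen in a small sketch. The data here is illustrative and NaN-free; it is not the data used in test_union_nan_got_duplicated.

```python
import pandas as pd

# Illustrative inputs, deliberately out of order.
mi1 = pd.MultiIndex.from_tuples([(2, "b"), (1, "a")])
mi2 = pd.MultiIndex.from_tuples([(3, "c"), (1, "a")])

# sort defaults to None, so the union comes back sorted.
default_union = mi1.union(mi2)

# sort=False leaves the result unsorted.
unsorted_union = mi1.union(mi2, sort=False)

print(list(default_union))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(list(unsorted_union))
```

A test that calls union without passing sort should therefore compare against a sorted expected index.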