ENH: Keep dtypes in MultiIndex.union without NAs by phofl · Pull Request #48505 · pandas-dev/pandas

fast_unique_multiple was never used with more than 2 arrays, so there is no need to keep that implementation around. Returning indices from the Cython level lets us operate on the initial object and hence keep the dtypes.
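To illustrate why returning indices helps, here is a minimal pure-Python sketch of the idea. It is not the Cython routine from this PR: `missing_positions` is a hypothetical helper, plain lists of tuples stand in for the two MultiIndex objects, and `right` is assumed to have no duplicates.

```python
def missing_positions(left, right):
    """Return positions in `right` whose tuples do not occur in `left`.

    Hypothetical helper for illustration only. It makes no attempt to
    deduplicate `right`.
    """
    seen = set(left)
    return [i for i, tup in enumerate(right) if tup not in seen]


left = [(1, "a"), (2, "b")]
right = [(2, "b"), (3, "c")]

# Only positions come back, so the missing rows can be taken from the
# original `right` object instead of being rebuilt from raw values.
positions = missing_positions(left, right)
result = left + [right[i] for i in positions]
```

Because the caller selects rows from the original object by position, whatever dtype information that object carries does not have to be reconstructed afterwards.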

     [fe9e5d02]       [e690752b]
     <midx_union_no_na~4>       <midx_union_no_na>
-      59.6±0.3ms       53.0±0.8ms     0.89  multiindex_object.SetOperations.time_operation('non_monotonic', 'string', 'union')
-      44.2±0.2ms       25.7±0.2ms     0.58  multiindex_object.SetOperations.time_operation('non_monotonic', 'int', 'union')
-        44.5±1ms       25.7±0.5ms     0.58  multiindex_object.SetOperations.time_operation('monotonic', 'int', 'union')
-        115±10ms       49.1±0.5ms     0.43  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union')
-       114±0.3ms       48.0±0.7ms     0.42  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union')

As a follow-up, we could improve the Cython implementation to handle duplicates in right too.