PERF: MultiIndex.values for MI's with DatetimeIndex, TimedeltaIndex, or ExtensionDtype levels by lukemanley · Pull Request #46288 · pandas-dev/pandas

The change boxes these types once on each level's distinct values and then calls .take, instead of calling .take first and then boxing a potentially much larger array.
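The two orderings can be compared on a toy example. This is only a minimal sketch of the idea, not the code in this PR: `level` and `codes` are hypothetical stand-ins for one level of a MultiIndex and its integer codes, and missing values (code -1) are assumed absent.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for one MultiIndex level and its codes (no -1 codes).
level = pd.date_range("2000-01-01", periods=3)             # distinct level values
codes = np.array([0, 1, 2, 0, 1, 2, 0, 1], dtype=np.intp)  # row -> position in level

# Ordering 1: take first, then box every row into a Timestamp object.
take_then_box = np.asarray(level.take(codes).astype(object))

# Ordering 2: box only the distinct values, then take on the object array.
boxed_level = np.asarray(level.astype(object))
box_then_take = boxed_level.take(codes)

# Both orderings yield the same object array of Timestamps.
same = list(take_then_box) == list(box_then_take)
print(same)
```

In the second ordering the boxing cost scales with the number of distinct values in the level rather than with the number of rows.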

The impact is most pronounced when there is a large difference between the number of rows in the index and the number of unique values (e.g. dates) in a given level of the index.
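As a rough illustration of that ratio, the sketch below builds a much smaller index of the same shape as the benchmark and compares the row count with the number of distinct dates in the second level. The sizes are arbitrary and chosen only to keep the example quick.

```python
import numpy as np
import pandas as pd

# Small index with the same structure as the benchmark (sizes are illustrative).
mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(100), dtype="Int64"),
        pd.date_range("2000-01-01", periods=10),
    ]
)

n_rows = len(mi)                    # total rows in the index
n_unique_dates = len(mi.levels[1])  # distinct values in the datetime level
rows_per_unique = n_rows // n_unique_dates

print(n_rows, n_unique_dates, rows_per_unique)
```

Here each distinct date is repeated 100 times, so boxing the level once touches 10 values instead of 1000.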

import pandas as pd
import numpy as np

mi = pd.MultiIndex.from_product(
    [ 
        pd.array(np.arange(10000), dtype="Int64"),
        pd.date_range('2000-01-01', periods=1000),
    ]
)

%timeit mi.copy().values

6.63 s ± 212 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)   <- main
775 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR

$ asv continuous -f 1.1 upstream/main multiindex-values -b multiindex_object

       before           after         ratio
     <main>           <multiindex-values>
-        32.0±2ms       28.6±0.3ms     0.89  multiindex_object.Duplicated.time_duplicated
-         215±3ms          151±5ms     0.70  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection')
-       347±0.8ms          226±3ms     0.65  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection')
-         323±3ms          208±2ms     0.64  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union')
-         326±5ms          207±5ms     0.64  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union')
-         139±1ms         32.4±2ms     0.23  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference')
-         140±2ms       32.3±0.8ms     0.23  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference')
-        65.6±3ms       8.79±0.2ms     0.13  multiindex_object.Values.time_datetime_level_values_copy