PERF: MultiIndex.values for MI's with DatetimeIndex, TimedeltaIndex, or ExtensionDtype levels by lukemanley · Pull Request #46288 · pandas-dev/pandas
This change boxes these types (converts them to Python objects such as `Timestamp`) on the distinct level values and then calls `.take`. Previously `.take` was called first, and the resulting array, which can be much larger, was then boxed.
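The two orderings can be compared with a small sketch that uses only the public `MultiIndex.levels` and `MultiIndex.codes` attributes. It is not the code from this PR. The helper names `take_then_box` and `box_then_take` are hypothetical, and the sketch assumes the index has no missing entries (code `-1`).

```python
import numpy as np
import pandas as pd


def take_then_box(mi: pd.MultiIndex, i: int) -> np.ndarray:
    """Old order: expand level i to full length, then box every row."""
    level = mi.levels[i]
    codes = np.asarray(mi.codes[i])
    return np.asarray(level.take(codes).astype(object), dtype=object)


def box_then_take(mi: pd.MultiIndex, i: int) -> np.ndarray:
    """New order: box only the distinct level values, then expand with take."""
    level = mi.levels[i]
    codes = np.asarray(mi.codes[i])
    # Assumption: no missing entries. On a plain ndarray, code -1 would
    # silently select the last element instead of a missing value.
    if (codes < 0).any():
        raise ValueError("sketch does not handle missing codes (-1)")
    boxed = np.asarray(level.astype(object), dtype=object)  # len(level) objects
    return boxed.take(codes)  # len(mi) references to those objects


mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(3), dtype="Int64"),
        pd.date_range("2000-01-01", periods=4),
    ]
)

cols_old = [take_then_box(mi, i) for i in range(mi.nlevels)]
cols_new = [box_then_take(mi, i) for i in range(mi.nlevels)]

# Both orders yield the same row tuples as MultiIndex.values
assert list(zip(*cols_old)) == list(zip(*cols_new)) == list(mi.values)
```

Only `len(level)` objects are constructed in `box_then_take`. The `take` step then copies references into the full-length result.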
The impact is most pronounced when there is a large difference between the number of rows in the index and the number of unique values (e.g. dates) in a given level of the index.
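For example, the following index has 10,000,000 rows but only 10,000 distinct integers and 1,000 distinct dates in its two levels:

```python
import numpy as np
import pandas as pd

# Cartesian product: 10,000 Int64 values x 1,000 dates = 10,000,000 rows
mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(10000), dtype="Int64"),
        pd.date_range("2000-01-01", periods=1000),
    ]
)

# IPython magic: time .values on a fresh copy in each loop
%timeit mi.copy().values
# 6.63 s ± 212 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- main
# 775 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  <- PR
```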
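The following sketch gives a rough count of the boxed objects each ordering creates for the index in the benchmark above. It assumes both levels are boxed. The count omits work that `.values` does for every row either way, such as building one tuple per row, so it does not translate directly into the measured speedup.

```python
import numpy as np
import pandas as pd

mi = pd.MultiIndex.from_product(
    [
        pd.array(np.arange(10000), dtype="Int64"),
        pd.date_range("2000-01-01", periods=1000),
    ]
)

n_rows = len(mi)                                # 10,000 * 1,000 rows
n_unique = [len(level) for level in mi.levels]  # distinct values per level

# take-then-box: every row is boxed once per level
boxed_before = n_rows * mi.nlevels
# box-then-take: only the distinct level values are boxed
boxed_after = sum(n_unique)

print(n_rows, n_unique)           # 10000000 [10000, 1000]
print(boxed_before, boxed_after)  # 20000000 11000
```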
ASV results for the `multiindex_object` benchmarks:

```
$ asv continuous -f 1.1 upstream/main multiindex-values -b multiindex_object
           before            after    ratio
           <main>  <multiindex-values>
-        32.0±2ms       28.6±0.3ms     0.89  multiindex_object.Duplicated.time_duplicated
-         215±3ms          151±5ms     0.70  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'intersection')
-       347±0.8ms          226±3ms     0.65  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'intersection')
-         323±3ms          208±2ms     0.64  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'union')
-         326±5ms          207±5ms     0.64  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'union')
-         139±1ms         32.4±2ms     0.23  multiindex_object.SetOperations.time_operation('monotonic', 'datetime', 'symmetric_difference')
-         140±2ms       32.3±0.8ms     0.23  multiindex_object.SetOperations.time_operation('non_monotonic', 'datetime', 'symmetric_difference')
-        65.6±3ms       8.79±0.2ms     0.13  multiindex_object.Values.time_datetime_level_values_copy
```