PERF: significant speedups in tz-aware operations by qwhelan · Pull Request #24491 · pandas-dev/pandas

Operations involving tz-aware data currently incur a substantial penalty, roughly 60x to 90x over the tz-naive equivalents in the two benchmarks below:

[ 93.04%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 93.04%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.41±0.07ms
             **US/Eastern     150±2ms**
                  UTC       2.64±0.07ms
                tzutc()     2.70±0.06ms
              ============ =============

[ 93.19%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 93.19%] ··· ============ ============
               index_type
              ------------ ------------
                  dst          n/a
                repeated       n/a
              **tz_aware     305±7ms**
                tz_naive    3.38±0.2ms
              ============ ============
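For anyone who wants to see the gap without a full asv run, here is a rough standalone approximation of the `time_dt_accessor_year` case. It is a sketch, not the asv benchmark itself: the helper name `time_year_accessor`, the start date, the series length, and the repeat count are all assumptions chosen for illustration, so absolute numbers will not match the tables above.

```python
import timeit

import pandas as pd


def time_year_accessor(tz, periods=100_000, repeat=5):
    """Best-of-`repeat` wall time (seconds) for `.dt.year` on a Series in timezone `tz`.

    Hypothetical helper for illustration; sizes and dates are assumptions,
    not the values used by the asv suite.
    """
    series = pd.Series(
        pd.date_range("2000-01-01", periods=periods, freq="min", tz=tz)
    )
    # number=1 so that each measurement is a single call of the accessor.
    return min(timeit.repeat(lambda: series.dt.year, number=1, repeat=repeat))


# None is the tz-naive baseline; the other two are tz-aware.
timings = {str(tz): time_year_accessor(tz) for tz in (None, "US/Eastern", "UTC")}

for tz, seconds in timings.items():
    print(f"{tz:>12}  {seconds * 1e3:8.2f} ms")
```

On a pandas build that already includes this change, the `US/Eastern` row should land close to the other two rather than being far slower.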

This PR improves the performance of tz-aware operations to near that of tz-naive ones through a couple of approaches:

Here's the same comparison with the PR:

[ 26.38%] ··· timeseries.DatetimeAccessor.time_dt_accessor_year                                                                                                           ok
[ 26.38%] ··· ============ =============
                   t
              ------------ -------------
                  None      2.31±0.06ms
               US/Eastern   3.70±0.08ms
                  UTC       2.49±0.03ms
                tzutc()      2.43±0.1ms
              ============ =============

[ 26.52%] ··· timeseries.DatetimeIndex.time_add_timedelta                                                                                                                 ok
[ 26.52%] ··· ============ =============
               index_type
              ------------ -------------
                  dst           n/a
                repeated        n/a
                tz_aware     4.01±0.2ms
                tz_naive    2.33±0.04ms
              ============ =============

And asv output:

$ asv compare upstream/master HEAD -s --sort ratio --only-changed
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
-      3.38±0.2ms      2.33±0.04ms     0.69  timeseries.DatetimeIndex.time_add_timedelta('tz_naive')
-      21.0±0.2ms       14.4±0.3ms     0.68  inference.DateInferOps.time_timedelta_plus_datetime
-        700±80ns         366±20ns     0.52  timestamp.TimestampProperties.time_dayofweek(<UTC>, 'B')
-         181±2ms       30.6±0.7ms     0.17  timeseries.DatetimeAccessor.time_dt_accessor_time('US/Eastern')
-        180±10ms       29.1±0.9ms     0.16  timeseries.DatetimeAccessor.time_dt_accessor_date('US/Eastern')
-         178±3ms       26.7±0.6ms     0.15  timeseries.DatetimeAccessor.time_dt_accessor_day_name('US/Eastern')
-         179±2ms       26.3±0.4ms     0.15  timeseries.DatetimeIndex.time_to_time('tz_aware')
-         172±4ms       24.8±0.7ms     0.14  timeseries.DatetimeAccessor.time_dt_accessor_month_name('US/Eastern')
-         175±5ms       24.1±0.9ms     0.14  timeseries.DatetimeIndex.time_to_date('tz_aware')
-         150±2ms      3.70±0.08ms     0.02  timeseries.DatetimeAccessor.time_dt_accessor_year('US/Eastern')
-        95.3±5ms      2.02±0.07ms     0.02  indexing.NonNumericSeriesIndexing.time_getitem_label_slice('datetime', 'nonunique_monotonic_inc')
-         149±2ms      2.85±0.02ms     0.02  timeseries.DatetimeIndex.time_timeseries_is_month_start('tz_aware')
-         305±7ms       4.01±0.2ms     0.01  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-        95.3±3ms          356±9μs     0.00  indexing.NonNumericSeriesIndexing.time_get_value('datetime', 'nonunique_monotonic_inc')
       before           after         ratio
     [02a97c0a]       [5a5ed18b]
     <tz_aware_op_speedup~2>       <tz_aware_op_speedup>
+         323±6ms          386±9ms     1.19  timeseries.ToDatetimeISO8601.time_iso8601_tz_spaceformat
+         165±2ms          191±4ms     1.15  timeseries.ToDatetimeCache.time_dup_string_tzoffset_dates(False)
+         151±5μs          170±2μs     1.13  indexing.NonNumericSeriesIndexing.time_getitem_scalar('datetime', 'nonunique_monotonic_inc')
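The ratio column is after/before rounded to two decimals, so the largest wins show up as `0.02`, `0.01` or even `0.00`, which hides how big they are. A small sketch that turns a few of the rows above into speedup factors; the `to_seconds` helper and the `rows` dict are illustrative names, and only the mean (the part before `±`) is used:

```python
import re

# Multipliers from asv's unit suffixes to seconds. Both the Greek mu and the
# micro sign are listed, since either may appear in copied output.
UNITS = {"ns": 1e-9, "μs": 1e-6, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def to_seconds(measurement):
    """Convert an asv measurement such as '95.3±3ms' to seconds (mean only)."""
    match = re.fullmatch(r"([\d.]+)±[\d.]+(\D+)", measurement)
    if match is None or match.group(2) not in UNITS:
        raise ValueError(f"unrecognised measurement: {measurement!r}")
    value, unit = match.groups()
    return float(value) * UNITS[unit]


# (before, after) pairs copied from the asv compare output above.
rows = {
    "time_dt_accessor_year('US/Eastern')": ("150±2ms", "3.70±0.08ms"),
    "time_add_timedelta('tz_aware')": ("305±7ms", "4.01±0.2ms"),
    "time_get_value('datetime', 'nonunique_monotonic_inc')": ("95.3±3ms", "356±9μs"),
}

speedups = {
    name: to_seconds(before) / to_seconds(after)
    for name, (before, after) in rows.items()
}

for name, factor in speedups.items():
    print(f"{factor:7.1f}x  {name}")
```

For these three rows the factors come out at roughly 41x, 76x and 268x respectively.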