REF/API: DatetimeTZDtype by TomAugspurger · Pull Request #23990 · pandas-dev/pandas

On microbenchmarks, things are fine: construction time is essentially unchanged between master and this PR.

master:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 39 µs, sys: 31 µs, total: 70 µs
Wall time: 75.6 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 15 µs, sys: 0 ns, total: 15 µs
Wall time: 18.8 µs

PR:

In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 29 µs, sys: 23 µs, total: 52 µs
Wall time: 56 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 16 µs, sys: 0 ns, total: 16 µs
Wall time: 19.8 µs
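
A single %time call is noisy (note the gap between the first and second calls above), so a repeated measurement can be a useful cross-check. The following is only a sketch using the standard-library timeit module; the helper name best_per_call and the ToyDtype class are made up for illustration and are not part of pandas.

```python
import timeit


def best_per_call(func, repeat=5, number=10_000):
    """Return the best observed time per call, in microseconds.

    Hypothetical helper: runs `func` `number` times per round,
    for `repeat` rounds, and keeps the fastest round.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1e6


class ToyDtype:
    """Toy stand-in so the sketch runs without pandas installed."""

    def __init__(self, unit="ns", tz=None):
        self._unit = unit
        self._tz = tz


if __name__ == "__main__":
    # With pandas available, the callable would instead be
    # lambda: pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
    micros = best_per_call(lambda: ToyDtype(unit="ns", tz="UTC"))
    print(f"best: {micros:.3f} µs per call")
```

Taking the minimum over several rounds reduces the influence of one-off costs such as the first-call overhead visible in the In [2] timings.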


ASV results for the timeseries, timestamp, and offsets benchmarks:

       before           after         ratio
     [6b3490f4]       [982c169a]
-         143±9ms          127±4ms     0.89  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-     2.73±0.05μs      2.45±0.02μs     0.90  timestamp.TimestampOps.time_tz_convert(tzutc())

Line profiling shows that most of the time in __init__ is spent on the timezone check, timezones.maybe_get_tz(tz) (67.5% of the 40 µs total in the run below).


In [5]: %lprun -s -f DatetimeTZDtype.__init__ DatetimeTZDtype(tz='utc')
Timer unit: 1e-06 s

Total time: 4e-05 s
File: /Users/taugspurger/sandbox/pandas/pandas/core/dtypes/dtypes.py
Function: __init__ at line 494

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   494                                               def __init__(self, unit="ns", tz=None):
   495                                                   """
   496                                                   An ExtensionDtype for timezone-aware datetime data.
   497
   498                                                   Parameters
   499                                                   ----------
   500                                                   unit : str, default "ns"
   501                                                       The precision of the datetime data. Currently limited
   502                                                       to ``"ns"``.
   503                                                   tz : str, int, or datetime.tzinfo
   504                                                       The timezone.
   505
   506                                                   Raises
   507                                                   ------
   508                                                   pytz.UnknownTimeZoneError
   509                                                       When the requested timezone cannot be found.
   510
   511                                                   Examples
   512                                                   --------
   513                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='UTC')
   514                                                   datetime64[ns, UTC]
   515
   516                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='dateutil/US/Central')
   517                                                   datetime64[ns, tzfile('/usr/share/zoneinfo/US/Central')]
   518                                                   """
   519         1          7.0      7.0     17.5          if isinstance(unit, DatetimeTZDtype):
   520                                                       unit, tz = unit.unit, unit.tz
   521
   522         1          1.0      1.0      2.5          if unit != 'ns':
   523                                                       raise ValueError("DatetimeTZDtype only supports ns units")
   524
   525         1          0.0      0.0      0.0          if tz:
   526         1         27.0     27.0     67.5              tz = timezones.maybe_get_tz(tz)
   527                                                   elif tz is not None:
   528                                                       raise pytz.UnknownTimeZoneError(tz)
   529                                                   elif tz is None:
   530                                                       raise TypeError("A 'tz' is required.")
   531
   532         1          4.0      4.0     10.0          self._unit = unit
   533         1          1.0      1.0      2.5          self._tz = tz
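
The branch structure of the profiled __init__ can be sketched in isolation as below. This is a simplified illustration, not the pandas code: resolve_tz and its lookup table are hypothetical stand-ins for timezones.maybe_get_tz, and UnknownTimeZoneError is a local stand-in for pytz.UnknownTimeZoneError so that the sketch needs no third-party imports.

```python
class UnknownTimeZoneError(KeyError):
    """Local stand-in for pytz.UnknownTimeZoneError (assumption)."""


# Hypothetical lookup table standing in for timezones.maybe_get_tz.
_KNOWN_ZONES = {"UTC": "UTC", "utc": "UTC"}


def resolve_tz(tz):
    """Return a canonical zone name or raise UnknownTimeZoneError."""
    try:
        return _KNOWN_ZONES[tz]
    except KeyError:
        raise UnknownTimeZoneError(tz) from None


def validate(unit="ns", tz=None):
    """Mirror the three tz branches shown in the profile."""
    if unit != "ns":
        raise ValueError("DatetimeTZDtype only supports ns units")

    if tz:
        # The expensive step in the profile: the timezone lookup.
        tz = resolve_tz(tz)
    elif tz is not None:
        # Falsy but not None, e.g. an empty string.
        raise UnknownTimeZoneError(tz)
    else:
        raise TypeError("A 'tz' is required.")

    return unit, tz
```

In this sketch, validate(tz="UTC") takes the lookup branch, validate(tz="") hits the UnknownTimeZoneError branch, and validate() with no tz raises TypeError.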

I switched the properties to cache_readonly (which we can do, since we aren't using _cache for caching instances now).
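
For readers unfamiliar with the pattern, here is a rough pure-Python sketch of a cached read-only property that stores computed values in a per-instance _cache dict. It is an illustration of the idea only, under the assumption that this matches the general behaviour of cache_readonly; the class names cache_readonly_sketch and ToyDtype are invented, and the actual pandas implementation may differ in detail.

```python
class cache_readonly_sketch:
    """Simplified stand-in for a cached, read-only property descriptor."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class rather than an instance.
            return self
        # Computed values live in a per-instance dict named _cache,
        # which is why the attribute must be free for this purpose.
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]

    def __set__(self, obj, value):
        raise AttributeError("Can't set attribute")


class ToyDtype:
    """Toy class; not the pandas DatetimeTZDtype."""

    def __init__(self, unit="ns", tz=None):
        self._unit = unit
        self._tz = tz
        self.calls = 0  # counts how often the property body runs

    @cache_readonly_sketch
    def name(self):
        self.calls += 1
        return f"datetime64[{self._unit}, {self._tz}]"
```

Accessing ToyDtype(tz="UTC").name twice runs the property body only once; the second access is served from the instance's _cache dict.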