REF/API: DatetimeTZDtype by TomAugspurger · Pull Request #23990 · pandas-dev/pandas
On microbenchmarks, things are fine
master:
In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 39 µs, sys: 31 µs, total: 70 µs
Wall time: 75.6 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 15 µs, sys: 0 ns, total: 15 µs
Wall time: 18.8 µs
PR:
In [2]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 29 µs, sys: 23 µs, total: 52 µs
Wall time: 56 µs

In [3]: %time a = pd.core.dtypes.dtypes.DatetimeTZDtype(unit='ns', tz="UTC")
CPU times: user 16 µs, sys: 0 ns, total: 16 µs
Wall time: 19.8 µs
ASV for the timeseries, timestamps, and offsets
       before           after         ratio
     [6b3490f4]       [982c169a]
-        143±9ms          127±4ms     0.89  timeseries.DatetimeIndex.time_add_timedelta('tz_aware')
-    2.73±0.05μs      2.45±0.02μs     0.90  timestamp.TimestampOps.time_tz_convert(tzutc())
Line-profiling shows that most of the time in __init__ (67.5% in the run below) is spent on the timezone lookup, timezones.maybe_get_tz.
In [5]: %lprun -s -f DatetimeTZDtype.__init__ DatetimeTZDtype(tz='utc')
Timer unit: 1e-06 s
Total time: 4e-05 s
File: /Users/taugspurger/sandbox/pandas/pandas/core/dtypes/dtypes.py
Function: __init__ at line 494
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   494                                               def __init__(self, unit="ns", tz=None):
   495                                                   """
   496                                                   An ExtensionDtype for timezone-aware datetime data.
   497
   498                                                   Parameters
   499                                                   ----------
   500                                                   unit : str, default "ns"
   501                                                       The precision of the datetime data. Currently limited
   502                                                       to ``"ns"``.
   503                                                   tz : str, int, or datetime.tzinfo
   504                                                       The timezone.
   505
   506                                                   Raises
   507                                                   ------
   508                                                   pytz.UnknownTimeZoneError
   509                                                       When the requested timezone cannot be found.
   510
   511                                                   Examples
   512                                                   --------
   513                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='UTC')
   514                                                   datetime64[ns, UTC]
   515
   516                                                   >>> pd.core.dtypes.dtypes.DatetimeTZDtype(tz='dateutil/US/Central')
   517                                                   datetime64[ns, tzfile('/usr/share/zoneinfo/US/Central')]
   518                                                   """
   519         1          7.0      7.0     17.5          if isinstance(unit, DatetimeTZDtype):
   520                                                       unit, tz = unit.unit, unit.tz
   521
   522         1          1.0      1.0      2.5          if unit != 'ns':
   523                                                       raise ValueError("DatetimeTZDtype only supports ns units")
   524
   525         1          0.0      0.0      0.0          if tz:
   526         1         27.0     27.0     67.5              tz = timezones.maybe_get_tz(tz)
   527                                                   elif tz is not None:
   528                                                       raise pytz.UnknownTimeZoneError(tz)
   529                                                   elif tz is None:
   530                                                       raise TypeError("A 'tz' is required.")
   531
   532         1          4.0      4.0     10.0          self._unit = unit
   533         1          1.0      1.0      2.5          self._tz = tz
I switched the properties to cache_readonly (which we can do, since we aren't using _cache for caching instances now).
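For readers unfamiliar with the pattern, here is a rough sketch of why the _cache name matters. It is a simplified stand-in written for illustration, not pandas' actual cache_readonly; the names cache_readonly_sketch and TZDtypeSketch, the calls counter, and the choice of a name property are all hypothetical. The assumption is that the cached property keeps its values in a per-instance dict called _cache, so that attribute has to be free for it to use.

```python
class cache_readonly_sketch:
    """Hypothetical stand-in for a cached, read-only property."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Assumption: values live in a per-instance dict named ``_cache``.
        # If ``_cache`` were already used for something else (for example a
        # registry of instances), the two uses would collide.
        cache = obj.__dict__.setdefault("_cache", {})
        if self.name not in cache:
            cache[self.name] = self.func(obj)
        return cache[self.name]

    def __set__(self, obj, value):
        # Defining __set__ makes this a data descriptor, so the property
        # cannot be overwritten on the instance.
        raise AttributeError("can't set attribute")


class TZDtypeSketch:
    """Hypothetical dtype-like class; not the real DatetimeTZDtype."""

    def __init__(self, unit="ns", tz="UTC"):
        self._unit = unit
        self._tz = tz
        self.calls = 0  # counts how often the property body actually runs

    @cache_readonly_sketch
    def name(self):
        self.calls += 1
        return f"datetime64[{self._unit}, {self._tz}]"
```

Accessing name twice on the same TZDtypeSketch instance runs the property body only once; the second access is served from the instance's _cache dict.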