PERF/BUG: improve factorize for datetimetz by sinhrks · Pull Request #13750 · pandas-dev/pandas
- tests added / passed
- passes git diff upstream/master | flake8 --diff
- whatsnew entry
Because factorize internally localizes datetimetz data, it raises when the data contains a DST boundary.
dti = pd.date_range('2016-11-06', freq='H', periods=5, tz='US/Eastern')
dti.factorize()
# AmbiguousTimeError: Cannot infer dst time from Timestamp('2016-11-06 01:00:00'), try using the 'ambiguous' argument
This PR skips that localization, which fixes the error and also improves performance.
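For illustration, here is a minimal sketch of why re-localizing wall-clock times fails at a DST boundary, and how factorizing the UTC instants sidesteps the problem. It is not the code changed in this PR; the variable names (wall, naive_utc, relocalize_failed) are hypothetical, and the round trip through UTC is only one assumed way to avoid the ambiguous localization.

```python
import pandas as pd

# Five hourly timestamps spanning the end of DST in US/Eastern;
# the wall-clock time 01:00 occurs twice (first in EDT, then in EST).
dti = pd.date_range("2016-11-06", freq=pd.Timedelta(hours=1),
                    periods=5, tz="US/Eastern")

# Dropping the timezone keeps the wall-clock times, so 01:00 is duplicated
# and re-localizing cannot tell the two instants apart.
wall = dti.tz_localize(None)
try:
    wall.tz_localize("US/Eastern")
    relocalize_failed = False
except Exception:  # the exact exception class varies by pandas version
    relocalize_failed = True

# Going through UTC avoids the ambiguity: every instant is distinct in UTC,
# and localizing naive values to UTC can never hit a DST transition.
naive_utc = dti.tz_convert("UTC").tz_localize(None)
codes, uniques_naive = naive_utc.factorize()
uniques = uniques_naive.tz_localize("UTC").tz_convert(dti.tz)

print(relocalize_failed)
print(codes)
print(uniques)
```

On a pandas version that includes this fix, dti.factorize() is expected to give the same codes and uniques directly, without raising.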
dti = pd.date_range('2011-01-01', freq='H', periods=1000000, tz='Asia/Tokyo')
%timeit dti.factorize()
# on current master
# 1 loop, best of 3: 475 ms per loop
# after this PR
# 1 loop, best of 3: 262 ms per loop
asv:
    before     after       ratio
  [bb6b5e54] [a2c3370a]
-   22.46ms     9.97ms      0.44  timeseries.datetime_algorithm.time_dti_tz_factorize
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.