BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST by sinhrks · Pull Request #7798 · pandas-dev/pandas (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation14 Commits1 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
These functions may return different result in case of DST. There seems to be 2 problems:
tslib.tz_convert
checks DST and changedeltas
by adjustingpos
by 1. If input has time-gaps more than 2 DST spans, result will be incorrect.tslib.tz_convert_single
results incorrect if input is just on DST edge.
import pandas as pd
import numpy as np
import datetime
import pytz
idx = pd.date_range('2014-03-01', '2015-01-10', freq='H')
f = lambda x: pd.tslib.tz_convert_single(x, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result = pd.tslib.tz_convert(idx.asi8, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result_single = np.vectorize(f)(idx.asi8)
result[result != result_single]
# [1394370000000000000 1394373600000000000 1394377200000000000 ...,
# 1414918800000000000 1414922400000000000 1414926000000000000]
Note
Additionally, it was modifed to close #7880 also.
Perf result. I think it increases a performance of inferred_freq
for tz-aware data a bit.
...
write_store | 11.6420 | 9.7557 | 1.1934 |
reindex_fillna_pad | 1.0404 | 0.8606 | 1.2089 |
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234
Target [7a61a96] : BUG: tslib.tzconvert handles DST incorrectly
Base [30764bd] : Merge pull request #7783 from lexual/docs_categorical_tidyup
can you run the perf test here: #7652 (and the 'test' in #7633,
which couldn't figure out how to put in a vbench, its basically a str(df)
with a big period index).
just to make sure the perf is still ok
(your fix looks good though)
OK, done.
-------------------------------------------------------------------------------
Test name | head[ms] | base[ms] | ratio |
-------------------------------------------------------------------------------
datetimeindex_normalize | 5.6077 | 5.7670 | 0.9724 |
Target [3c8e2b3] : BUG: tslib.tzconvert handles DST incorrectly
Base [8cd3dd6] : Merge pull request #7652 from jreback/dst_transitions
And test in #7633 takes few seconds.
import pandas as pd
import numpy as np
period = 1.0 / 2048 * 1e9
freq = pd.datetools.Nano(period)
df = pd.DataFrame({'a': np.random.randn(6e6)}, index=pd.date_range('now', periods=6e6, freq=freq, tz='EST'))
print(df.a)
...
# 2014-07-19 03:08:12.685023438-05:00 0.650118
# 2014-07-19 03:08:12.685511719-05:00 -0.683781
# Freq: 488281N, Name: a, Length: 6000000
# [Finished in 1.9s]
Should we add a test for the fall DST transition similar to what you did for spring? I understand there is not really any specific "spring" code per se, but sometimes strange things popup. Otherwise the changes look fine.
@rockg Yep, added tests. Could you check whether it is valid, otherwise lmk better one?
def test_tslib_tz_convert_dst(self): |
# Start DST |
idx = date_range('2014-03-08 23:00', '2014-03-09 09:00', freq='1min', tz='UTC') |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a freq loop around this with several freqs? (I know a bit annoying as you have to specify the result as well). you have T
, maybe H
and D
? (as D should be unaffected by the DST transitions, while T and H are)
That's exactly what I was thinking.
OK, modified test to cover other freqs.
@jreback I think it is ready, lmk if anything.
looks fine, what was the problem with #7880 ? (e.g. what fixed it)
jreback added a commit that referenced this pull request
BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST