BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST by sinhrks · Pull Request #7798 · pandas-dev/pandas (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation14 Commits1 Checks0 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

sinhrks

These functions may return different result in case of DST. There seems to be 2 problems:

import pandas as pd
import numpy as np
import datetime
import pytz

idx = pd.date_range('2014-03-01', '2015-01-10', freq='H')

f = lambda x: pd.tslib.tz_convert_single(x, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result = pd.tslib.tz_convert(idx.asi8, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result_single = np.vectorize(f)(idx.asi8)
result[result != result_single]
# [1394370000000000000 1394373600000000000 1394377200000000000 ...,
#  1414918800000000000 1414922400000000000 1414926000000000000]

Note

Additionally, it was modifed to close #7880 also.

@sinhrks

Perf result. I think it increases a performance of inferred_freq for tz-aware data a bit.

...
write_store                                  |  11.6420 |   9.7557 |   1.1934 |
reindex_fillna_pad                           |   1.0404 |   0.8606 |   1.2089 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [7a61a96] : BUG: tslib.tzconvert handles DST incorrectly
Base   [30764bd] : Merge pull request #7783 from lexual/docs_categorical_tidyup

@jreback

can you run the perf test here: #7652 (and the 'test' in #7633,
which couldn't figure out how to put in a vbench, its basically a str(df) with a big period index).

just to make sure the perf is still ok

(your fix looks good though)

@sinhrks

OK, done.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
datetimeindex_normalize                      |   5.6077 |   5.7670 |   0.9724 |

Target [3c8e2b3] : BUG: tslib.tzconvert handles DST incorrectly
Base   [8cd3dd6] : Merge pull request #7652 from jreback/dst_transitions

@sinhrks

And test in #7633 takes few seconds.

import pandas as pd

import numpy as np
period = 1.0 / 2048 * 1e9
freq = pd.datetools.Nano(period)
df = pd.DataFrame({'a': np.random.randn(6e6)}, index=pd.date_range('now', periods=6e6, freq=freq, tz='EST'))
print(df.a)
...
# 2014-07-19 03:08:12.685023438-05:00    0.650118
# 2014-07-19 03:08:12.685511719-05:00   -0.683781
# Freq: 488281N, Name: a, Length: 6000000
# [Finished in 1.9s]

@rockg

Should we add a test for the fall DST transition similar to what you did for spring? I understand there is not really any specific "spring" code per se, but sometimes strange things popup. Otherwise the changes look fine.

@sinhrks

@rockg Yep, added tests. Could you check whether it is valid, otherwise lmk better one?

jreback

def test_tslib_tz_convert_dst(self):
# Start DST
idx = date_range('2014-03-08 23:00', '2014-03-09 09:00', freq='1min', tz='UTC')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a freq loop around this with several freqs? (I know a bit annoying as you have to specify the result as well). you have T, maybe H and D? (as D should be unaffected by the DST transitions, while T and H are)

@rockg

That's exactly what I was thinking.

@sinhrks

OK, modified test to cover other freqs.

@sinhrks

@sinhrks

@jreback I think it is ready, lmk if anything.

@sinhrks

@jreback

looks fine, what was the problem with #7880 ? (e.g. what fixed it)

@sinhrks

jreback added a commit that referenced this pull request

Aug 3, 2014

@jreback

BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST

@jreback