Indexing datetime index over summer/winter time change gives incorrect result · Issue #21846 · pandas-dev/pandas

Code Sample, a copy-pastable example if possible

import pandas as pd
print(pd.__version__, '\n')

0.23.1

The time change occurs at index 3: 2017-10-29 02:30:00+02:00 -> 2017-10-29 02:00:00+01:00

series1 = pd.Series([0,1,2,3], index=pd.date_range('2017-10-29 01:30:00', tz='Europe/Berlin', periods=4, freq='30 min'))

The only difference is one additional timestamp:

series2 = pd.Series([0,1,2,3,4], index=pd.date_range('2017-10-29 01:30:00', tz='Europe/Berlin', periods=5, freq='30 min'))
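To make the time change described above concrete, here is a minimal stdlib-only sketch (not pandas) that rebuilds the five instants of `series2` in UTC and renders each with the Berlin offset in force at that instant. It assumes summer time in Europe/Berlin ended at 2017-10-29 01:00 UTC and hard-codes that instant together with fixed +02:00/+01:00 offsets instead of using a tz database; `to_berlin` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the Europe/Berlin tz database.
CEST = timezone(timedelta(hours=2))  # summer time
CET = timezone(timedelta(hours=1))   # winter time
# Assumption: summer time ended at 2017-10-29 01:00 UTC (EU rule).
TRANSITION_UTC = datetime(2017, 10, 29, 1, 0, tzinfo=timezone.utc)


def to_berlin(instant):
    """Hypothetical helper: show an aware datetime with the Berlin offset in force at that instant."""
    tz = CET if instant >= TRANSITION_UTC else CEST
    return instant.astimezone(tz)


# Same instants as pd.date_range('2017-10-29 01:30:00', tz='Europe/Berlin', periods=5, freq='30 min')
start_utc = datetime(2017, 10, 29, 1, 30, tzinfo=CEST).astimezone(timezone.utc)
index = [to_berlin(start_utc + i * timedelta(minutes=30)) for i in range(5)]

for pos, ts in enumerate(index):
    # position, local time with offset, and the underlying UTC clock time
    print(pos, ts.isoformat(), "UTC", ts.astimezone(timezone.utc).strftime("%H:%M"))
```

Positions 2 and 3 read 02:30 and 02:00 on the wall clock, but they are 30 minutes apart in UTC (00:30 and 01:00), so the index is still evenly spaced and sorted in absolute time.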

t_1 = pd.Timestamp('2017-10-29 02:30:00+02:00', tz='Europe/Berlin', freq='30min')
t_2 = pd.Timestamp('2017-10-29 02:00:00+01:00', tz='Europe/Berlin', freq='30min')

This one works fine:

print('series1 + slice')
print(series1, '\n')

prints:

2017-10-29 01:30:00+02:00    0
2017-10-29 02:00:00+02:00    1
2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
Freq: 30T, dtype: int64

print(series1.loc[t_1:t_2], '\n')

prints:

2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
Freq: 30T, dtype: int64

Here's the bug:

print('series2 + slice')
print(series2, '\n')

prints:

2017-10-29 01:30:00+02:00    0
2017-10-29 02:00:00+02:00    1
2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
2017-10-29 02:30:00+01:00    4
Freq: 30T, dtype: int64

print(series2.loc[t_1:t_2], '\n')

prints:

Series([], Freq: 30T, dtype: int64)

This should have printed the same result as slicing series1 (and it does in pandas 0.22).
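The result expected from both slices can be stated as a small reference model. This is a stdlib-only sketch, not pandas' implementation: it assumes the index is sorted by absolute time and treats `.loc[start:stop]` as inclusive of both endpoints, as pandas label-based slicing is documented to be. Fixed offsets stand in for Europe/Berlin, and `ts` and `label_slice` are hypothetical helpers.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumption: stands in for Berlin summer time
CET = timezone(timedelta(hours=1))   # assumption: stands in for Berlin winter time


def ts(hour, minute, tz):
    """Hypothetical helper: a 2017-10-29 wall-clock time with an explicit offset."""
    return datetime(2017, 10, 29, hour, minute, tzinfo=tz)


index2 = [ts(1, 30, CEST), ts(2, 0, CEST), ts(2, 30, CEST), ts(2, 0, CET), ts(2, 30, CET)]
values2 = [0, 1, 2, 3, 4]
index1, values1 = index2[:4], values2[:4]  # series1 is series2 without the last label

t_1 = ts(2, 30, CEST)
t_2 = ts(2, 0, CET)


def label_slice(index, values, start, stop):
    """Inclusive label slice on an index sorted by absolute time.

    With fixed offsets, aware datetimes compare by their UTC instant, so the
    bounds depend only on where start and stop fall in time, not on the
    wall-clock reading or on labels outside [start, stop].
    """
    lo = bisect_left(index, start)  # first label >= start
    hi = bisect_right(index, stop)  # one past the last label <= stop
    return values[lo:hi]


print(label_slice(index1, values1, t_1, t_2))  # series1
print(label_slice(index2, values2, t_1, t_2))  # series2
```

Both calls return `[2, 3]`: the extra label 02:30+01:00 in `series2` lies after `t_2` in absolute time, so it cannot move either slice bound.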

print('indexing example:')
print(series1[t_1])

Raises KeyError: Timestamp('2017-10-29 02:30:00+0100', tz='Europe/Berlin')

In 0.22 the returned value is 2, as expected.

The following example shows the indexing problem even more clearly:

print('indexing example 2:')
print(series2[t_1])

Prints 4 (the value at 2017-10-29 02:30:00+01:00), although the result should be 2 (which is what 0.22 returns).
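Point lookup has the same expected semantics: a label should match exactly one instant. A minimal stdlib-only sketch, using a plain dict as a hypothetical stand-in for each Series (an assumption, not how pandas stores labels): aware `datetime` objects compare and hash by their UTC instant, so `t_1` (02:30+02:00) finds 2 in both series, while 02:30+01:00 is a different key.

```python
from datetime import datetime, timedelta, timezone

CEST = timezone(timedelta(hours=2))  # assumption: stands in for Berlin summer time
CET = timezone(timedelta(hours=1))   # assumption: stands in for Berlin winter time

labels2 = [
    datetime(2017, 10, 29, 1, 30, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 0, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 30, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 0, tzinfo=CET),
    datetime(2017, 10, 29, 2, 30, tzinfo=CET),
]
# Hypothetical stand-ins for the two Series: label -> value.
series2 = dict(zip(labels2, range(5)))
series1 = dict(zip(labels2[:4], range(4)))

t_1 = datetime(2017, 10, 29, 2, 30, tzinfo=CEST)

print(series1[t_1])  # analogous to series1[t_1]
print(series2[t_1])  # analogous to series2[t_1]
# The same instant written in UTC is the same key.
print(series2[datetime(2017, 10, 29, 0, 30, tzinfo=timezone.utc)])
```

All three lookups print 2; only a label that is actually 02:30+01:00 should resolve to 4.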

Problem description

There seems to be a problem when slicing and indexing across the summer -> winter time change.
There is no reason why adding an unrelated timestamp to an index should affect slicing or indexing of timestamps inside a DatetimeIndex. In pandas 0.22 both slices produce the same (expected) result; the bug only appears after upgrading to 0.23.1.

Expected Output

series1 + slice
2017-10-29 01:30:00+02:00    0
2017-10-29 02:00:00+02:00    1
2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
Freq: 30T, dtype: int64 

2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
Freq: 30T, dtype: int64 

series2 + slice
2017-10-29 01:30:00+02:00    0
2017-10-29 02:00:00+02:00    1
2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
2017-10-29 02:30:00+01:00    4
Freq: 30T, dtype: int64 

2017-10-29 02:30:00+02:00    2
2017-10-29 02:00:00+01:00    3
Freq: 30T, dtype: int64 

indexing example:
2

indexing example 2:
2

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.1
pytest: 3.4.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.28.3
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: 0.10.7
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.2.8
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None