Indexing datetime index over summer/winter time change gives incorrect result · Issue #21846 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd
print(pd.__version__, '\n')
0.23.1
The time change occurs at index 3: 2017-10-29 02:30:00+02:00 -> 2017-10-29 02:00:00+01:00
series1 = pd.Series([0,1,2,3], index=pd.date_range('2017-10-29 01:30:00', tz='Europe/Berlin', periods=4, freq='30 min'))
The only difference is one additional timestamp
series2 = pd.Series([0,1,2,3,4], index=pd.date_range('2017-10-29 01:30:00', tz='Europe/Berlin', periods=5, freq='30 min'))
t_1 = pd.Timestamp('2017-10-29 02:30:00+02:00', tz='Europe/Berlin', freq='30min')
t_2 = pd.Timestamp('2017-10-29 02:00:00+01:00', tz='Europe/Berlin', freq='30min')
this one works fine
print('series1 + slice')
print(series1, '\n')
prints:
2017-10-29 01:30:00+02:00 0
2017-10-29 02:00:00+02:00 1
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
Freq: 30T, dtype: int64
print(series1.loc[t_1:t_2], '\n')
prints:
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
Freq: 30T, dtype: int64
here's the bug
print('series2 + slice')
print(series2, '\n')
prints:
2017-10-29 01:30:00+02:00 0
2017-10-29 02:00:00+02:00 1
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
2017-10-29 02:30:00+01:00 4
Freq: 30T, dtype: int64
print(series2.loc[t_1:t_2], '\n')
prints:
Series([], Freq: 30T, dtype: int64)
This should have printed the same result as slicing series1 (and does in pandas 0.22).
print('indexing example:')
print(series1[t_1])
raises a KeyError: Timestamp('2017-10-29 02:30:00+0100', tz='Europe/Berlin')
In 0.22 the returned value is 2, as expected.
The following example shows the indexing problem even more clearly:
print('indexing example 2:')
print(series2[t_1])
prints 4 (the value at index 2017-10-29 02:30:00+01:00), although the result should be 2 (which is the result in 0.22)
Problem description
There seems to be a problem when slicing and indexing across the summer -> winter time change.
There is no reason why adding an unrelated timestamp to an index should influence the slicing of inner timestamps within a datetime index. In pandas 0.22 both slices produce the same result (as expected); the bug only occurs after upgrading to 0.23.1.
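To make the expectation concrete without relying on pandas, here is a minimal sketch using only the standard library. It writes the five index entries of series2 with explicit UTC offsets and checks that they are strictly increasing as instants, so a slice from t_1 to t_2 should be well defined. The names `CEST`, `CET`, `index`, `values`, `is_monotonic` and `expected_slice` are mine, for illustration only; this is not how pandas implements slicing.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for Europe/Berlin on either side of the change
# (an assumption made so the sketch needs no time zone database).
CEST = timezone(timedelta(hours=2))  # summer time, +02:00
CET = timezone(timedelta(hours=1))   # winter time, +01:00

# The five index entries of series2, with their offsets spelled out.
index = [
    datetime(2017, 10, 29, 1, 30, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 0, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 30, tzinfo=CEST),
    datetime(2017, 10, 29, 2, 0, tzinfo=CET),
    datetime(2017, 10, 29, 2, 30, tzinfo=CET),
]
values = [0, 1, 2, 3, 4]

t_1 = datetime(2017, 10, 29, 2, 30, tzinfo=CEST)
t_2 = datetime(2017, 10, 29, 2, 0, tzinfo=CET)

# Aware datetimes compare by instant, so the repeated wall-clock times
# 02:00 and 02:30 are still distinct and ordered.
utc_index = [ts.astimezone(timezone.utc) for ts in index]
is_monotonic = all(a < b for a, b in zip(utc_index, utc_index[1:]))

# Inclusive label slice t_1:t_2, evaluated on instants.
expected_slice = [v for ts, v in zip(index, values) if t_1 <= ts <= t_2]

print(is_monotonic)
print(expected_slice)
```

Compared as instants, t_1 is 00:30 UTC and t_2 is 01:00 UTC, so the slice covers exactly the entries with values 2 and 3, regardless of whether the trailing 02:30:00+01:00 entry is present.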
Expected Output
series1 + slice
2017-10-29 01:30:00+02:00 0
2017-10-29 02:00:00+02:00 1
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
Freq: 30T, dtype: int64
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
Freq: 30T, dtype: int64
series2 + slice
2017-10-29 01:30:00+02:00 0
2017-10-29 02:00:00+02:00 1
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
2017-10-29 02:30:00+01:00 4
Freq: 30T, dtype: int64
2017-10-29 02:30:00+02:00 2
2017-10-29 02:00:00+01:00 3
Freq: 30T, dtype: int64
indexing example:
2
indexing example 2:
2
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.1
pytest: 3.4.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.28.3
numpy: 1.14.5
scipy: 1.1.0
pyarrow: None
xarray: 0.10.7
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.4
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 1.0.2
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.2.8
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None