DataFrame.asof(): Timezone Awareness / Naivety comparison TypeError (incorrect) · Issue #21194 · pandas-dev/pandas
import pandas as pd
# Create some random Timestamps
timestamp1 = pd.Timestamp('2018-01-01 21:00:05.001+00:00')
timestamp2 = pd.Timestamp('2018-01-01 22:35:10.550+00:00')
# Get a timestamp half the gap past timestamp2, so asof should return the last row at or before it
timestamp_internal = timestamp2 + ((timestamp2 - timestamp1) / 2)
# Create a DataFrame
df = pd.DataFrame(data=[1,2], index=[timestamp1, timestamp2])
# display it
df
                                  0
2018-01-01 21:00:05.001000+00:00  1
2018-01-01 22:35:10.550000+00:00  2
# Now call asof() and show the issue
df.asof(timestamp_internal)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6144, in asof
    locs = self.index.asof_locs(where, ~(nulls.values))
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py", line 2489, in asof_locs
    result[(locs == 0) & (where < self.values[first])] = -1
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 136, in wrapper
    self._assert_tzawareness_compat(other)
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\datetimes.py", line 672, in _assert_tzawareness_compat
    raise TypeError('Cannot compare tz-naive and tz-aware '
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
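Until this is fixed, one possible workaround (a sketch, assuming every timestamp can be expressed in UTC) is to strip the timezone from both the index and the lookup key before calling asof(), so that no tz-naive/tz-aware comparison can occur. The helper `asof_utc` is hypothetical and not part of pandas.

```python
import pandas as pd


def asof_utc(frame, where):
    """Hypothetical workaround: run asof() on tz-naive UTC timestamps.

    Assumes `frame` has a tz-aware DatetimeIndex and `where` is a tz-aware
    Timestamp (converting a tz-naive `where` with tz_convert would raise).
    """
    naive = frame.copy()
    # Convert the index to UTC, then drop the tz so no aware/naive mix remains
    naive.index = naive.index.tz_convert("UTC").tz_localize(None)
    # Apply the same conversion to the lookup key
    key = pd.Timestamp(where).tz_convert("UTC").tz_localize(None)
    return naive.asof(key)


timestamp1 = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
timestamp2 = pd.Timestamp("2018-01-01 22:35:10.550+00:00")
timestamp_internal = timestamp2 + ((timestamp2 - timestamp1) / 2)
df = pd.DataFrame(data=[1, 2], index=[timestamp1, timestamp2])

result = asof_utc(df, timestamp_internal)
print(result)
```

The trade-off is that the returned Series is labelled with a tz-naive UTC Timestamp, so callers that need the original timezone would have to re-localize it.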
# Demonstrate further errant behavior: try looking up the frame with a tz-naive Timestamp
df.asof(timestamp_internal.tz_localize(None))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py", line 6108, in asof
    if where < start:
  File "pandas\_libs\tslibs\timestamps.pyx", line 164, in pandas._libs.tslibs.timestamps._Timestamp.__richcmp__
  File "pandas\_libs\tslibs\timestamps.pyx", line 224, in pandas._libs.tslibs.timestamps._Timestamp._assert_tzawareness_compat
TypeError: Cannot compare tz-naive and tz-aware timestamps
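This second traceback fails earlier, at the scalar comparison `where < start` inside asof(). That kind of comparison raises for any tz-naive/tz-aware pair of Timestamps, independently of asof(), as this minimal check sketches using only standard Timestamp methods (the exact error message may vary between pandas versions):

```python
import pandas as pd

aware = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
naive = aware.tz_localize(None)  # same wall-clock time, timezone dropped

try:
    naive < aware  # ordering a naive Timestamp against an aware one
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

print("raised:", raised)
```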
Problem description
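Somewhere inside asof(), the timezone awareness of the original Timestamps appears to be lost: the DataFrame index is tz-aware, yet the first traceback fails in asof_locs when comparing where against self.values[first], and the error reports that one side of that comparison is tz-naive.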
I only noticed this after upgrading to the newest version of pandas (0.23.0), but I cannot say for certain whether a recent change in pandas caused it.
I am fairly confident, though not certain, that this usage worked previously, but I am not in a position to test it right now.
EDIT: I have now tested this with pandas 0.22.0, and it produces the expected output.
Hopefully the bug is reproducible.
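A quick check of the index, sketched below, suggests where the timezone goes missing: the DatetimeIndex keeps its tz, but `.values` (used on the failing line of asof_locs) returns a NumPy `datetime64` array holding UTC wall-clock times, and NumPy datetimes cannot carry a timezone. This is an illustration, not a full diagnosis of the pandas internals.

```python
import numpy as np
import pandas as pd

timestamp1 = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
timestamp2 = pd.Timestamp("2018-01-01 22:35:10.550+00:00")
df = pd.DataFrame(data=[1, 2], index=[timestamp1, timestamp2])

# The DatetimeIndex keeps its timezone...
print("index tz:", df.index.tz)

# ...but .values is a plain NumPy datetime64 array (UTC, no tz attached)
values = df.index.values
print("values dtype:", values.dtype)
print("first value:", values[0])
```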
Expected Output
0    2
Name: 2018-01-01 23:22:43.324500+0000, dtype: int64
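For reference, asof() returns the last row at or before `where` (skipping rows with NaNs). The sketch below reproduces that rule with the standard-library `bisect` module on the two timestamps from the example, assuming a sorted index with no missing values; `asof_position` is a hypothetical helper for illustration only, not a pandas function. The second lookup key is built the same way as `timestamp_internal` in the reproducer, which places it after both rows.

```python
import bisect

import pandas as pd


def asof_position(index, where):
    """Hypothetical helper: position of the last entry <= where, or -1 if none.

    Assumes `index` is sorted ascending and every element is comparable
    with `where` (all tz-aware or all tz-naive).
    """
    # bisect_right returns how many entries are <= where
    return bisect.bisect_right(index, where) - 1


timestamp1 = pd.Timestamp("2018-01-01 21:00:05.001+00:00")
timestamp2 = pd.Timestamp("2018-01-01 22:35:10.550+00:00")
index = [timestamp1, timestamp2]
values = [1, 2]

gap = timestamp2 - timestamp1
for where in (timestamp1 + gap / 2,   # between the two rows
              timestamp2 + gap / 2):  # after both rows
    pos = asof_position(index, where)
    print(where, "->", values[pos] if pos >= 0 else None)
```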
Output of pd.show_versions()
pd.show_versions()
Matplotlib support failed
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 62 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.23.0
pytest: None
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.13.3
scipy: 1.1.0
pyarrow: 0.7.0
xarray: 0.10.4
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: None
tables: 3.4.3
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: None
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None