Py3 Failed to read HDF5 created in Py2 with datetime timezone · Issue #26443 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

d = {"data": 123}
df = pd.DataFrame(d, index=[pd.Timestamp("2019-01-01T18:00").tz_localize('America/New_York')])
df.to_hdf('/tmp/test.h5', key="key")  # write in Python 2
df = pd.read_hdf('/tmp/test.h5', key="key")  # fails in Python 3
print(df)
Problem description
When a DataFrame contains timezone-aware datetimes (the index, in the example above) and is written to HDF5 in Python 2, Python 3 is unable to read it. The error is:
...
~/miniconda3/envs/SOMEENV/lib/python3.7/site-packages/pandas/core/indexes/datetimelike.py in _format_with_header(self, header, **kwargs)
431
432 def _format_with_header(self, header, **kwargs):
--> 433 return header + list(self._format_native_types(**kwargs))
434
435 @property
~/miniconda3/envs/SOMEENV/lib/python3.7/site-packages/pandas/core/indexes/datetimes.py in _format_native_types(self, na_rep, date_format, **kwargs)
450 tz=self.tz,
451 format=fmt,
--> 452 na_rep=na_rep)
453
454 @property
pandas/_libs/tslib.pyx in pandas._libs.tslib.format_array_from_datetime()
pandas/_libs/tslibs/timestamps.pyx in pandas._libs.tslibs.timestamps.Timestamp.__new__()
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.convert_to_tsobject()
TypeError: Cannot convert numpy.bytes_ to datetime.tzinfo
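The final TypeError suggests that the timezone written by Python 2 comes back in Python 3 as `numpy.bytes_` rather than `str`. This is a reading of the error message, not a confirmed diagnosis. As a minimal sketch of that idea, decoding the value to text before it is treated as a timezone name could look like the following. The helper name `normalize_tz_name` is hypothetical and is not part of pandas.

```python
import numpy as np


def normalize_tz_name(tz):
    """Return tz as str if it arrived as bytes; otherwise return it unchanged.

    Hypothetical helper for illustration only; not a pandas API.
    """
    # numpy.bytes_ is a subclass of bytes, so one isinstance check covers both.
    if isinstance(tz, bytes):
        return tz.decode("utf-8")
    return tz


# Assumption: this is what the stored timezone looks like when read in Python 3.
raw_tz = np.bytes_(b"America/New_York")
tz_name = normalize_tz_name(raw_tz)
print(type(raw_tz).__name__, "->", type(tz_name).__name__, tz_name)
```

Values that are already `str` (or `None`) pass through untouched, so the helper would be safe to apply to files written by either Python version.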
Expected Output
Python 3 should be able to read the DataFrame written by Python 2.
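Until that works, one possible workaround is to avoid storing the timezone in the file at all: convert the index to naive UTC before writing, and re-attach the timezone after reading. The sketch below shows only the conversion round trip, without any HDF5 I/O. The function names `strip_index_tz` and `restore_index_tz` are hypothetical, and the timezone name would have to be carried separately (for example as a known constant).

```python
import pandas as pd


def strip_index_tz(df):
    """Return (copy of df with a naive UTC index, original timezone name)."""
    tz_name = str(df.index.tz)
    out = df.copy()
    # Convert to UTC first so no information is lost, then drop the tz.
    out.index = df.index.tz_convert("UTC").tz_localize(None)
    return out, tz_name


def restore_index_tz(df, tz_name):
    """Re-attach tz_name to a naive UTC index produced by strip_index_tz."""
    out = df.copy()
    out.index = df.index.tz_localize("UTC").tz_convert(tz_name)
    return out


original = pd.DataFrame(
    {"data": 123},
    index=[pd.Timestamp("2019-01-01T18:00").tz_localize("America/New_York")],
)

# Would run before to_hdf on the Python 2 side.
naive, tz_name = strip_index_tz(original)
# Would run after read_hdf on the Python 3 side.
restored = restore_index_tz(naive, tz_name)
print(restored)
```

This trades convenience for portability: the file then holds only naive timestamps, so every reader has to know which timezone to restore.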
Output of pd.show_versions()
Python 3
INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.2
pytest: 4.4.2
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: 1.8.1
bottleneck: None
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.8.2 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: 0.2.1
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
Python 2
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-957.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.24.2
pytest: 4.0.2
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.0
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: 1.5.1
bottleneck: None
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: 2.7.5 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: 0.2.0
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None