BUG: select_column not preserving a UTC timezone · Issue #7777 · pandas-dev/pandas

I was losing tz info when retrieving a DatetimeIndex from an HDF store using store.select_column('data', 'index'). I tracked the problem down to the Index._to_embed method in tseries/index.py. The relevant code is:

def _to_embed(self, keep_tz=False):
    """ return an array repr of this object, potentially casting to object """
    if keep_tz and self.tz is not None and str(self.tz) != 'UTC':
        return self.asobject.values
    return self.values

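The condition explicitly excludes UTC, so a UTC-localized index falls through to `return self.values` and comes back without its timezone even when keep_tz=True. Is there a good reason for this?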
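To make the branch logic easier to see in isolation, here is a minimal sketch of the same condition as a stand-alone function. The name `takes_object_branch` is hypothetical (it is not part of pandas), and plain strings stand in for tz objects, on the assumption that only `str(tz)` matters to the check.

```python
def takes_object_branch(tz, keep_tz=False):
    """Mirror the condition quoted from _to_embed (hypothetical helper).

    `tz` is None for a naive index, otherwise anything whose str()
    gives the zone name.
    """
    return bool(keep_tz and tz is not None and str(tz) != 'UTC')


# Plain strings stand in for tz objects; only str(tz) is inspected.
for tz in (None, 'UTC', 'MST'):
    print(tz, takes_object_branch(tz, keep_tz=True))
```

With keep_tz=True, only the non-UTC zone reaches the tz-preserving branch; the naive and UTC cases are treated identically.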

The code below reproduces the problem for me.

import pandas as pd

drange = pd.date_range('2014-07-07 00:00:00', '2014-07-07 03:00:00', freq='1h')
drange_utc = drange.tz_localize('UTC')
drange_mst = drange.tz_localize('MST')

print drange._to_embed(keep_tz=True)
print drange_utc._to_embed(keep_tz=True)
print drange_mst._to_embed(keep_tz=True)
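As a possible stopgap, the timezone can be re-attached by hand after the values come back. This is only a sketch using public pandas calls: `restore_utc` is a hypothetical helper, and it assumes the returned values are naive datetime64 values expressed in UTC, which is what the `return self.values` branch implies. It is shown on an in-memory index rather than an actual HDF store.

```python
import pandas as pd


def restore_utc(values):
    """Hypothetical helper: re-attach UTC to naive datetime64 values."""
    return pd.DatetimeIndex(values).tz_localize('UTC')


drange_utc = pd.date_range('2014-07-07 00:00:00', '2014-07-07 03:00:00',
                           freq='1h', tz='UTC')

# .values drops the tz and leaves naive datetime64 values in UTC,
# the same thing the `return self.values` branch hands back.
raw = drange_utc.values

restored = restore_utc(raw)
print(restored.tz)
print(restored.equals(drange_utc))
```

This only helps when the caller already knows the index was UTC; it does not fix the underlying loss of tz info in select_column.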

I'm using Python 2.7.6 with the following packages:

commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-431.17.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.utf8

pandas: 0.14.1
nose: 1.3.3
Cython: 0.20.1
numpy: 1.8.1
scipy: 0.14.0
statsmodels: 0.5.0
IPython: 2.0.0
sphinx: 1.2.2
patsy: 0.2.1
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.3.1
openpyxl: 1.8.6
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.3.5
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: 0.9.4
pymysql: None
psycopg2: None