Timezone dropped when dataframes are joined after resampling · Issue #13783 · pandas-dev/pandas
I've already explained the problem link.
When I combine two DataFrames using join() or concat() (I didn't try merge()) and one of them has previously been resampled to 15-minute means, I lose the timezone.
Code Sample, a copy-pastable example if possible
df1
2016-07-05 11:30:00+01:00 -100.81
2016-07-05 11:31:00+01:00 -99.34
2016-07-05 11:32:00+01:00 -95.09
..............
..............
2016-07-05 11:45:00+01:00 -83.62
2016-07-05 11:46:00+01:00 -1.57
2016-07-05 11:47:00+01:00 21.01
..............
df2['column1']
2016-07-05 11:30:00+01:00 -79.45
2016-07-05 11:45:00+01:00 -51.11
2016-07-05 12:00:00+01:00 -12.67
2016-07-05 12:15:00+01:00 15.21
..........
print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:31:00+01:00',
'2016-07-05 11:32:00+01:00', '2016-07-05 11:33:00+01:00',
...
'2016-07-19 14:30:00+01:00', '2016-07-19 14:31:00+01:00'],
dtype='datetime64[ns, Europe/London]', length=1358, freq=None)
print df2.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
'2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
...
'2016-07-19 14:30:00+01:00', '2016-07-19 14:45:00+01:00'],
dtype='datetime64[ns, Europe/London]', length=1358, freq=None)
df1 = df1.resample('15Min').mean()
print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
'2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
...
'2016-07-19 14:30:00+01:00'],
dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')
df = pd.concat([df1, df2], axis=1)
print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
'2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
...
'2016-07-19 14:30:00+01:00'],
dtype='datetime64[ns, UTC]', length=1358, freq='15T')
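Since the listing above is abbreviated, here is a minimal self-contained sketch of the same steps. The data is synthetic and the names `column0`, `df1_resampled` and `combined` are placeholders that are not part of the original report. It is written for Python 3 and a recent pandas, where I would expect the timezone to survive the concat. On 0.18.1 the last line is where the report sees UTC instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: one value per minute, tz-aware (assumed data).
idx1 = pd.date_range("2016-07-05 11:30", periods=60, freq="1min", tz="Europe/London")
df1 = pd.DataFrame({"column0": np.linspace(-100.0, 20.0, len(idx1))}, index=idx1)

# Synthetic stand-in for df2: already on a 15-minute grid, same timezone.
idx2 = pd.date_range("2016-07-05 11:30", periods=4, freq="15min", tz="Europe/London")
df2 = pd.DataFrame({"column1": [-79.45, -51.11, -12.67, 15.21]}, index=idx2)

# Average df1 over 15-minute bins, then align both frames column-wise.
df1_resampled = df1.resample("15min").mean()
combined = pd.concat([df1_resampled, df2], axis=1)

print(df1_resampled.index.tz)  # timezone after resampling
print(combined.index.tz)       # timezone after concat (the reported problem)
```

Comparing the two printed timezones shows directly whether the concat step is the one that changes the index.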
Expected Output
print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
'2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
...
'2016-07-19 14:30:00+01:00'],
dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')
The output loses the timezone and falls back to 'UTC'.
I think the problem is related to this issue, where the same problem occurs when aggregating.
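A possible workaround, sketched under the assumption that the combined index still holds the correct instants and only its zone has reverted to UTC (the report does not confirm this), is to convert the index back explicitly with `DatetimeIndex.tz_convert`. The frame below is a hypothetical stand-in for the result of the concat.

```python
import pandas as pd

# Hypothetical stand-in for the concat result: correct instants, but labelled UTC.
utc_index = pd.date_range("2016-07-05 10:30", periods=4, freq="15min", tz="UTC")
df = pd.DataFrame({"column1": [-79.45, -51.11, -12.67, 15.21]}, index=utc_index)

# Convert the index back to the intended zone; the instants are unchanged,
# only the wall-clock representation moves to Europe/London.
df.index = df.index.tz_convert("Europe/London")

print(df.index)
```

This only relabels the index after the fact and does not address why the timezone is dropped in the first place.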
output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.1
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None