Timezone dropped when dataframes are joined after resampling · Issue #13783 · pandas-dev/pandas

I've already explained the problem link.
When I combine two DataFrames using join() or concat() (I have not tried merge()) and one of them has previously been resampled to 15-minute means, I lose the timezone: the resulting index is reported as UTC instead of Europe/London.

Code Sample, a copy-pastable example if possible

df1
2016-07-05 11:30:00+01:00    -100.81
2016-07-05 11:31:00+01:00    -99.34
2016-07-05 11:32:00+01:00    -95.09
..............
..............
2016-07-05 11:45:00+01:00    -83.62
2016-07-05 11:46:00+01:00    -1.57
2016-07-05 11:47:00+01:00    21.01
..............
df2['column1']
2016-07-05 11:30:00+01:00    -79.45
2016-07-05 11:45:00+01:00    -51.11
2016-07-05 12:00:00+01:00    -12.67
2016-07-05 12:15:00+01:00    15.21
..........
print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:31:00+01:00',
               '2016-07-05 11:32:00+01:00', '2016-07-05 11:33:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:31:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

print df2.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00', '2016-07-19 14:45:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq=None)

df1 = df1.resample('15Min').mean()

print df1.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

df = pd.concat([df1, df2], axis=1)

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, UTC]', length=1358, freq='15T')

Expected Output

print df.index
DatetimeIndex(['2016-07-05 11:30:00+01:00', '2016-07-05 11:45:00+01:00',
               '2016-07-05 12:00:00+01:00', '2016-07-05 12:15:00+01:00',
                ...
               '2016-07-19 14:30:00+01:00'],
               dtype='datetime64[ns, Europe/London]', length=1358, freq='15T')

The output loses the Europe/London timezone and falls back to UTC.
I think the problem is related to this issue, where the same behaviour appears when aggregating.
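
The steps above can be sketched as a self-contained script. This is a minimal sketch, not the original data: the values, the column names `column0` and `column1`, and the 90-minute time span are made up for illustration, and it is written for Python 3 rather than the Python 2.7 used in the report. Whether the concatenated index shows UTC depends on the pandas version; the report is against 0.18.1. The final `tz_convert` call is only a possible workaround, assuming the index still refers to the correct instants and only the timezone label has changed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: one-minute readings in Europe/London.
minute_index = pd.date_range(
    "2016-07-05 11:30", periods=90, freq="min", tz="Europe/London"
)
df1 = pd.DataFrame({"column0": np.arange(90, dtype=float)}, index=minute_index)

# Hypothetical stand-in for df2: 15-minute readings in the same timezone.
quarter_index = pd.date_range(
    "2016-07-05 11:30", periods=6, freq="15min", tz="Europe/London"
)
df2 = pd.DataFrame({"column1": np.arange(6, dtype=float)}, index=quarter_index)

# Average df1 over 15-minute bins, then combine both frames column-wise.
df1 = df1.resample("15min").mean()
df = pd.concat([df1, df2], axis=1)

# On pandas 0.18.1 this is reported to show UTC instead of Europe/London.
print(df.index.tz)

# Possible workaround (an assumption, not a confirmed fix): convert the
# index back to the intended timezone after combining the frames.
restored = df.copy()
restored.index = restored.index.tz_convert("Europe/London")
print(restored.index.tz)
```

Because `tz_convert` changes only the timezone in which the timestamps are displayed, not the instants they represent, the rows of `restored` line up with those of `df`.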

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-28-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.2
setuptools: 24.0.1
Cython: None
numpy: 1.11.1
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None