pivot function on timezone aware objects does not preserve timezone info in resulting dataframe index · Issue #5878 · pandas-dev/pandas

This bug is present in pandas 0.12.0.

Take an example DataFrame like the one below:

   col  data                       time
0    1     0  2013-03-22 11:00:00-04:00
1    2     1  2013-03-22 15:00:00-04:00
2    2     2  2013-03-22 11:00:00-04:00
3    1     3  2013-03-22 15:00:00-04:00

After pivoting, the old behavior in 0.10.1 properly preserved the timezone info in the index, resulting in a new DataFrame like this:

col                        1  2
time                           
2013-03-22 11:00:00-04:00  0  2
2013-03-22 15:00:00-04:00  3  1

However, in 0.12.0 this behavior is lost, resulting in an index that does not have the timezone information:

col                  1  2
time                     
2013-03-22 15:00:00  0  2
2013-03-22 19:00:00  3  1
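
Note that the naive times in the 0.12.0 index are not arbitrary: 15:00 and 19:00 are the original 11:00 and 15:00 at -04:00 expressed in UTC. A quick check of that arithmetic using only the standard library, with a fixed -04:00 offset standing in for US/Eastern on that date (an assumption, made only to avoid depending on pytz here):

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -04:00 offset stands in for US/Eastern on 2013-03-22.
edt = timezone(timedelta(hours=-4))

aware = [
    datetime(2013, 3, 22, 11, 0, tzinfo=edt),
    datetime(2013, 3, 22, 15, 0, tzinfo=edt),
]

# Convert each aware time to UTC, then drop the tzinfo to get naive values.
as_utc_naive = [t.astimezone(timezone.utc).replace(tzinfo=None) for t in aware]

for original, shifted in zip(aware, as_utc_naive):
    print(original.isoformat(), "->", shifted.isoformat())
```

So the instants themselves appear to be kept; what is lost is the timezone attached to them.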

Below is the code to reproduce this issue:

import datetime

import pandas as pn
import pytz

print(pn.__version__)

# Two timezone-aware timestamps in US/Eastern
est = pytz.timezone('US/Eastern')
dt1 = est.localize(datetime.datetime(2013, 3, 22, 11, 0, 0))
dt2 = est.localize(datetime.datetime(2013, 3, 22, 15, 0, 0))

df = pn.DataFrame({'time': [dt1, dt2, dt1, dt2], 'col': [1, 2, 2, 1], 'data': range(4)})

# Pivot so that the timezone-aware 'time' column becomes the index
pivotDf = df.pivot(index='time', columns='col', values='data')

print(df)
print(pivotDf)
print(pivotDf.index)

The output from 0.10.1 is:

0.10.1
   col  data                       time
0    1     0  2013-03-22 11:00:00-04:00
1    2     1  2013-03-22 15:00:00-04:00
2    2     2  2013-03-22 11:00:00-04:00
3    1     3  2013-03-22 15:00:00-04:00
col                        1  2
time                           
2013-03-22 11:00:00-04:00  0  2
2013-03-22 15:00:00-04:00  3  1
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-03-22 11:00:00, 2013-03-22 15:00:00]
Length: 2, Freq: None, Timezone: US/Eastern

The output from 0.12.0 is:

0.12.0
   col  data                       time
0    1     0  2013-03-22 11:00:00-04:00
1    2     1  2013-03-22 15:00:00-04:00
2    2     2  2013-03-22 11:00:00-04:00
3    1     3  2013-03-22 15:00:00-04:00
col                  1  2
time                     
2013-03-22 15:00:00  0  2
2013-03-22 19:00:00  3  1
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-03-22 15:00:00, 2013-03-22 19:00:00]
Length: 2, Freq: None, Timezone: None

Notice the None in the "Timezone: " info of the DatetimeIndex.
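
Until this is fixed, one possible workaround is to reattach the timezone to the pivoted index by hand. The sketch below is written against a recent pandas and has not been verified on 0.12.0. `restore_index_tz` is a hypothetical helper name, not part of pandas, and it assumes the naive index values are UTC, as the 0.12.0 output above suggests. The naive frame is built directly here to simulate that output.

```python
import pandas as pd


def restore_index_tz(frame, tz, stored_as="UTC"):
    """Hypothetical helper: reattach a timezone to a naive DatetimeIndex.

    Assumes the naive index values are wall-clock times in `stored_as`.
    """
    if frame.index.tz is not None:
        return frame  # index is already timezone-aware, nothing to do
    out = frame.copy()
    # Interpret the naive values as `stored_as`, then convert to the target zone.
    out.index = out.index.tz_localize(stored_as).tz_convert(tz)
    return out


# Simulate the pivoted frame from 0.12.0: a naive index holding UTC values.
naive = pd.DataFrame(
    {1: [0, 3], 2: [2, 1]},
    index=pd.DatetimeIndex(
        ["2013-03-22 15:00:00", "2013-03-22 19:00:00"], name="time"
    ),
)
naive.columns.name = "col"

fixed = restore_index_tz(naive, "US/Eastern")
print(fixed)
print(fixed.index)
```

This only repairs the index after the fact; it does not change what pivot itself returns.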