DataFrame.iterrows() breaks timezone on index · Issue #8951 · pandas-dev/pandas

Duplicate of #8890.

As far as I can tell, the Timestamps for the index generated by iterrows() are 5 hours behind where they should be in this example:

In [33]: idx = pd.date_range("2010-01-01 00:00:00-0500", freq='D', periods=3)

In [34]: df = pd.DataFrame([1,2,3], index=[idx])

In [35]: df  # this looks correct
Out[35]:
                           0
2010-01-01 00:00:00-05:00  1
2010-01-02 00:00:00-05:00  2
2010-01-03 00:00:00-05:00  3

In [36]: [index for index, row in df.iterrows()]  # but this looks wrong:
Out[36]:
[Timestamp('2009-12-31 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-01 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-02 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D')]
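To put a number on the discrepancy, here is a small sketch that compares the first Timestamp from Out[36] with the first index value from Out[35]. The variable names are only illustrative. The gap equals the magnitude of the UTC offset, which would be consistent with the offset being applied a second time, although I have not confirmed that this is the cause.

```python
import pandas as pd

# Values copied from the session above: the first label yielded by
# iterrows() (observed) and the first label shown in the DataFrame (expected).
observed = pd.Timestamp("2009-12-31 19:00:00-0500")
expected = pd.Timestamp("2010-01-01 00:00:00-0500")

# Difference between the two instants.
gap = expected - observed

# Magnitude of the fixed -05:00 offset carried by the expected label.
offset = -expected.utcoffset()

print(gap, offset, gap == offset)
```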

I would have expected iterrows() to produce the same indices as this code:

In [38]: [df.index[i] for i in range(len(df))]
Out[38]:
[Timestamp('2010-01-01 00:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-02 00:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-03 00:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D')]
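The expectation above can be written as a self-contained check. This is a minimal sketch: the helper name `iterrows_labels_match` is hypothetical, and the index is passed directly as `index=idx` rather than wrapped in a list as in the session, which is an assumption about what the example intends. The result may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd

# Same fixed-offset index as in the report.
idx = pd.date_range("2010-01-01 00:00:00-0500", freq="D", periods=3)

# Assumption: pass the index directly instead of index=[idx].
df = pd.DataFrame([1, 2, 3], index=idx)


def iterrows_labels_match(frame):
    """Return True if iterrows() yields the same labels as frame.index."""
    from_iterrows = [label for label, _ in frame.iterrows()]
    from_index = [frame.index[i] for i in range(len(frame))]
    return from_iterrows == from_index


print(iterrows_labels_match(df))
```

A result of False would reproduce the behaviour described in this report.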

The row.name is also incorrect:

In [37]: [row.name for index, row in df.iterrows()]
Out[37]:
[Timestamp('2009-12-31 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-01 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D'),
 Timestamp('2010-01-02 19:00:00-0500', tz='pytz.FixedOffset(-300)', offset='D')]

But all is fine if we use a geographical timezone instead of a pytz.FixedOffset:

In [47]: idx = pd.date_range("2010-01-01 00:00:00", freq='D', periods=3, tz="America/New_York")

In [48]: df = pd.DataFrame([1,2,3], index=[idx])

In [49]: [index for index, row in df.iterrows()]
Out[49]:
[Timestamp('2010-01-01 00:00:00-0456', tz='America/New_York', offset='D'),
 Timestamp('2010-01-02 00:00:00-0456', tz='America/New_York', offset='D'),
 Timestamp('2010-01-03 00:00:00-0456', tz='America/New_York', offset='D')]

In [50]: df
Out[50]:
                           0
2010-01-01 00:00:00-04:56  1
2010-01-02 00:00:00-04:56  2
2010-01-03 00:00:00-04:56  3

Forgive me if I am using Pandas incorrectly!

Versions:


In [51]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.8.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-25-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8

pandas: 0.15.1
nose: 1.3.4
Cython: 0.21.1
numpy: 1.8.2
scipy: 0.14.0
statsmodels: None
IPython: 2.3.1
sphinx: 1.2.3
patsy: None
dateutil: 2.2
pytz: 2014.10
bottleneck: 0.6.0
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.2
openpyxl: 1.8.6
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.6
bs4: None
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.3.8
sqlalchemy: None
pymysql: None
psycopg2: None

(it goes without saying that I'm a huge fan of Pandas!)