Timezone lost on DataFrame assignments with realignment · Issue #12981 · pandas-dev/pandas

Starting from pandas 0.17, certain assignments to DataFrames cause offset-aware datetime columns to be converted to offset-naive columns. Specifically, it seems that if any data realignment is required when assigning the RHS to a slice of the DataFrame, then timezone info is lost. Here's an example:

from __future__ import print_function
import pandas

print("Pandas version:", pandas.__version__)

start = pandas.Timestamp('2015-01-01', tz='utc')
df = pandas.DataFrame({'dates': pandas.date_range(start, periods=3)})

print("Before assignment")
print(df['dates'])

# Shuffle column and reassign, causing RHS to need to be realigned on assignment
df['dates'] = df['dates'][[1,0,2]]

print("\nAfter assignment")
print(df['dates'])

The output I'd expect, which is what I get from pandas 0.16.2, is:

Pandas version: 0.16.2
Before assignment
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
Name: dates, dtype: object

After assignment
0    2015-01-01 00:00:00+00:00
1    2015-01-02 00:00:00+00:00
2    2015-01-03 00:00:00+00:00
Name: dates, dtype: object

However, when I run this with pandas 0.18.0, the timezone info is lost after the assignment:

Pandas version: 0.18.0
Before assignment
0   2015-01-01 00:00:00+00:00
1   2015-01-02 00:00:00+00:00
2   2015-01-03 00:00:00+00:00
Name: dates, dtype: datetime64[ns, UTC]

After assignment
0   2015-01-01
1   2015-01-02
2   2015-01-03
Name: dates, dtype: datetime64[ns]

It seems the custom timezone-aware dtype that pandas started using for timezone-aware time series in 0.17.x doesn't get correctly propagated in this operation.
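To check for this without reading the printed dtype by eye, something like the following might help. It is only a rough sketch: `is_tz_aware` is a hypothetical helper name (not part of pandas), and it assumes a pandas version where `Series.dt.tz` is available on datetime columns. On an affected version I'd expect the second flag to come out `False`.

```python
import pandas as pd


def is_tz_aware(series):
    """Hypothetical helper: True if a datetime series carries timezone info."""
    return series.dt.tz is not None


start = pd.Timestamp("2015-01-01", tz="UTC")
df = pd.DataFrame({"dates": pd.date_range(start, periods=3)})

aware_before = is_tz_aware(df["dates"])

# Reordering the index forces pandas to realign the RHS on assignment.
df["dates"] = df["dates"][[1, 0, 2]]

aware_after = is_tz_aware(df["dates"])

print("tz-aware before assignment:", aware_before)
print("tz-aware after assignment: ", aware_after)
```

If the column does come back naive and the data is known to be UTC, `df["dates"].dt.tz_localize("UTC")` should restore the timezone, though that is a workaround rather than a fix.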

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.9.0
Cython: None
numpy: 1.11.0
scipy: 0.15.1
statsmodels: None
xarray: None
IPython: 3.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None