Timezone lost on DataFrame assignments with realignment · Issue #12981 · pandas-dev/pandas
Starting from pandas 0.17, certain assignments to DataFrames cause offset-aware datetime columns to be converted to offset-naive columns. Specifically, it seems that if any data realignment is required when assigning the right-hand side (RHS) to a slice of the DataFrame, the timezone info is lost. Here's an example:
from __future__ import print_function
import pandas

print("Pandas version:", pandas.__version__)

start = pandas.Timestamp('2015-01-01', tz='utc')
df = pandas.DataFrame({'dates': pandas.date_range(start, periods=3)})

print("Before assignment")
print(df['dates'])

# Shuffle column and reassign, causing RHS to need to be realigned on assignment
df['dates'] = df['dates'][[1, 0, 2]]

print("\nAfter assignment")
print(df['dates'])
The output I'd expect, which is what I get from pandas 0.16.2, is:
Pandas version: 0.16.2
Before assignment
0 2015-01-01 00:00:00+00:00
1 2015-01-02 00:00:00+00:00
2 2015-01-03 00:00:00+00:00
Name: dates, dtype: object
After assignment
0 2015-01-01 00:00:00+00:00
1 2015-01-02 00:00:00+00:00
2 2015-01-03 00:00:00+00:00
Name: dates, dtype: object
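The expected output reflects label-based realignment: the RHS `df['dates'][[1, 0, 2]]` carries the index `[1, 0, 2]`, and the assignment matches it to the DataFrame by index label rather than by position, so each date lands back in its original row. A minimal sketch of that behaviour, assuming a recent pandas release in which timezone-aware dtypes are preserved on assignment, makes the realignment explicit with `reindex` and then checks that the column is still timezone-aware:

```python
import pandas as pd

start = pd.Timestamp('2015-01-01', tz='utc')
df = pd.DataFrame({'dates': pd.date_range(start, periods=3)})
original = df['dates'].copy()

# Select rows in a shuffled order: the result keeps its labels [1, 0, 2]
shuffled = df['dates'][[1, 0, 2]]

# Explicit realignment: reindexing by label restores the original row order
realigned = shuffled.reindex(df.index)
assert realigned.equals(original)

# Column assignment performs the same label-based alignment implicitly
df['dates'] = shuffled

# Assumption: on a release without this bug, the tz-aware dtype survives
assert isinstance(df['dates'].dtype, pd.DatetimeTZDtype)
assert df['dates'].equals(original)
```

On a version affected by this issue, the last two checks are where the problem would surface, since the column comes back as plain `datetime64[ns]`.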
However, when I run this with pandas 0.18.0, the timezone info is lost after the assignment:
Pandas version: 0.18.0
Before assignment
0 2015-01-01 00:00:00+00:00
1 2015-01-02 00:00:00+00:00
2 2015-01-03 00:00:00+00:00
Name: dates, dtype: datetime64[ns, UTC]
After assignment
0 2015-01-01
1 2015-01-02
2 2015-01-03
Name: dates, dtype: datetime64[ns]
It seems that the timezone-aware dtype (shown above as `datetime64[ns, UTC]`) that pandas started using for timezone-aware series in 0.17 is not propagated correctly through this realigning assignment.
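On an affected release, one possible stopgap is to remember the RHS timezone before assigning and re-attach it if the column comes back naive. The sketch below uses a hypothetical helper, `assign_keep_tz`, which is not a pandas API. It assumes that the naive values left behind are UTC instants; that matches the UTC example above, but it has not been verified for other timezones.

```python
import pandas as pd


def assign_keep_tz(df, column, value):
    """Hypothetical helper: assign `value` to df[column] and restore the
    timezone if the assignment dropped it."""
    tz = None
    if isinstance(value, pd.Series) and isinstance(value.dtype, pd.DatetimeTZDtype):
        tz = value.dtype.tz  # timezone the assigned data carried

    df[column] = value

    if tz is not None and not isinstance(df[column].dtype, pd.DatetimeTZDtype):
        # Assumption: the naive values are UTC instants, so localize to UTC
        # first and then convert back to the original timezone.
        df[column] = df[column].dt.tz_localize('UTC').dt.tz_convert(tz)
    return df


start = pd.Timestamp('2015-01-01', tz='utc')
df = pd.DataFrame({'dates': pd.date_range(start, periods=3)})
assign_keep_tz(df, 'dates', df['dates'][[1, 0, 2]])
print(df['dates'])
```

On a release where the dtype is already preserved, the restore branch never runs and the helper behaves like a plain column assignment.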
Output of pd.show_versions():
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.9.final.0
python-bits: 64
OS: Darwin
OS-release: 15.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
pandas: 0.18.0
nose: 1.3.7
pip: 8.1.1
setuptools: 20.9.0
Cython: None
numpy: 1.11.0
scipy: 0.15.1
statsmodels: None
xarray: None
IPython: 3.1.0
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.3
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.4.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: None
boto: None