BUG: coercion of mixed dt-aware data in Series constructor · Issue #13051 · pandas-dev/pandas

I ran into an issue where the behavior of `.apply()` changed between 0.16 and 0.17, producing different results on tz-aware data. Extracting the hour of day from datetimes gives different results for `Series(x).apply(func)` than for applying `func` to each element directly (`map(func, x)`). Below is a minimal example of the issue on 0.17. The behavior appears to be the same on 0.18. On master it is different again, but it still does not match the element-wise result. Both are shown below.

On 0.17.1 and 0.18.0:

```python
>>> import pandas as pd
>>> def hour_of_day(dt):
...     return dt.hour
...
>>> dt = pd.to_datetime(1462068217, unit='s')
>>> dt_localized = dt.tz_localize('UTC').tz_convert('US/Pacific')
>>> dt_list = [dt, dt_localized]
>>> apply_series = pd.Series(dt_list).apply(hour_of_day)
>>> map_series = pd.Series(map(hour_of_day, dt_list))
>>> print dt_list
[Timestamp('2016-05-01 02:03:37'), Timestamp('2016-04-30 19:03:37-0700', tz='US/Pacific')]
>>> print apply_series
0    2
1    2
dtype: int64
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0     0
1   -17
dtype: int64
>>> pd.show_versions()
```

```
INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.7.0
Cython: 0.24
numpy: 1.11.0
scipy: 0.17.0
statsmodels: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.5.2
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.8.5
lxml: 3.6.0
bs4: 4.3.2
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None
```

On 0.16.2 (what I expected):

Same setup as above

```python
>>> print apply_series
0     2
1    19
dtype: int64
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0    0
1    0
dtype: int64
>>> pd.show_versions()
```

```
INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: None
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
```

On master:

Same setup as above

```python
>>> print apply_series
0    19
1    19
dtype: int32
>>> print map_series
0     2
1    19
dtype: int64
>>> print apply_series - map_series
0    17
1     0
dtype: int64
>>> pd.show_versions()
```

```
INSTALLED VERSIONS

commit: 05e734ab171be0fda838c6b12839c38fa588da2c
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.0+203.g05e734a
nose: None
pip: 8.1.1
setuptools: 20.7.0
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
xarray: None
IPython: 4.2.0
sphinx: None
patsy: None
dateutil: 2.5.2
pytz: 2016.3
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
```

Expected Output

I would expect the output of `apply` to be `[2, 19]`, as in 0.16.2, matching the result of `map(hour_of_day, dt_list)`.
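The following minimal sketch uses only the standard library (it is not pandas code) to show where each of the three observed outputs could come from. It assumes a fixed UTC-07:00 offset for US/Pacific. That offset is correct for this date, which falls in daylight saving time, but it does not hold in general. The two `coerce_*` helpers are hypothetical models of how a constructor might unify a mixed naive/aware list. They are not pandas internals.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US/Pacific is UTC-07:00 (PDT) on 2016-04-30/05-01; a fixed
# offset stands in for a real tz database in this sketch.
PDT = timezone(timedelta(hours=-7), "PDT")

# Naive wall-clock value, like pd.to_datetime(1462068217, unit='s').
naive = datetime.fromtimestamp(1462068217, tz=timezone.utc).replace(tzinfo=None)
# The same instant, localized to UTC and converted to Pacific time.
aware = naive.replace(tzinfo=timezone.utc).astimezone(PDT)
dt_list = [naive, aware]


def hour_of_day(dt):
    return dt.hour


def elementwise(values):
    """Like map(hour_of_day, dt_list): each element keeps its own tz."""
    return [hour_of_day(v) for v in values]


def coerce_to_naive_utc(values):
    """Hypothetical coercion: convert aware values to UTC, then drop tzinfo."""
    coerced = [
        v.astimezone(timezone.utc).replace(tzinfo=None) if v.tzinfo else v
        for v in values
    ]
    return [hour_of_day(v) for v in coerced]


def coerce_to_tz(values, tz):
    """Hypothetical coercion: treat naive values as UTC, convert all to tz."""
    coerced = [
        (v if v.tzinfo else v.replace(tzinfo=timezone.utc)).astimezone(tz)
        for v in values
    ]
    return [hour_of_day(v) for v in coerced]


print(elementwise(dt_list))          # [2, 19]  same as the map results
print(coerce_to_naive_utc(dt_list))  # [2, 2]   same as apply on 0.17.1
print(coerce_to_tz(dt_list, PDT))    # [19, 19] same as apply on master
```

Only the element-wise version gives the expected `[2, 19]`. Both ways of forcing the two values into a single representation change one of the hours. This is consistent with the Series constructor coercing the mixed list before `.apply()` runs, but it does not prove it.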