BUG: coercion of mixed dt-aware data in Series constructor · Issue #13051 · pandas-dev/pandas
I ran into an issue where the behavior of `.apply()` changed from 0.16 to 0.17, causing different results on tz-aware data. Extracting the hour of day from datetimes gives different results for `Series(x).apply(func)` vs `func(x)` applied to each element directly. Below is a minimal example of the issue in 0.17. The behavior is the same in 0.18, but different on master (though still not equal to the `map` result), also shown below.
On 0.17.1 and 0.18.0:
    >>> import pandas as pd
    >>> def hour_of_day(dt):
    ...     return dt.hour
    ...
    >>> dt = pd.to_datetime(1462068217, unit='s')
    >>> dt_localized = dt.tz_localize('UTC').tz_convert('US/Pacific')
    >>> dt_list = [dt, dt_localized]
    >>> apply_series = pd.Series(dt_list).apply(hour_of_day)
    >>> map_series = pd.Series(map(hour_of_day, dt_list))
    >>> print dt_list
    [Timestamp('2016-05-01 02:03:37'), Timestamp('2016-04-30 19:03:37-0700', tz='US/Pacific')]
    >>> print apply_series
    0    2
    1    2
    dtype: int64
    >>> print map_series
    0     2
    1    19
    dtype: int64
    >>> print apply_series - map_series
    0     0
    1   -17
    dtype: int64
    >>> pd.show_versions()
    INSTALLED VERSIONS
    commit: None
    python: 2.7.11.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 15.2.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8

    pandas: 0.17.1
    nose: 1.3.7
    pip: 8.1.1
    setuptools: 20.7.0
    Cython: 0.24
    numpy: 1.11.0
    scipy: 0.17.0
    statsmodels: None
    IPython: 4.2.0
    sphinx: 1.4.1
    patsy: 0.4.1
    dateutil: 2.5.2
    pytz: 2016.3
    blosc: None
    bottleneck: 1.0.0
    tables: 3.2.2
    numexpr: 2.5.2
    matplotlib: 1.5.1
    openpyxl: 2.3.2
    xlrd: 0.9.4
    xlwt: 1.0.0
    xlsxwriter: 0.8.5
    lxml: 3.6.0
    bs4: 4.3.2
    html5lib: 0.999
    httplib2: None
    apiclient: None
    sqlalchemy: 1.0.12
    pymysql: 0.6.7.None
    psycopg2: None
    Jinja2: None
On 0.16.2 (what I expected):
    >>> # Same setup as above
    >>> print apply_series
    0     2
    1    19
    dtype: int64
    >>> print map_series
    0     2
    1    19
    dtype: int64
    >>> print apply_series - map_series
    0    0
    1    0
    dtype: int64
    >>> pd.show_versions()
    INSTALLED VERSIONS
    commit: None
    python: 2.7.11.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 15.2.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8

    pandas: 0.16.2
    nose: None
    Cython: None
    numpy: 1.10.4
    scipy: None
    statsmodels: None
    IPython: 4.2.0
    sphinx: None
    patsy: None
    dateutil: 2.5.2
    pytz: 2016.3
    bottleneck: None
    tables: None
    numexpr: None
    matplotlib: None
    openpyxl: None
    xlrd: None
    xlwt: None
    xlsxwriter: None
    lxml: None
    bs4: None
    html5lib: None
    httplib2: None
    apiclient: None
    sqlalchemy: None
    pymysql: None
    psycopg2: None
On master:
    >>> # Same setup as above
    >>> print apply_series
    0    19
    1    19
    dtype: int32
    >>> print map_series
    0     2
    1    19
    dtype: int64
    >>> print apply_series - map_series
    0    17
    1     0
    dtype: int64
    >>> pd.show_versions()
    INSTALLED VERSIONS
    commit: 05e734ab171be0fda838c6b12839c38fa588da2c
    python: 2.7.11.final.0
    python-bits: 64
    OS: Darwin
    OS-release: 15.2.0
    machine: x86_64
    processor: i386
    byteorder: little
    LC_ALL: None
    LANG: en_US.UTF-8

    pandas: 0.18.0+203.g05e734a
    nose: None
    pip: 8.1.1
    setuptools: 20.7.0
    Cython: None
    numpy: 1.10.4
    scipy: None
    statsmodels: None
    xarray: None
    IPython: 4.2.0
    sphinx: None
    patsy: None
    dateutil: 2.5.2
    pytz: 2016.3
    blosc: None
    bottleneck: None
    tables: None
    numexpr: None
    matplotlib: None
    openpyxl: None
    xlrd: None
    xlwt: None
    xlsxwriter: None
    lxml: None
    bs4: None
    html5lib: None
    httplib2: None
    apiclient: None
    sqlalchemy: None
    pymysql: None
    psycopg2: None
    jinja2: None
    boto: None
    pandas_datareader: None
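One possible reading of the three outputs above, consistent with the issue title but not confirmed here: the `[2, 2]` result looks like both values were coerced to naive UTC before `hour_of_day` ran, and the `[19, 19]` result looks like both were coerced to US/Pacific. The sketch below reproduces those three patterns with the standard library only. The fixed UTC-7 offset is an assumption standing in for `US/Pacific` on this date, and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
# Assumption: fixed UTC-7 offset as a stand-in for US/Pacific (PDT) on 2016-04-30.
pacific = timezone(timedelta(hours=-7))

naive = datetime(2016, 5, 1, 2, 3, 37)                 # the naive value from the report
aware = naive.replace(tzinfo=utc).astimezone(pacific)  # same instant, Pacific wall time

# Each element keeps its own wall-clock hour (what map() sees).
hours_elementwise = [naive.hour, aware.hour]

# Both elements coerced to naive UTC before taking the hour.
as_naive_utc = [naive, aware.astimezone(utc).replace(tzinfo=None)]
hours_coerced_utc = [d.hour for d in as_naive_utc]

# Both elements coerced to Pacific, treating the naive value as UTC.
as_pacific = [naive.replace(tzinfo=utc).astimezone(pacific), aware]
hours_coerced_pacific = [d.hour for d in as_pacific]

print(hours_elementwise, hours_coerced_utc, hours_coerced_pacific)
```

This only shows that the reported numbers are what a whole-list coercion would produce. It does not inspect what the Series constructor actually does in any pandas version.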
Expected Output
I would expect `apply_series` to be `[2, 19]`, as in 0.16.2, matching `map(hour_of_day, dt_list)`.
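A minimal Python 3 sketch of a check for the expected behavior, assuming that passing `dtype=object` to the constructor keeps each `Timestamp` as-is rather than inferring a single datetime dtype for the mixed list. This is a possible workaround on the caller's side, not a statement of how pandas should fix the constructor. The names `object_series`, `apply_hours` and `map_hours` are introduced here for illustration.

```python
import pandas as pd


def hour_of_day(dt):
    return dt.hour


dt = pd.to_datetime(1462068217, unit="s")
dt_localized = dt.tz_localize("UTC").tz_convert("US/Pacific")
dt_list = [dt, dt_localized]

# Assumption: dtype=object stores the Timestamps unchanged, so apply()
# sees the same objects that a plain element-wise map() would see.
object_series = pd.Series(dt_list, dtype=object)

apply_hours = object_series.apply(hour_of_day).tolist()
map_hours = [hour_of_day(x) for x in dt_list]

print(object_series.dtype)
print(apply_hours, map_hours)
```

If `apply_hours` and `map_hours` differ on a given pandas version, printing `pd.Series(dt_list).dtype` without the explicit `dtype` argument may help show whether the constructor coerced the mixed values.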