Strange behaviour when trying to create a series from two columns of a dataframe with apply(tuple, axis=1) · Issue #17348 · pandas-dev/pandas

pandas behaves unexpectedly when one tries to create a Series by applying
tuple (or list) row-wise to two columns of a DataFrame, one of which holds timestamps:

import pandas as pd
import numpy as np

d = pd.DataFrame({'a': pd.Series(np.random.randn(4)),
                  'b': ['a', 'list', 'of', 'words'],
                  'ts': pd.date_range('2016-10-01', periods=4, freq='H')})
d

          a      b                  ts
0  0.200813      a 2016-10-01 00:00:00
1  0.316971   list 2016-10-01 01:00:00
2 -0.186392     of 2016-10-01 02:00:00
3 -0.565593  words 2016-10-01 03:00:00

Let's first try with columns 'a' and 'b':

d[['a', 'b']].apply(tuple, axis=1)

0         (0.2008128669491346, a)
1      (0.3169711841447721, list)
2       (-0.1863916899789735, of)
3    (-0.5655926199699992, words)
dtype: object

So far, everything is fine. Now let's do it with 'a' and 'ts':

d[['a', 'ts']].apply(tuple, axis=1)

          a                  ts
0  0.200813 2016-10-01 00:00:00
1  0.316971 2016-10-01 01:00:00
2 -0.186392 2016-10-01 02:00:00
3 -0.565593 2016-10-01 03:00:00

Oops: instead of a Series of tuples, the original DataFrame comes back unchanged.
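To check whether a given pandas installation shows this behaviour, one could compare the type of the two results. This is only a sketch: `describe_result` is a hypothetical helper, the values in column 'a' are fixed stand-ins for the random data above, and the timestamps are built with `pd.to_datetime` instead of `pd.date_range`. The outcome may differ between pandas versions, so no expected output is given.

```python
import pandas as pd

# Small stand-in for the DataFrame above, with fixed values instead of random ones.
d = pd.DataFrame({
    'a': [0.2, 0.3, -0.2, -0.6],
    'b': ['a', 'list', 'of', 'words'],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

def describe_result(columns):
    """Apply tuple row-wise and report the type and shape of what comes back."""
    result = d[columns].apply(tuple, axis=1)
    return type(result).__name__, result.shape

kind_ab, shape_ab = describe_result(['a', 'b'])
kind_ats, shape_ats = describe_result(['a', 'ts'])

# A one-dimensional Series is the expected outcome in both cases;
# a two-dimensional DataFrame for ['a', 'ts'] reproduces the report.
print(kind_ab, shape_ab)
print(kind_ats, shape_ats)
```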

It's easy to work around this by wrapping ("coating") each timestamp in a closure before the apply and unwrapping ("uncoating") it afterwards:

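# Wrap a timestamp in a zero-argument closure, so the column holds
# plain Python objects instead of datetime64 values.
def coating(t):
    return lambda: t

# Call the closure to recover the timestamp and rebuild the pair.
def uncoating(x, f):
    return x, f()

d['coated_ts'] = d['ts'].apply(coating)

d[['a', 'coated_ts']].apply(tuple, axis=1).apply(lambda t: uncoating(*t))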

0     (0.2008128669491346, 2016-10-01 00:00:00)
1     (0.3169711841447721, 2016-10-01 01:00:00)
2    (-0.1863916899789735, 2016-10-01 02:00:00)
3    (-0.5655926199699992, 2016-10-01 03:00:00)
dtype: object
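Another possible workaround, not part of the original report, is to skip `apply` and pair the columns in plain Python with `zip`. The sketch below assumes a small stand-in DataFrame with fixed values, and the name `pairs` is chosen only for illustration.

```python
import pandas as pd

# Stand-in for the DataFrame above, with fixed values instead of random ones.
d = pd.DataFrame({
    'a': [0.2, 0.3, -0.2, -0.6],
    'ts': pd.to_datetime(['2016-10-01 00:00', '2016-10-01 01:00',
                          '2016-10-01 02:00', '2016-10-01 03:00']),
})

# Pair the two columns element by element, then wrap the resulting list
# of tuples in a Series that reuses the original index.
pairs = pd.Series(list(zip(d['a'], d['ts'])), index=d.index)
print(pairs)
```

Because the tuples are built outside of `DataFrame.apply`, the result should not depend on how `apply` decides between returning a Series or a DataFrame.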

It would be nice if this strange behaviour was corrected.

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None