Unexpected interaction in DataFrame.apply(f) when f returns a list · Issue #18919 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame({'x': pd.Series([['a', 'b'], ['q']]), 'y': pd.Series([['z'], ['q', 't']])})
df.index = pd.MultiIndex.from_tuples([('i0', 'j0'), ('i1', 'j1')])
df.apply(lambda row: [el for el in row['x'] if el in row['y']], axis=1)
Problem description
When a DataFrame has a MultiIndex, and the function passed to DataFrame.apply returns only lists, weird stuff happens and an unintelligible error occurs.
What ends up happening is that the result somehow gets coerced to a list of arrays (not sure where or why the list->array conversion happens) and is then passed to DataFrame.__init__, which tries to massage it into a DataFrame and fails.
Resulting error: "ValueError: Empty data passed with indices specified.", emitted from deep within the bowels of pandas/core/internals.py, specifically create_block_manager_from_arrays.
This happens regardless of what the reduce= argument is set to.
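To make the failure easier to compare across installations, the code sample can be wrapped so that the exception is captured instead of aborting the script. This is only a sketch: the helper name common_elements is hypothetical and not part of the report, and the reduce= argument is left out because it may not be accepted by every pandas version. On the reported setup (pandas 0.21.1) the ValueError branch is the one expected to run; other versions may behave differently.

```python
import pandas as pd

df = pd.DataFrame({
    'x': pd.Series([['a', 'b'], ['q']]),
    'y': pd.Series([['z'], ['q', 't']]),
})
df.index = pd.MultiIndex.from_tuples([('i0', 'j0'), ('i1', 'j1')])


def common_elements(row):
    # Hypothetical helper: same logic as the lambda in the code sample.
    return [el for el in row['x'] if el in row['y']]


result, error = None, None
try:
    result = df.apply(common_elements, axis=1)
except ValueError as exc:
    # Expected on the reported setup (pandas 0.21.1); may not trigger elsewhere.
    error = exc

# Report which branch was taken, together with the pandas version in use.
print(pd.__version__)
if error is None:
    print(type(result).__name__)
else:
    print('ValueError: {}'.format(error))
```

Printing the version next to the outcome makes it straightforward to tell whether a given installation is affected.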
Expected Output
Don't try to manipulate the output. Return a Series of lists.
In the example above, that'd be:
pd.Series([[], ['q']], index=df.index)
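Until apply behaves this way, one possible workaround is to build the Series of lists directly and skip DataFrame.apply altogether. The sketch below is not from the report; it assumes the per-row function only needs the 'x' and 'y' columns, and the variable names common and expected are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'x': pd.Series([['a', 'b'], ['q']]),
    'y': pd.Series([['z'], ['q', 't']]),
})
df.index = pd.MultiIndex.from_tuples([('i0', 'j0'), ('i1', 'j1')])

# Compute each row's list in plain Python, then attach the original MultiIndex.
common = pd.Series(
    [[el for el in x if el in y] for x, y in zip(df['x'], df['y'])],
    index=df.index,
)

# The expected output given in the report, for comparison.
expected = pd.Series([[], ['q']], index=df.index)
print(common.tolist() == expected.tolist())
```

Because the lists are assembled before pandas sees them, the Series constructor stores them as-is and no coercion through DataFrame.__init__ takes place.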
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.26.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.1
pytest: None
pip: 9.0.1
setuptools: 36.6.0
Cython: None
numpy: 1.12.1
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None