DataFrame.apply with axis=1 returning (also erroring) different results when returning a list · Issue #17970 · pandas-dev/pandas (original) (raw)
Code Sample, a copy-pastable example if possible
df = pd.DataFrame(data=np.random.randint(0, 5, (5,3)), columns=['a', 'b', 'c']) df a b c 0 4 0 0 1 2 0 1 2 2 2 2 3 1 2 2 4 3 0 0
df.apply(lambda x: list(range(2)), axis=1) # returns a Series 0 [0, 1] 1 [0, 1] 2 [0, 1] 3 [0, 1] 4 [0, 1] dtype: object
df.apply(lambda x: list(range(3)), axis=1) # returns a DataFrame a b c 0 0 1 2 1 0 1 2 2 0 1 2 3 0 1 2 4 0 1 2
i = 0 def f(x): global i if i == 0: i += 1 return list(range(3)) return list(range(4))
df.apply(f, axis=1) ValueError: Shape of passed values is (5, 4), indices imply (5, 3)
Problem description
There are three possible outcomes. When the length of the returned list is equal to the number of columns then a DataFrame is returned and each column gets the corresponding value in the list.
If the length of the returned list is not equal to the number of columns, then a Series of lists is returned.
If the length of the returned list equals the number of columns for the first row but has at least one row where the list has a different number of elements than number of columns a ValueError is raised.
Expected Output
Need consistency. Probably should default to a Series of lists for all examples.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.21.0rc1
pytest: 3.0.7
pip: 9.0.1
setuptools: 35.0.2
Cython: 0.25.2
numpy: 1.13.3
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.0.0
sphinx: 1.5.5
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.1.0
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.5.0