Pandas unstack() unexpected behavior with multiindex row and column · Issue #28306 · pandas-dev/pandas
Code Sample, a copy-pastable example if possible
import pandas as pd

data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52,
        ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53,
        ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2,
        ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0,
        ('m2', 'A5', '111'): -0.52,
    },
}
df = pd.DataFrame(data)
# Name the three row-index levels so they can be unstacked by name.
df.index.rename(['metric', 'bar', 'foo'], inplace=True)
df.unstack(['foo', 'bar'])
Problem description
The `df` looks like this before unstacking:
               effect_size wilcoxon
                   cohen_d  z_score
                      mean stouffer
metric bar foo
m1     A5  111       -0.07    -0.92
       P3  222        0.52     2.20
m2     A5  111        0.05    -0.52
       P3  222       -0.53    -2.00
By unstacking `bar` and `foo`, I expected both to become column levels, but that is not what happens. Instead, `foo` and `metric` are unstacked, and `bar` is left as the row index:
> df.unstack(['foo', 'bar'])
       effect_size                 wilcoxon
           cohen_d                  z_score
              mean                 stouffer
foo            111       222            111       222
metric          m1   m2   m1    m2       m1    m2  m1   m2
bar
A5           -0.07 0.05  NaN   NaN    -0.92 -0.52 NaN  NaN
P3             NaN  NaN 0.52 -0.53      NaN   NaN 2.2 -2.0
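One quick way to confirm which levels actually moved is to compare the level names on each axis after the call. The following is a minimal sketch that rebuilds the frame from the code sample above; the names `result` and `moved` are illustrative only. Which names land on which axis may depend on the pandas version in use, so no expected output is shown.

```python
import pandas as pd

data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52, ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53, ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2, ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0, ('m2', 'A5', '111'): -0.52,
    },
}
df = pd.DataFrame(data)
df.index = df.index.set_names(['metric', 'bar', 'foo'])

result = df.unstack(['foo', 'bar'])

# The original column levels are unnamed (None); any named column level
# in the result must have come from the row index.
moved = [name for name in result.columns.names if name is not None]

print('row levels before:  ', list(df.index.names))
print('row levels after:   ', list(result.index.names))
print('levels moved to columns:', moved)
```

If `moved` is not `['foo', 'bar']`, the call unstacked a different level than the one requested.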
I got around the problem by doing the following, but I think the above behavior might be a bug.
Here's my workaround:
> print df.stack([0, 1, 2]).unstack(0).transpose()
bar             A5                   P3
foo            111                  222
       effect_size wilcoxon effect_size wilcoxon
           cohen_d  z_score     cohen_d  z_score
              mean stouffer        mean stouffer
metric
m1           -0.07    -0.92        0.52      2.2
m2            0.05    -0.52       -0.53     -2.0
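Another possible workaround, sketched here under the assumption that single-level unstacking by name is unaffected (not verified against pandas 0.24.1), is to unstack one level at a time. Each call resolves the name against the current row index, so no level positions are carried over between calls. The names `step` and `tidy` are illustrative. Unlike the `stack`/`unstack`/`transpose` route, this keeps the original column levels outermost and appends `foo` and `bar` as the innermost levels.

```python
import pandas as pd

data = {
    ('effect_size', 'cohen_d', 'mean'): {
        ('m1', 'P3', '222'): 0.52, ('m1', 'A5', '111'): -0.07,
        ('m2', 'P3', '222'): -0.53, ('m2', 'A5', '111'): 0.05,
    },
    ('wilcoxon', 'z_score', 'stouffer'): {
        ('m1', 'P3', '222'): 2.2, ('m1', 'A5', '111'): -0.92,
        ('m2', 'P3', '222'): -2.0, ('m2', 'A5', '111'): -0.52,
    },
}
df = pd.DataFrame(data)
df.index = df.index.set_names(['metric', 'bar', 'foo'])

# Unstack by name, one level per call, instead of passing a list.
step = df.unstack('foo').unstack('bar')

# Only the (foo, bar) pairs ('111', 'A5') and ('222', 'P3') exist in the
# data, so the other combinations are all-NaN columns; drop them.
tidy = step.dropna(axis=1, how='all')

print(list(tidy.index.names))
print(list(tidy.columns.names))
print(tidy)
```

The `dropna(axis=1, how='all')` step is optional; it only removes the columns for `(foo, bar)` combinations that never occur in the data.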
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.37-5+deb10u1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.24.1
pytest: None
pip: None
setuptools: unknown
Cython: None
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.10dev0
feather: None
matplotlib: 1.5.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0+unknown
pandas_datareader: None
gcsfs: None