Pandas unstack() unexpected behavior with multiindex row and column · Issue #28306 · pandas-dev/pandas (original) (raw)

Code Sample, a copy-pastable example if possible

data = { ('effect_size', 'cohen_d', 'mean'): { ('m1', 'P3', '222'): 0.52, ('m1', 'A5', '111'): -0.07, ('m2', 'P3', '222'): -0.53, ('m2', 'A5', '111'): 0.05, }, ('wilcoxon', 'z_score', 'stouffer'): { ('m1', 'P3', '222'): 2.2, ('m1', 'A5', '111'): -0.92, ('m2', 'P3', '222'): -2.0, ('m2', 'A5', '111'): -0.52, } } df = pd.DataFrame(data) df.index.rename(['metric', 'bar', 'foo'], inplace=True) df.unstack(['foo', 'bar'])

Problem description

The df looks like this before unstacking:

               effect_size wilcoxon
                   cohen_d  z_score
                      mean stouffer
metric bar foo                     
m1     A5  111       -0.07    -0.92
       P3  222        0.52     2.20
m2     A5  111        0.05    -0.52
       P3  222       -0.53    -2.00

by unstacking bar and foo, I had expected to see them as column indices, but that's not what happens. Instead foo and metric are unstacked, and bar is left stacked as a row index:

> df.unstack(['foo', 'bar'])

       effect_size                   wilcoxon                
           cohen_d                    z_score                
              mean                   stouffer                
foo            111         222            111        222     
metric          m1    m2    m1    m2       m1    m2   m1   m2
bar                                                          
A5           -0.07  0.05   NaN   NaN    -0.92 -0.52  NaN  NaN
P3             NaN   NaN  0.52 -0.53      NaN   NaN  2.2 -2.0

I got around the problem by doing the following, but I think the above behavior might be a bug.

Here's my workaround:

> print df.stack([0, 1, 2]).unstack(0).transpose()

bar             A5                   P3         
foo            111                  222         
       effect_size wilcoxon effect_size wilcoxon
           cohen_d  z_score     cohen_d  z_score
              mean stouffer        mean stouffer
metric                                          
m1           -0.07    -0.92        0.52      2.2
m2            0.05    -0.52       -0.53     -2.0

Output of pd.show_versions()

[paste the output of pd.show_versions() here below this line]

INSTALLED VERSIONS

commit: None
python: 2.7.15.final.0
python-bits: 64
OS: Linux
OS-release: 4.19.37-5+deb10u1rodete2-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: None
pip: None
setuptools: unknown
Cython: None
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 2.0.0
sphinx: None
patsy: 0.4.1
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: 3.5.2
numexpr: 2.6.10dev0
feather: None
matplotlib: 1.5.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: 0+unknown
pandas_datareader: None
gcsfs: None