BUG: melt MultiIndex columns using index columns as identifier variables · Issue #34129 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

df = pd.DataFrame(
    [
        ['p', 'q', 'r'],
        ['s', 't', 'u'],
        ['v', 'w', 'x'],
    ],
    index=pd.MultiIndex.from_arrays(
        [list('123'), list('456')], names=['ind1', 'ind2']
    ),
    columns=pd.MultiIndex.from_arrays([list('ABC'), list('DEF')]),
)

# melt using level 1 (or above in other cases) fails
df_l1 = df.reset_index(col_level=1)
print("\nL1 index insert:\n", df_l1)

# NOTE: THIS FAILS!
df_l1 = pd.melt(df_l1, col_level=1, id_vars=['ind1'], value_vars=['D', 'E'])
print("\nL1 melt:\n", df_l1)
Problem description
Suppose we have MultiIndex columns and want to melt the frame, using the index levels as the id_vars. In this example, calling reset_index(col_level=1) and then melt() with col_level=1 fails with a KeyError, as shown below:
L1 index insert:
              A  B  C
   ind1 ind2  D  E  F
0     1    4  p  q  r
1     2    5  s  t  u
2     3    6  v  w  x
FAILS: KeyError: 'ind1'
KeyError                                  Traceback (most recent call last)
~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2896             try:
-> 2897                 return self._engine.get_loc(key)
   2898             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ind1'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in
     17 print("\nL1 index insert:\n", df_l1)
     18 # NOTE: THIS FAILS!
---> 19 df_l1 = pd.melt(df_l1, col_level=1,
     20                 id_vars=['ind1'], value_vars=['D','E'])
     21 print("\nL1 melt:\n", df_l1)

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/reshape/melt.py in melt(frame, id_vars, value_vars, var_name, value_name, col_level)
    102     mdata = {}
    103     for col in id_vars:
--> 104         id_data = frame.pop(col)
    105         if is_extension_type(id_data):
    106             id_data = concat([id_data] * K, ignore_index=True)

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/generic.py in pop(self, item)
    860         3  monkey        NaN
    861         """
--> 862         result = self[item]
    863         del self[item]
    864         try:

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2993             if self.columns.nlevels > 1:
   2994                 return self._getitem_multilevel(key)
-> 2995             indexer = self.columns.get_loc(key)
   2996             if is_integer(indexer):
   2997                 indexer = [indexer]

~/Repos/spec17/venv/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2897                 return self._engine.get_loc(key)
   2898             except KeyError:
-> 2899                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2900         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2901         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'ind1'
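A possible workaround, offered only as a sketch and not as a fix for melt itself: collapse the columns to their level-1 labels before melting, so that melt sees ordinary single-level columns and col_level is not needed. The names flat and melted are illustrative, and the approach assumes the level-1 labels are unique, as they are in this example.

```python
import pandas as pd

df = pd.DataFrame(
    [['p', 'q', 'r'], ['s', 't', 'u'], ['v', 'w', 'x']],
    index=pd.MultiIndex.from_arrays(
        [list('123'), list('456')], names=['ind1', 'ind2']
    ),
    columns=pd.MultiIndex.from_arrays([list('ABC'), list('DEF')]),
)

# Same first step as in the report: index levels become level-1 columns.
df_l1 = df.reset_index(col_level=1)

# Workaround (assumption: level-1 labels are unique): keep only level 1,
# which turns the MultiIndex columns into a flat Index.
flat = df_l1.copy()
flat.columns = flat.columns.get_level_values(1)

# With flat columns, melt can look up 'ind1' directly; col_level is omitted.
melted = pd.melt(flat, id_vars=['ind1'], value_vars=['D', 'E'])
print(melted)
```

This discards the level-0 labels (A, B, C), which is also what melt does when col_level=1 is given, so nothing extra should be lost relative to the intended call.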
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.8.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-51-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.0.3
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1
setuptools : 40.8.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 7.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None