REGR: Assigning multiple new columns with loc fails when index is a MultiIndex · Issue #39147 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
rows = [(1, 2), (3, 4)]
initial_cols = ["a", "b"]
df = pd.DataFrame(42, index=pd.MultiIndex.from_tuples(rows), columns=initial_cols)
new_cols = ["c", "d"]
df.loc[:, new_cols] = None
Problem description
When running the above code in pandas 1.1.5, you will get the following error:
Traceback (most recent call last):
  File "foo.py", line 11, in <module>
    df.loc[:, new_cols] = None
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 666, in __setitem__
    indexer = self._get_setitem_indexer(key)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 609, in _get_setitem_indexer
    return self._convert_tuple(key, is_setter=True)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 734, in _convert_tuple
    idx = self._convert_to_indexer(k, axis=i, is_setter=is_setter)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1198, in _convert_to_indexer
    return self._get_listlike_indexer(key, axis, raise_missing=True)[1]
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1254, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/home/guido/anaconda3/envs/pandas_115_bug/lib/python3.7/site-packages/pandas/core/indexing.py", line 1298, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [Index(['c', 'd'], dtype='object')] are in the [columns]"
Up to and including version 1.1.4, the above code extended the DataFrame with the two new columns c and d. As of pandas 1.1.5, it raises the error above. The regression might be related to the fix for #37711.
Single-column assignments still work: replacing the single assignment line with the following two separate assignments succeeds:
df.loc[:, "c"] = None
df.loc[:, "d"] = None
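For a longer list of new columns, the same workaround can be written as a loop. This is only a sketch that generalizes the two single-column assignments above; it assumes each per-column `loc` assignment behaves as described in this report.

```python
import pandas as pd

rows = [(1, 2), (3, 4)]
df = pd.DataFrame(42, index=pd.MultiIndex.from_tuples(rows), columns=["a", "b"])

new_cols = ["c", "d"]
# Workaround sketch: assign each new column on its own instead of
# passing the whole list to .loc in a single statement.
for col in new_cols:
    df.loc[:, col] = None
```

The loop avoids the list-like column indexer entirely, which is the code path that raises in the traceback.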
The problem also occurs only when the index is a MultiIndex. If the index is not a MultiIndex, no error is raised.
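The dependence on the index type can be checked side by side with a small script. This is a sketch; the helper name `multi_column_loc_assignment_works` is made up for illustration and is not part of pandas. The MultiIndex result depends on the installed pandas version, so no output is shown here.

```python
import pandas as pd


def multi_column_loc_assignment_works(index):
    """Return True if assigning two new columns via .loc succeeds for this index."""
    df = pd.DataFrame(42, index=index, columns=["a", "b"])
    try:
        df.loc[:, ["c", "d"]] = None
    except KeyError:
        # This is the failure reported for pandas 1.1.5 with a MultiIndex.
        return False
    return list(df.columns) == ["a", "b", "c", "d"]


flat_ok = multi_column_loc_assignment_works(pd.Index([0, 1]))
multi_ok = multi_column_loc_assignment_works(
    pd.MultiIndex.from_tuples([(1, 2), (3, 4)])
)
print(f"pandas {pd.__version__}: flat index ok={flat_ok}, MultiIndex ok={multi_ok}")
```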
Expected Output
Expected output is that no error occurs and that the two columns c and d are added with values None for all rows.
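For comparison, a frame with the expected shape can be built without going through `loc` at all. The following sketch uses `DataFrame.assign`, which is a different code path from the failing one and is shown only to illustrate the expected result, not as the fix.

```python
import pandas as pd

rows = [(1, 2), (3, 4)]
df = pd.DataFrame(42, index=pd.MultiIndex.from_tuples(rows), columns=["a", "b"])

# assign returns a new frame with the extra columns; the original df is unchanged.
expected = df.assign(c=None, d=None)
print(expected)
```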
Output of pd.show_versions()
INSTALLED VERSIONS
commit : b5958ee
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-33-generic
Version : #36-Ubuntu SMP Wed Dec 9 09:14:40 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.1.5
numpy : 1.19.2
pytz : 2020.5
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.1.2.post20210112
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None