BUG: trying to assign to a dataframe with a boolean index fails with boolean dtype · Issue #47125 · pandas-dev/pandas

Reproducible Example

import pandas as pd

df = pd.DataFrame([[-1, 2], [3, -4]], index=list("ab"), columns=["one", "two"])
expected = pd.DataFrame([[0, 2], [3, 0]], index=list("ab"), columns=["one", "two"])

# Should be the same as (df < 0) but with boolean dtype
boolean_indexer = pd.DataFrame(
    {
        "one": pd.Series([True, False], dtype="boolean", index=list("ab")),
        "two": pd.Series([False, True], dtype="boolean", index=list("ab")),
    }
)

result = df.copy()
print((df < 0).dtypes)
result[df < 0] = 0
print(result == expected)

one    bool
two    bool
dtype: object
    one   two
a  True  True
b  True  True

result2 = df.copy()
print(boolean_indexer.dtypes)
result2[boolean_indexer] = 0
print(result2 == expected)

one    boolean
two    boolean
dtype: object
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pandas-dev/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 3, in <cell line: 3>
    result2[boolean_indexer] = 0
  File "/Users/marius/Documents/GitHub/pandas/pandas/core/frame.py", line 3724, in __setitem__
    self._setitem_frame(key, value)
  File "/Users/marius/Documents/GitHub/pandas/pandas/core/frame.py", line 3841, in _setitem_frame
    raise TypeError(
TypeError: Must pass DataFrame or 2-d ndarray with boolean values only

Issue Description

Assigning to a DataFrame through a boolean DataFrame indexer fails when the indexer has the nullable boolean dtype rather than the NumPy bool dtype.

It looks like this happens partly because is_bool_dtype() returns False for the boolean DataFrame. Either that check in _setitem_frame should be changed, or is_bool_dtype() should return True.
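
As a rough illustration of the first option, a check could look at each column's dtype instead of the frame as a whole. This is only a sketch: all_columns_boolean is a hypothetical helper name, not something in pandas, and it is not necessarily how the check in _setitem_frame should end up being written.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype


def all_columns_boolean(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if every column is bool or nullable boolean."""
    # is_bool_dtype accepts a dtype object, so each column is checked on its own.
    return all(is_bool_dtype(dtype) for dtype in frame.dtypes)


df = pd.DataFrame([[-1, 2], [3, -4]], index=list("ab"), columns=["one", "two"])
numpy_mask = df < 0                            # NumPy bool columns
nullable_mask = numpy_mask.astype("boolean")   # nullable boolean columns

print(all_columns_boolean(numpy_mask))
print(all_columns_boolean(nullable_mask))
print(all_columns_boolean(df))
```

Checked column by column, both kinds of mask are recognised as boolean, while the integer frame is not.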

Expected Behavior

I would still expect boolean indexing to work with nullable booleans. When I came across this issue I was not deliberately trying to use the boolean dtype; it was just the output of a logical check I did. If logical operations can give either bool or boolean output, both outputs should behave the same way when used for indexing.
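
For context, here is a small sketch of how a logical check can end up with either dtype, plus a possible workaround until both behave the same. The names numpy_mask, nullable_mask and plain_mask are made up for this example, and filling missing values with False is an assumption that may not suit every use case.

```python
import pandas as pd

df = pd.DataFrame([[-1, 2], [3, -4]], index=list("ab"), columns=["one", "two"])
expected = pd.DataFrame([[0, 2], [3, 0]], index=list("ab"), columns=["one", "two"])

# Comparing NumPy-backed integer columns yields the NumPy bool dtype.
numpy_mask = df < 0
# Comparing nullable Int64 columns yields the nullable boolean dtype.
nullable_mask = df.astype("Int64") < 0

print(numpy_mask.dtypes)
print(nullable_mask.dtypes)

# Workaround sketch: convert the nullable mask to plain bool before indexing.
# Treating missing entries as False is an assumption made for this example.
plain_mask = nullable_mask.fillna(False).astype(bool)

result = df.copy()
result[plain_mask] = 0
print(result.equals(expected))
```

The conversion only sidesteps the dtype check; it does not change how pandas handles a nullable boolean indexer directly.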

Installed Versions

INSTALLED VERSIONS

commit : 3e3bb90
python : 3.8.13.final.0
python-bits : 64
OS : Darwin
OS-release : 21.4.0
Version : Darwin Kernel Version 21.4.0: Fri Mar 18 00:45:05 PDT 2022; root:xnu-8020.101.4~15/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_AU.UTF-8
pandas : 1.5.0.dev0+819.g3e3bb9028b.dirty
numpy : 1.22.4
pytz : 2022.1
dateutil : 2.8.2
setuptools : 62.3.2
pip : 22.1.1
Cython : 0.29.30
pytest : 7.1.2
hypothesis : 6.46.9
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.8.0
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : 1.3.4
brotli :
fastparquet : 0.8.1
fsspec : 2021.11.0
gcsfs : 2021.11.0
matplotlib : 3.5.2
numba : 0.53.1
numexpr : 2.8.0
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 8.0.0
pyreadstat : 1.1.6
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.8.1
snappy :
sqlalchemy : 1.4.36
tables : 3.7.0
tabulate : 0.8.9
xarray : 2022.3.0
xlrd : 2.0.1
xlwt : 1.3.0
zstandard : None