BUG: isin() with missing values does not work in 1.3.0 with extension dtypes
This bug is not present in Pandas < 1.3.0.
In 1.3.0, calling Series.isin() fails when both of the following hold:
- the Series dtype is an extension dtype (pd.Float64Dtype(), pd.Int64Dtype(), ...)
- the Series contains any 'missing' values (numpy.nan, pd.NA)
The following code snippet tests a few dtypes, determining if each of them supports isin with missing values:
import pandas as pd
import numpy as np

for dtype in (float, int, pd.Float64Dtype(), pd.Int64Dtype(), object):
    x = pd.Series([0, 1, 2, 3, 4], dtype=dtype)
    options = [1, 2, 3]
    print(f"\nTesting with dtype = {x.dtype}:")
    x.isin(options)  # This works every time - no missing values
    x.iloc[1] = np.nan  # Set a value to NA
    try:
        x.isin(options)  # This no longer works
    except Exception as err:
        print(f"Error! {err}")
    else:
        print("OK")

# Now, show the actual stack trace
print("\nStacktrace for dtype=Int64")
dtype = pd.Int64Dtype()
x = pd.Series([0, 1, 2, 3, 4], dtype=dtype)
options = [1, 2, 3]
x.iloc[1] = np.nan  # Set a value to NA
x.isin(options)
The output is:
Testing with dtype = float64:
OK
Testing with dtype = int64:
OK
Testing with dtype = Float64:
Error! boolean value of NA is ambiguous
Testing with dtype = Int64:
Error! boolean value of NA is ambiguous
Testing with dtype = object:
OK
Stacktrace for dtype=Int64
Traceback (most recent call last):
  File "...dev/pd_1_3_isin_bug.py", line 31, in <module>
    x.isin(options)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/series.py", line 5024, in isin
    result = algorithms.isin(self._values, values)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/algorithms.py", line 475, in isin
    return comps.isin(values)
  File "..._dev_venv/lib/python3.7/site-packages/pandas/core/arrays/masked.py", line 408, in isin
    if libmissing.NA in values:
  File "pandas/_libs/missing.pyx", line 446, in pandas._libs.missing.NAType.__bool__
TypeError: boolean value of NA is ambiguous
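The last two frames of the traceback point at a containment check against NA. The following is a minimal sketch of what appears to go wrong there, reproduced outside of isin(). It assumes that `values` at that point is the plain Python list passed as `options`; that is inferred from the traceback and has not been checked against the pandas source.

```python
import pandas as pd

options = [1, 2, 3]  # assumed to be what `values` holds in masked.py

# Comparing NA with an ordinary value yields NA, not False.
comparison = pd.NA == 1
print(comparison is pd.NA)

# `in` on a list compares element by element and calls bool() on each
# result, so bool(pd.NA) is evaluated and raises.
message = None
try:
    pd.NA in options
except TypeError as err:
    message = str(err)
    print(f"TypeError: {message}")
```

If that reading is right, the failure comes from Python's list containment semantics rather than from the hashing or lookup step of isin() itself.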
Output of pd.show_versions():
INSTALLED VERSIONS
commit : f00ed8f47020034e752baf0250483053340971b0
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.13.1.el7.x86_64
Version : #1 SMP Fri Jun 12 14:34:17 EDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0
numpy : 1.20.3
pytz : 2019.3
dateutil : 2.8.0
pip : 21.0.1
setuptools : 40.8.0
Cython : 0.29.13
pytest : 5.1.1
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 1.2.1
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fsspec : 0.5.2
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 0.13.0
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : 1.3.9
tables : 3.5.2
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.45.1
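Possible workaround on 1.3.0, as a sketch only: since isin() succeeds when the Series has no missing values, the membership test can be run on the non-missing entries and the missing ones filled in as False. The helper name `isin_skipping_na` is hypothetical (not part of pandas). It assumes that treating missing values as non-members is acceptable, which matches the float64 result above but may not be what every caller wants.

```python
import numpy as np
import pandas as pd


def isin_skipping_na(series: pd.Series, options) -> pd.Series:
    """Hypothetical helper: isin() that reports False for missing entries."""
    present = series.notna().to_numpy()
    result = np.zeros(len(series), dtype=bool)
    # Only the non-missing values reach Series.isin(), which is the case
    # that the report shows working.
    result[present] = series[present].isin(options).to_numpy(dtype=bool)
    return pd.Series(result, index=series.index)


x = pd.Series([0, pd.NA, 2, 3, 4], dtype=pd.Int64Dtype())
print(isin_skipping_na(x, [1, 2, 3]).tolist())
```

The result is a plain bool Series rather than a nullable boolean one, so any NA-propagating behaviour expected from extension dtypes is lost.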