BUG: pandas 1.1.0 introduces string comparison bug for larger datasets · Issue #35700 · pandas-dev/pandas
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd

# This works fine
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(df1 == "x")

# Loading in some bigger data from Kaggle: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
# (data and code file included in zip to make it easy)
df2 = pd.read_csv("AB_NYC_2019.csv")

# On pandas v1.1.0 throws a ValueError: unknown type str32
print(df2 == "x")

# On pandas v1.1.0 throws a ValueError: unknown type str128
print(df2 == "John")
Problem description
Previously, comparing a DataFrame to a string value returned a boolean mask, as expected. On pandas 1.1.0 the same comparison raises a ValueError on the larger dataset. I confirmed that it works as expected on v1.0.5; the error appears only on 1.1.0.
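For anyone without the Kaggle file, a synthetic frame might be enough to check the behaviour. This is only a sketch: it assumes that row count combined with mixed numeric and string columns is what matters, which is not confirmed here. The column names, row count, and the `try_compare` helper are all hypothetical stand-ins, not taken from the original data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for AB_NYC_2019.csv: mixed numeric and string columns.
# The row count is an arbitrary "large" value, not a known threshold.
n_rows = 50_000
rng = np.random.default_rng(0)
df_big = pd.DataFrame(
    {
        "id": np.arange(n_rows),
        "price": rng.integers(10, 500, size=n_rows),
        "host_name": rng.choice(["John", "Jane", "x"], size=n_rows),
    }
)


def try_compare(df, value):
    """Return the boolean mask, or None if the comparison raises ValueError."""
    try:
        return df == value
    except ValueError as exc:
        # Report the installed version alongside the error message.
        print(f"pandas {pd.__version__}: ValueError: {exc}")
        return None


mask = try_compare(df_big, "John")
if mask is not None:
    print(f"pandas {pd.__version__}: comparison succeeded, shape {mask.shape}")
```

Running this under both 1.0.5 and 1.1.0 would show whether the synthetic frame behaves like the CSV; if it does not error on 1.1.0, the trigger is something more specific to the real data.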
Expected Output
A boolean mask of True or False values with the same shape as the DataFrame.
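To make the expected result concrete, here is a minimal sketch using the small frame from the code sample. The `is_full_mask` helper is a hypothetical name introduced only for this illustration; it checks that the mask has the same shape, index, and columns as the input.

```python
import pandas as pd


def is_full_mask(df, mask):
    """Check that mask lines up with df: same shape, index, and columns."""
    return (
        mask.shape == df.shape
        and mask.index.equals(df.index)
        and mask.columns.equals(df.columns)
    )


df_small = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
mask_small = df_small == "x"

# Only the cell holding the string "x" should be True.
print(mask_small)
print(is_full_mask(df_small, mask_small))
```

The same check applied to `df2 == "x"` is what should pass on 1.1.0 as it does on 1.0.5.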
Output of pd.show_versions()
INSTALLED VERSIONS
commit : d9fff27
python : 3.7.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 49.3.1.post20200810
Cython : 0.29.21
pytest : 6.0.1
hypothesis : 5.24.0
sphinx : 3.2.0
blosc : None
feather : 0.4.0
xlsxwriter : None
lxml.etree : 4.4.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.0
xlrd : 1.2.0
xlwt : None
numba : None