Categorical NaN behaviour different from a str · Issue #32276 · pandas-dev/pandas (original) (raw)

Code Sample

Series as category

df = pd.Series(['a','a','b','c']).astype('category')
print(df.shift(1))
print(df)
print(df.shift(1) != df)

OUTPUT:

0 NaN
1 a
2 a
3 b
dtype: category
Categories (3, object): [a, b, c]
0 a
1 a
2 b
3 c
dtype: category
Categories (3, object): [a, b, c]
0 False
1 False
2 True
3 True
dtype: bool

Series as str

df = pd.Series(['a','a','b','c']).astype('str')
print(df.shift(1))
print(df)
print(df.shift(1) != df)

OUTPUT:

0 NaN
1 a
2 a
3 b
dtype: object
0 a
1 a
2 b
3 c
dtype: object
0 True
1 False
2 True
3 True
dtype: bool

#### Problem description

The behaviour of NaN in comparison operators is different for type category and str. See example code - the first element is NaN in both instances, but the second instance equates to false, and the first equates to true for a != operation. For a == operation for a category, the behavior is as expected.

#### Expected Output

I would expect both to have the same output.

#### Output of ``pd.show_versions()``

<details>

INSTALLED VERSIONS
------------------
commit           : None
python           : 3.8.0.final.0
python-bits      : 64
OS               : Linux
OS-release       : 3.10.0-1062.12.1.el7.x86_64
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_GB.UTF-8
LOCALE           : en_GB.UTF-8

pandas           : 1.0.1
numpy            : 1.18.1
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.0.2
setuptools       : 41.4.0
Cython           : 0.29.15
pytest           : 5.3.5
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.1
IPython          : 7.12.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fastparquet      : None
gcsfs            : None
lxml.etree       : None
matplotlib       : 3.1.3
numexpr          : 2.7.1
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pytables         : None
pytest           : 5.3.5
pyxlsb           : None
s3fs             : None
scipy            : 1.4.1
sqlalchemy       : 1.3.13
tables           : 3.6.1
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
xlsxwriter       : None
numba            : None

</details>