BUG: Issues with return type hint and/or definition of pandas.core.generic.NDFrame.__hash__ · Issue #40013 · pandas-dev/pandas



Code Sample, a copy-pastable example

import pandas as pd
pd.DataFrame().__hash__()

Problem description

It is already known that calling NDFrame.__hash__ always raises an exception, and thus NDFrame objects are not hashable. My issue is with the return type hint, which is currently int:

def __hash__(self) -> int:
    raise TypeError(
        f"{repr(type(self).__name__)} objects are mutable, "
        f"thus they cannot be hashed"
    )

I believe that the correct return type hint should be typing.NoReturn, according to the example given at https://docs.python.org/3/library/typing.html#typing.NoReturn.
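As a minimal sketch of what that annotation would look like, the snippet below uses a hypothetical stand-in class (`Unhashable` is not pandas code) whose `__hash__` has the same body as the one quoted above but is annotated with `typing.NoReturn`:

```python
from typing import NoReturn


class Unhashable:
    """Hypothetical stand-in for NDFrame; not the pandas implementation."""

    def __hash__(self) -> NoReturn:
        # Same body as the quoted NDFrame.__hash__; only the annotation differs.
        raise TypeError(
            f"{repr(type(self).__name__)} objects are mutable, "
            f"thus they cannot be hashed"
        )


try:
    hash(Unhashable())
except TypeError as exc:
    message = str(exc)
    print(message)
```

The annotation only documents that the method never returns normally; the runtime behaviour is unchanged, and the class still defines a callable `__hash__`.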

From a static type checker's point of view (e.g. mypy), it may be even better to override the attribute as __hash__ = None, as this is what Python itself expects for unhashable objects (see https://github.com/python/cpython/blob/1f433406bd46fbd00b88223ad64daea6bc9eaadc/Lib/_collections_abc.py#L76-L102). Any non-None implementation currently produces non-intuitive behaviour; for example, the following passes mypy static type checking:

import typing
from pandas.core import frame

def accept_hashable(obj: typing.Hashable):
    print(obj)

accept_hashable(frame.DataFrame())

...

Success: no issues found in 1 source file
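The same distinction can be seen at runtime with `collections.abc.Hashable`, whose subclass check treats `__hash__ = None` as "not hashable". The sketch below uses two hypothetical classes (neither is pandas code): one mirrors the current approach of raising inside `__hash__`, the other sets `__hash__ = None`:

```python
from collections.abc import Hashable


class RaisingHash:
    """Hypothetical class mirroring the current NDFrame approach."""

    def __hash__(self) -> int:
        raise TypeError("cannot be hashed")


class NoneHash:
    """Hypothetical class that sets __hash__ to None instead."""

    __hash__ = None


# Hashable only checks whether __hash__ is defined and not None;
# it never calls the method, so a raising __hash__ still counts.
raising_is_hashable = isinstance(RaisingHash(), Hashable)
none_is_hashable = isinstance(NoneHash(), Hashable)
list_is_hashable = isinstance([], Hashable)

print(raising_is_hashable, none_is_hashable, list_is_hashable)
# prints: True False False
```

Only the `__hash__ = None` class is reported the same way as the built-in `list`.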

To get behaviour identical to list and dict objects, it looks like changing NDFrame and Series* to define __hash__ = None will do; after this change, mypy reports, for example:

accept_hashable(frame.DataFrame())

...

error: Argument 1 to "accept_hashable" has incompatible type "DataFrame"; expected "Hashable" [arg-type]
note: Following member(s) of "DataFrame" have conflicts:
note:     __hash__: expected "Callable[[], int]", got "None"

*Note: For some reason, Series redefines __hash__ as the same as NDFrame.__hash__, even though Series inherits from NDFrame.

__hash__ = generic.NDFrame.__hash__

Expected Output

(Not applicable)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-44-generic
Version : #50~20.04.1-Ubuntu SMP Wed Feb 10 21:07:30 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_NZ.UTF-8
LOCALE : en_NZ.UTF-8
pandas : 1.2.2
numpy : 1.20.1
pytz : 2020.5
dateutil : 2.8.1
pip : 20.0.2
setuptools : 44.0.0
Cython : None
pytest : 6.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 8.0.0.dev
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.0
sqlalchemy : 1.3.22
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None