BUG: hashing's are the same for different key values for hash_pandas_object · Issue #41404 · pandas-dev/pandas



Code Sample, a copy-pastable example


hash_pandas_object(test[columns_names[i]], index=True, encoding='utf8', hash_key='012', categorize=True)

0      3713087409444908179
1      7478705303072568462
2     12024724921319894105
3     12785939622558835299
4      9788992550609991128
5      1239052552041868816
6      9610202078597672705
7     12287384021013641209
8     10264240190786022141
9     10535148974563425818
10    10238940258630658604
11    15446383648481672096
12    14265484681526586699
13     8862960024351814462
dtype: uint64

hash_pandas_object(test[columns_names[i]], index=True, encoding='utf8', hash_key='01298768755', categorize=True)

0      3713087409444908179
1      7478705303072568462
2     12024724921319894105
3     12785939622558835299
4      9788992550609991128
5      1239052552041868816
6      9610202078597672705
7     12287384021013641209
8     10264240190786022141
9     10535148974563425818
10    10238940258630658604
11    15446383648481672096
12    14265484681526586699
13     8862960024351814462
dtype: uint64

hash_pandas_object(test[[columns_names[i], columns_names[j]]], index=True, encoding='utf8', hash_key='01', categorize=True)

0     11107058607426530111
1     15666232225746534312
2      1136675766145783381
3     14892489092684772659
4      8519430825150424018
5       550646855301521146
6      3846031041217881485
7      2936614219041217571
8     16182698869780262111
9      2895548739675332954
10      677258434224654732
11     6105852029672525672
12    15095703462911844621
13     6081994522921680694
dtype: uint64

hash_pandas_object(test[[columns_names[i], columns_names[j]]], index=True, encoding='utf8', hash_key='0198076674534', categorize=True)

0     11107058607426530111
1     15666232225746534312
2      1136675766145783381
3     14892489092684772659
4      8519430825150424018
5       550646855301521146
6      3846031041217881485
7      2936614219041217571
8     16182698869780262111
9      2895548739675332954
10      677258434224654732
11     6105852029672525672
12    15095703462911844621
13     6081994522921680694
dtype: uint64

test[[columns_names[i], columns_names[j]]]

    A  B
0   0 -1
1   0 -1
2   0  0
3   0  0
4   1  0
5   1  0
6   2  0
7   2  2
8   2  2
9   2  2
10  2  2
11  2  2
12 -1  2
13 -1  2
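The calls above rely on `test` and `columns_names`, which are not defined in the snippet. Below is a self-contained sketch of the same comparison. It assumes `test` holds the two integer columns A and B printed above; the helper `hashes_for` is a hypothetical name used only for this illustration. The exact hash values may differ from the ones listed, depending on platform and integer width, so the sketch only checks whether the two keys give the same result.

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Assumption: `test` contains the two integer columns printed in the report.
test = pd.DataFrame(
    {
        "A": [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, -1, -1],
        "B": [-1, -1, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2],
    }
)


def hashes_for(obj, hash_key):
    """Hash each row of `obj` with the same options used in the report."""
    return hash_pandas_object(
        obj, index=True, encoding="utf8", hash_key=hash_key, categorize=True
    )


# Single column, two different keys.
single_short = hashes_for(test["A"], "012")
single_long = hashes_for(test["A"], "01298768755")

# Two columns, two different keys.
pair_short = hashes_for(test[["A", "B"]], "01")
pair_long = hashes_for(test[["A", "B"]], "0198076674534")

# True means both keys produced identical hashes for every row.
print(single_short.equals(single_long))
print(pair_short.equals(pair_long))
```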

The hashes are identical for different hash_key values.

Problem description

Different hash_key values should produce different hash values.
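The pandas documentation describes hash_key as the key used to encode string values, so one way to narrow this down is to check whether the key has any effect on string data. A small sketch for comparison, assuming a hypothetical object-dtype Series `labels` and two hypothetical 16-character keys (as far as I know, pandas expects the encoded key to be exactly 16 bytes when it hashes strings):

```python
import pandas as pd
from pandas.util import hash_pandas_object

# Hypothetical string data; object dtype is requested explicitly.
labels = pd.Series(["a", "b", "c"], dtype=object)

# Assumption: both keys are exactly 16 ASCII characters long.
key_one = "0123456789123456"
key_two = "6543219876543210"

hashed_one = hash_pandas_object(labels, index=False, hash_key=key_one)
hashed_two = hash_pandas_object(labels, index=False, hash_key=key_two)

# True would mean the key made no difference for string values either.
print(hashed_one.equals(hashed_two))
```

If this prints False while the integer columns above still hash identically, the key would seem to apply only to string data rather than to numeric columns.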

Expected Output

Output of pd.show_versions()


print("sklearn.__version__ = ", sklearn.__version__)
sklearn.__version__ = 0.24.2

pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.2.4
numpy : 1.19.5
pytz : 2021.1
dateutil : 2.8.1
pip : 20.3.3
setuptools : 51.3.3
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None