Issue 26126: Possible subtle bug when normalizing and str.translate()ing
I am using Python 3.4.3 on Xubuntu 14.04 LTS 64-bit.
I have a program that, when run repeatedly, sometimes does what I expect and sometimes does not:
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
As you can see, the left-hand (actual) value is sometimes case-folded and sometimes not, which is surprising given the code (attached).
Of course this could be a mistake on my part; maybe I've misunderstood how the unicode normalizing works.
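For readers without the attachment, here is a minimal sketch of the general normalize-then-translate pattern. It is not the attached normbug.py: the input string and the table contents are assumptions chosen to match the expected output above. It relies on the fact that Æ, æ, Œ and œ have no Unicode decomposition mapping, so NFKD leaves them untouched and an explicit translation table has to handle them.

```python
import unicodedata

# Hypothetical input; the real test string in normbug.py is not shown here.
text = "The \u00c6nid \u0153vre"  # "The Ænid œvre"

# NFKD does not change these letters: they have no decomposition mapping.
decomposed = unicodedata.normalize("NFKD", text)

# An explicit table maps each ligature to its two-letter spelling,
# preserving case. str.maketrans accepts single-character keys and
# replacement strings of any length.
table = str.maketrans({
    "\u00c6": "AE",  # Æ
    "\u00e6": "ae",  # æ
    "\u0152": "OE",  # Œ
    "\u0153": "oe",  # œ
})

result = decomposed.translate(table)
print(result)
```

With a table built from a fixed literal like this, the result does not depend on any iteration order, so it should be the same on every run.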
There seems to be a connection to hash randomization. I consistently get
$ PYTHONHASHSEED=1 python3.6 ./normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ PYTHONHASHSEED=0 python3.6 ./normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
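One way to see what a fixed hash seed changes is to run a small child interpreter under several values of PYTHONHASHSEED and compare what it prints. The sketch below is only an illustration of that technique, not a diagnosis of normbug.py: the child program and the helper name run_with_seed are made up for this example. It prints the iteration order of a small set of strings, which depends on string hashes and may therefore differ between seeds while staying stable for any one seed.

```python
import os
import subprocess
import sys

# Hypothetical child program: print the iteration order of a set of strings.
CHILD = 'print(list({"AE", "ae", "OE", "oe"}))'


def run_with_seed(seed):
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return proc.stdout.strip()


# The same seed always yields the same order; different seeds may not.
for seed in (0, 1, 2):
    print(seed, run_with_seed(seed))
```

If the order printed for a given seed is what flips a script between its OK and BUG outcomes, that points at code whose result depends on set (or, on older Pythons, dict) iteration order.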