Issue 26126: Possible subtle bug when normalizing and str.translate()ing

I am using Python 3.4.3 on Xubuntu 14.04 LTS 64-bit.

I have a program that, when run repeatedly, sometimes does what I expect and sometimes does not:

$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ ~/tmp/normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')

As you can see, the left-hand (actual) value is sometimes case-folded and sometimes not, which is surprising given the code (attached).

Of course this could be a mistake on my part; maybe I've misunderstood how Unicode normalization works.
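For reference, here is a minimal sketch of how Unicode normalization and str.translate() are commonly combined. It assumes the goal is to strip accents and expand ligatures; the `fold` function and the `LIGATURES` table are hypothetical and not taken from the attached script. One detail worth knowing: NFKD leaves Æ and œ unchanged because they have no Unicode decomposition, so they need an explicit translation table.

```python
import unicodedata

# Hypothetical table for ligatures that NFKD does not decompose.
LIGATURES = str.maketrans({"Æ": "AE", "æ": "ae", "Œ": "OE", "œ": "oe"})


def fold(text):
    """Strip accents and expand ligatures (illustrative helper)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (general category "Mn").
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Ligatures survive normalization untouched, so map them explicitly.
    return stripped.translate(LIGATURES)


print(unicodedata.normalize("NFKD", "Æ") == "Æ")  # True
print(fold("Crème brûlée, Æsop's œuvre"))  # Creme brulee, AEsop's oeuvre
```

Because the table is applied once after normalization, this pipeline is deterministic and does not depend on hash ordering.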

There seems to be a connection to hash randomization. With a fixed seed, I consistently get:

$ PYTHONHASHSEED=1 python3.6 ./normbug.py
BUG ('The aenid oevre', '!=', 'The AEnid oevre')
$ PYTHONHASHSEED=0 python3.6 ./normbug.py
OK ('The AEnid oevre', '==', 'The AEnid oevre')
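The attached normbug.py is not reproduced here, so the actual cause cannot be confirmed from this page. One mechanism that would be consistent with the PYTHONHASHSEED observation (an assumption, not a diagnosis) is applying replacement rules one after another while iterating over a set, or over a dict on Python 3.4: that iteration order follows the string hashes, which PYTHONHASHSEED controls. The sketch below uses hypothetical names (`RULES`, `apply_in_order`, `set_order`) to show that sequential replacements can be order-dependent, that a single str.translate() pass is not, and that set order is fixed per seed.

```python
import itertools
import os
import subprocess
import sys


def apply_in_order(text, rules):
    """Apply (old, new) replacements one after another, in the given order."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


# Hypothetical rules: the second one matches text the first one produces,
# so the final result depends on which rule runs first.
RULES = [("Æ", "AE"), ("AE", "ae")]

outcomes = sorted({apply_in_order("The Æneid", order)
                   for order in itertools.permutations(RULES)})
print(outcomes)  # ['The AEneid', 'The aeneid']

# str.translate() makes a single pass over the *input* characters, so the
# output of one mapping is never fed into another mapping.
table = str.maketrans({"Æ": "AE", "A": "a"})
print("The Æneid".translate(table))  # The AEneid


def set_order(seed):
    """Iteration order of a small set of strings in a fresh interpreter
    started with the given PYTHONHASHSEED (returned as an ASCII repr)."""
    # Escapes keep the child's source pure ASCII regardless of locale.
    code = r'print(ascii(list({"\u00c6", "AE", "\u0153", "oe"})))'
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout.strip()


# The same seed always reproduces the same order; different seeds may not.
for seed in (0, 1, 2):
    print(seed, set_order(seed))
```

If the rules were stored in an ordered sequence (or folded into one translation table), the result would no longer depend on the hash seed.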