[Python-Dev] Hash collision security issue (now public)

Stefan Behnel stefan_ml at behnel.de
Mon Jan 9 09:13:15 CET 2012


Jim Jewett, 08.01.2012 23:33:

Stefan Behnel wrote:

Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content.

They SHOULD NOT represent the same content; comparing two strings currently requires converting them to canonical form, which means the smallest format (of those three) that works. [...] That said, I don't think smallest-format is actually enforced with anything stronger than comments (such as in unicodeobject.h struct PyASCIIObject) and asserts (mostly calling _PyUnicode_CheckConsistency).

That's what I meant. AFAIR, the PEP393 discussions at some point brought up the suspicion that third-party code may end up generating Unicode strings that do not comply with that "invariant". So internal code shouldn't strictly rely on it when it deals with user-provided data. One example is the "unequal kinds" optimisation in equality comparison, which, if I'm not mistaken, wasn't implemented, due to exactly this reasoning. The same applies to hashing then.
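To illustrate the point, here is a minimal Python sketch, not CPython's actual implementation. It models the three PEP 393 storage kinds as byte buffers with 1, 2 or 4 bytes per character and hashes the code point values rather than the raw bytes, so the result does not depend on which kind a string happens to be stored in. All helper names (pack, code_points, canonical_width, hash_code_points, equal) are hypothetical, and the FNV-1a-style mixing is only a stand-in for whatever hash function is eventually chosen.

```python
import struct

# Hypothetical model of the three PEP 393 kinds: bytes per character -> struct code.
_FORMATS = {1: "B", 2: "H", 4: "I"}


def pack(text, width):
    """Store text at a fixed width, even if a smaller width would suffice."""
    fmt = "<%d%s" % (len(text), _FORMATS[width])
    return struct.pack(fmt, *map(ord, text))


def code_points(buf, width):
    """Decode a fixed-width buffer back into a tuple of code point values."""
    fmt = "<%d%s" % (len(buf) // width, _FORMATS[width])
    return struct.unpack(fmt, buf)


def canonical_width(text):
    """Smallest width (1, 2 or 4) able to hold every character of text."""
    top = max(map(ord, text), default=0)
    return 1 if top < 0x100 else 2 if top < 0x10000 else 4


def hash_code_points(buf, width):
    """FNV-1a-style hash over code point values, independent of storage width."""
    h = 0xCBF29CE484222325
    for cp in code_points(buf, width):
        h ^= cp
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def equal(a, width_a, b, width_b):
    """Compare by content; do not short-circuit on unequal widths."""
    return code_points(a, width_a) == code_points(b, width_b)


if __name__ == "__main__":
    text = "abc"
    print(canonical_width(text))
    for width in (1, 2, 4):
        print(width, hex(hash_code_points(pack(text, width), width)))
```

A hash computed directly over the raw buffer would generally differ between the three widths because of the padding bytes, and an equality check that returned "unequal" as soon as the widths differ would misreport a non-canonical string built by third-party code.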

Stefan


