[Python-Dev] Hash collision security issue (now public)

Jim Jewett jimjjewett at gmail.com
Sun Jan 8 23:33:32 CET 2012


In http://mail.python.org/pipermail/python-dev/2012-January/115368.html Stefan Behnel wrote:

> Admittedly, this may require some adaptation for the PEP393 unicode memory layout in order to produce identical hashes for all three representations if they represent the same content.

They SHOULD NOT represent the same content; comparing two strings currently relies on both being in canonical form, which means stored in the smallest of those three formats that can hold the content.
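
As a rough illustration of the canonical-form rule, the Python sketch below picks the storage width a PEP 393 string would use, based on its largest code point. The function name `canonical_kind` is hypothetical. It only models the rule and does not call into CPython's C API.

```python
def canonical_kind(s: str) -> int:
    """Return the per-character width in bytes (1, 2 or 4) of the
    smallest PEP 393 storage format that can hold `s`.

    Hypothetical helper: it models the rule, not CPython's C code.
    """
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:      # fits PyUnicode_1BYTE_KIND (Latin-1 range)
        return 1
    if max_cp < 0x10000:    # fits PyUnicode_2BYTE_KIND (BMP)
        return 2
    return 4                # needs PyUnicode_4BYTE_KIND


print(canonical_kind("abc"))         # 1
print(canonical_kind("caf\u00e9"))   # 1 (U+00E9 still fits in one byte)
print(canonical_kind("\u20ac5"))     # 2 (euro sign needs the BMP width)
print(canonical_kind("\U0001F600"))  # 4
```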

If a string's content can be represented in PyUnicode_1BYTE_KIND, then representations using PyUnicode_2BYTE_KIND or PyUnicode_4BYTE_KIND don't count as canonical, won't be created by Python itself, and already compare unequal to the canonical string according to both PyUnicode_RichCompare and stringlib/eq.h (a shortcut used by dicts).
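
To make the consequence concrete, here is a simplified Python model of a kind-based equality shortcut, loosely inspired by the check described above for stringlib/eq.h. The class `RawStr` and the function `shortcut_eq` are hypothetical. They do not reproduce CPython's struct layout or the exact order of its checks. They only show why a widened, non-canonical copy of "abc" compares unequal to the canonical one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawStr:
    """Hypothetical model of a PEP 393 string: a storage width plus the raw
    code-unit bytes. This is not CPython's actual struct layout."""
    kind: int    # bytes per character: 1, 2 or 4
    data: bytes  # code points packed little-endian, `kind` bytes each

    @classmethod
    def from_str(cls, s: str, kind: int) -> "RawStr":
        # Pack every code point into exactly `kind` bytes.
        return cls(kind, b"".join(ord(c).to_bytes(kind, "little") for c in s))


def shortcut_eq(a: RawStr, b: RawStr) -> bool:
    # If the kinds differ, treat the strings as unequal without looking at
    # the data. This is sound only if every string uses its canonical kind.
    if a.kind != b.kind:
        return False
    return a.data == b.data


canonical = RawStr.from_str("abc", 1)
widened = RawStr.from_str("abc", 2)  # non-canonical: "abc" fits in 1 byte
print(shortcut_eq(canonical, RawStr.from_str("abc", 1)))  # True
print(shortcut_eq(canonical, widened))                    # False
```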

That said, I don't think the smallest-format rule is actually enforced by anything stronger than comments (such as the one on struct PyASCIIObject in unicodeobject.h) and asserts (mostly calls to _PyUnicode_CheckConsistency). I have no insight into how prevalent non-conforming strings will be in practice, or whether supporting their equality will be required as a bugfix.
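
Assert-only enforcement is weak because, in C, assert() is compiled out when NDEBUG is defined, as it is in non-debug builds. Python's own assert statement behaves similarly under `python -O`. The hypothetical constructor `make_raw` below uses that as an analogy. It checks the canonical kind only with an assert, so the check disappears in optimized mode. This is an illustration only and not how CPython builds strings.

```python
def make_raw(kind: int, text: str) -> tuple:
    """Hypothetical constructor for a (kind, text) pair whose only guard
    against a non-canonical kind is an assert statement."""
    max_cp = max(map(ord, text), default=0)
    smallest = 1 if max_cp < 0x100 else 2 if max_cp < 0x10000 else 4
    # Like a C assert() under NDEBUG, this line is skipped entirely when
    # Python runs with -O, so a non-canonical kind is accepted silently.
    assert kind == smallest, f"non-canonical kind {kind} for {text!r}"
    return (kind, text)


print(make_raw(1, "abc"))  # (1, 'abc')
try:
    make_raw(2, "abc")     # rejected only when assertions are enabled
except AssertionError as exc:
    print("rejected:", exc)
```

Under `python -O`, the second call returns `(2, 'abc')` without complaint. That is the kind of non-conforming string whose prevalence is hard to predict.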

-jJ


