[Python-Dev] Hash collision security issue (now public)

Victor Stinner victor.stinner at haypocalc.com
Mon Jan 9 10:53:19 CET 2012


That said, I don't think smallest-format is actually enforced with anything stronger than comments (such as in unicodeobject.h struct PyASCIIObject) and asserts (mostly calling _PyUnicode_CheckConsistency).  I don't have any insight on how prevalent non-conforming strings will be in practice, or whether supporting their equality will be required as a bugfix.

If you are only using Python, you cannot create a string in a non-canonical form.
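A small sketch of what this looks like from pure Python. It assumes CPython 3.3 or later (the PEP 393 string representation); the variable names are only illustrative, and sys.getsizeof() is used here as an indirect hint about the internal format, not as an official way to inspect it.

```python
import sys

# A string containing a non-ASCII character needs a wider internal format.
wide = "hello\u20ac"      # EURO SIGN does not fit in 1 byte per character
narrow = wide[:5]         # "hello": every remaining character is ASCII

# Assumption: CPython 3.3+ stores the slice in the narrowest format again,
# so it behaves exactly like a literal "hello".
print(narrow == "hello")                                  # equality
print(hash(narrow) == hash("hello"))                      # same hash
print(sys.getsizeof(narrow) == sys.getsizeof("hello"))    # same storage size
print(sys.getsizeof(wide) > sys.getsizeof(narrow))        # wide form is larger
```

On other Python implementations the size comparison may not apply, but equality and hashing must still agree.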

If you use the C API, you can create a string in a non-canonical form using PyUnicode_New() + PyUnicode_WRITE, or PyUnicode_FromUnicode(NULL, length) (or PyUnicode_FromStringAndSize(NULL, length)) + direct access to the Py_UNICODE* string. If you create strings in a non-canonical form, it is a bug in your application and Python doesn't help you. But how could Python help you? Expose a function to check your newly created string? There is already _PyUnicode_CheckConsistency(), which is slow (O(n)) because it checks each character; it is only used in debug mode.
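To show why such a check cannot be cheaper than O(n), here is a rough pure-Python sketch. It is not the CPython implementation: narrowest_kind() and is_canonical() are hypothetical helpers, and declared_kind stands in for the bytes-per-character width that a C extension passed when it allocated the string.

```python
def narrowest_kind(text: str) -> int:
    """Return the bytes per character (1, 2 or 4) of the narrowest
    format able to hold every character of text."""
    # Every character has to be inspected to find the maximum: O(n).
    max_code_point = max(map(ord, text), default=0)
    if max_code_point < 0x100:
        return 1
    if max_code_point < 0x10000:
        return 2
    return 4


def is_canonical(text: str, declared_kind: int) -> bool:
    """True if declared_kind is the narrowest width that fits text."""
    return declared_kind == narrowest_kind(text)


print(narrowest_kind("hello"))        # ASCII only
print(narrowest_kind("\u20ac"))       # needs 2 bytes per character
print(narrowest_kind("\U0001f600"))   # needs 4 bytes per character
print(is_canonical("hello", 2))       # too wide for ASCII content
```

A string allocated too wide for its content is exactly the non-canonical case described above, and detecting it requires looking at each character.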

Victor


