[Python-Dev] Hash collision security issue (now public)

Victor Stinner victor.stinner at haypocalc.com
Mon Jan 9 10:53:19 CET 2012


> That said, I don't think smallest-format is actually enforced with
> anything stronger than comments (such as in unicodeobject.h struct
> PyASCIIObject) and asserts (mostly calling
> _PyUnicode_CheckConsistency).  I don't have any insight on how
> prevalent non-conforming strings will be in practice, or whether
> supporting their equality will be required as a bugfix.

If you are only Python, you cannot create a string in a non canonical form.

If you use the C API, you can create a string in a non canonical form
using PyUnicode_New() + PyUnicode_WRITE, or
PyUnicode_FromUnicode(NULL, length) (or
PyUnicode_FromStringAndSize(NULL, length)) + direct access to the
Py_UNICODE* string. If you create strings in a non canonical form, it
is a bug in your application and Python doesn't help you. But how
could Python help you? Expose a function to check your newly creating
string? There is already _PyUnicode_CheckConsistency() which is slow
(O(n)) because it checks each character, it is only used in debug
mode.

Victor


More information about the Python-Dev mailing list