[issue8941] utf-32be codec failing on UCS-2 python build for 32-bit value

Antoine Pitrou report at bugs.python.org
Wed Jun 9 13:31:20 CEST 2010


Antoine Pitrou <pitrou at free.fr> added the comment:

The following code at the beginning of PyUnicode_DecodeUTF32Stateful is buggy when the codec's endianness doesn't match the native endianness, because the cast reads each 4-byte word in native byte order (not to mention that it could also crash if the underlying CPU architecture doesn't support unaligned access to 4-byte integers):

#ifndef Py_UNICODE_WIDE
    for (i = pairs = 0; i < size/4; i++)
        if (((Py_UCS4 *)s)[i] >= 0x10000)
            pairs++;
#endif

As a result, surrogate pairs are undercounted, the preallocated unicode object isn't long enough, and Python writes past the end of its buffer. This can produce hard crashes, such as:

>>> l = unicode(b'\x00\x01\x00\x00' * 1024, 'utf-32be')
Debug memory block at address p=0xf2b310:
    2050 bytes originally requested
    The 8 pad bytes at p-8 are FORBIDDENBYTE, as expected.
    The 8 pad bytes at tail=0xf2bb12 are not all FORBIDDENBYTE (0xfb):
        at tail+0: 0x00 *** OUCH
        at tail+1: 0xdc *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0xd8 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0xdc *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0xd8 *** OUCH
    The block was made by call #61925422603698392 to debug malloc/realloc.
    Data at p: 00 d8 00 dc 00 d8 00 dc ... 00 dc 00 d8 00 dc 00 d8
Fatal Python error: bad trailing pad byte
Aborted
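
The undercount behind this crash can be reproduced outside the interpreter. The sketch below is only an illustration, not the codec's code: it assumes a little-endian host (which the "00 d8 00 dc" surrogates in the dump suggest), a 2-byte Py_UNICODE, and one extra terminator unit per buffer. The helper count_pairs and the constant UNIT are hypothetical names introduced here.

```python
import struct

# Code points at or above this value need a surrogate pair on a UCS-2 build.
SURROGATE_THRESHOLD = 0x10000


def count_pairs(data, byte_order):
    """Count 4-byte words >= 0x10000, reading them with an explicit byte order.

    byte_order is a struct prefix: "<" for little-endian, ">" for big-endian.
    """
    n = len(data) // 4
    words = struct.unpack(byte_order + "%dI" % n, data[:n * 4])
    return sum(1 for w in words if w >= SURROGATE_THRESHOLD)


data = b"\x00\x01\x00\x00" * 1024  # U+10000 repeated, encoded as UTF-32BE

# What the raw (Py_UCS4 *) cast sees on an assumed little-endian host:
# each word reads as 0x00000100, so no pair is counted.
pairs_native = count_pairs(data, "<")

# What the utf-32be codec actually decodes: each word is 0x00010000.
pairs_actual = count_pairs(data, ">")

UNIT = 2  # assumed sizeof(Py_UNICODE) on a UCS-2 build
code_points = len(data) // 4

# Assumed layout: one unit per code point, one more per pair, one terminator.
allocated = (code_points + pairs_native + 1) * UNIT
required = (code_points + pairs_actual + 1) * UNIT

print(pairs_native, pairs_actual)  # 0 1024
print(allocated, required)         # 2050 4098
```

Under these assumptions the allocated size matches the "2050 bytes originally requested" line in the debug output, while the decoded data needs about twice that, which is consistent with the overwritten pad bytes.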

----------
priority: high -> critical
type: behavior -> crash

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8941>
_______________________________________