[issue8941] utf-32be codec failing on UCS-2 python build for 32-bit value
Antoine Pitrou
report at bugs.python.org
Wed Jun 9 13:31:20 CEST 2010
Antoine Pitrou <pitrou at free.fr> added the comment:
The following code at the beginning of PyUnicode_DecodeUTF32Stateful is buggy when codec endianness doesn't match the native endianness (not to mention it could also crash if the underlying CPU arch doesn't support unaligned access to 4-byte integers):
#ifndef Py_UNICODE_WIDE
    for (i = pairs = 0; i < size/4; i++)
        if (((Py_UCS4 *)s)[i] >= 0x10000)
            pairs++;
#endif
As a result, the preallocated unicode object isn't long enough and Python writes into memory it shouldn't write into. It can produce hard crashes, such as:
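The miscount can be reproduced outside the interpreter. The following is only an illustrative sketch in Python 3, not the CPython code: the helper name count_surrogate_pairs is hypothetical, and a little-endian host is assumed, so '<' stands in for the native byte order that the cast to Py_UCS4 * uses.

```python
import struct

def count_surrogate_pairs(data, byteorder):
    """Count 4-byte units >= 0x10000, i.e. units needing a surrogate pair.

    byteorder is '<' (little-endian) or '>' (big-endian), as in struct.
    Hypothetical helper, for illustration only.
    """
    n = len(data) // 4
    values = struct.unpack("%s%dI" % (byteorder, n), data[:n * 4])
    return sum(1 for v in values if v >= 0x10000)

# U+10000 repeated 1024 times, encoded as UTF-32-BE (same input as the report).
data = b'\x00\x01\x00\x00' * 1024

# Assumption: little-endian host. Read in native order, each unit looks like
# 0x00000100, so the pre-scan sees no character above U+FFFF.
native_pairs = count_surrogate_pairs(data, '<')

# Honoring the codec's byte order, every unit is 0x00010000.
correct_pairs = count_surrogate_pairs(data, '>')

units = len(data) // 4
allocated = units + native_pairs   # 2-byte code units preallocated
needed = units + correct_pairs     # 2-byte code units actually written
```

Under these assumptions the pre-scan reserves room for 1024 code units while the decoder goes on to write 2048, which is the overrun described above.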
>>> l = unicode(b'\x00\x01\x00\x00' * 1024, 'utf-32be')
Debug memory block at address p=0xf2b310:
2050 bytes originally requested
The 8 pad bytes at p-8 are FORBIDDENBYTE, as expected.
The 8 pad bytes at tail=0xf2bb12 are not all FORBIDDENBYTE (0xfb):
at tail+0: 0x00 *** OUCH
at tail+1: 0xdc *** OUCH
at tail+2: 0x00 *** OUCH
at tail+3: 0xd8 *** OUCH
at tail+4: 0x00 *** OUCH
at tail+5: 0xdc *** OUCH
at tail+6: 0x00 *** OUCH
at tail+7: 0xd8 *** OUCH
The block was made by call #61925422603698392 to debug malloc/realloc.
Data at p: 00 d8 00 dc 00 d8 00 dc ... 00 dc 00 d8 00 dc 00 d8
Fatal Python error: bad trailing pad byte
Abandon
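The bytes in the dump are consistent with that explanation. A rough check in Python 3, assuming a little-endian machine with 2-byte Py_UNICODE and assuming the 2050 requested bytes are 1024 code units plus one terminator; to_surrogates is a hypothetical helper, not a CPython function:

```python
def to_surrogates(cp):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

high, low = to_surrogates(0x10000)

# What the decoder writes: 1024 surrogate pairs as little-endian 2-byte units.
decoded = ('\U00010000' * 1024).encode('utf-16-le')

requested = 2050  # size reported by the debug allocator in the dump
# The 8 bytes that land on the trailing pad of the undersized block.
overflow = decoded[requested:requested + 8]
```

Here decoded starts with 00 d8 00 dc, matching "Data at p", and overflow is 00 dc 00 d8 00 dc 00 d8, matching the eight corrupted pad bytes at the tail.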
----------
priority: high -> critical
type: behavior -> crash
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue8941>
_______________________________________