[issue6988] shlex.split() converts unicode input to UCS-4 output with varying byte order

Thu Sep 24 18:12:17 CEST 2009

Amaury Forgeot d'Arc <amauryfa at gmail.com> added the comment:

I'll take the opposite point of view:
the bad behavior was introduced in 2.5.1 (issue1548891, r52302) and
reverted for 2.5.2 because "it broke backwards compatibility with
arbitrary read buffers" (issue1730114, r53831).

The difference is in cStringIO:

>>> from cStringIO import StringIO
>>> StringIO(u"Hello, World!").read()
'H\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00,\x00\x00\x00 \x00\x00\x00W\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00!\x00\x00\x00'

The byte order does not actually differ between the two strings. Rather,
u" " becomes " \x00\x00\x00", and when the input is split on the space
byte, the three zero bytes that follow it end up at the start of the
second item.
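
The effect can be reproduced without cStringIO. The following is a minimal
sketch, not the actual shlex code path: it assumes a little-endian UCS-4
build, simulates the raw buffer with the "utf-32-le" codec, and uses a plain
bytes split as a stand-in for the tokenizer.

```python
# Assumption: on a little-endian UCS-4 build, the raw buffer of the unicode
# object looks like its UTF-32-LE encoding (4 bytes per character, no BOM).
raw = u"Hello, World!".encode("utf-32-le")

# Stand-in for the tokenizer: split on the single space byte (0x20).
first, second = raw.split(b" ")

# The first item is still well-aligned little-endian data.
print(repr(first))

# The second item starts with the three zero bytes that belonged to the
# space character, so every later character is shifted by three bytes.
print(repr(second))

# Read in 4-byte units, the shifted data looks like big-endian UCS-4,
# which is why the byte order only appears to change.
aligned = second[:len(second) - len(second) % 4]
print(aligned.decode("utf-32-be"))
```

The last line decodes only the part of the second item that fits whole
4-byte units; the three leftover bytes at the end are the zeros that
originally followed "!".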

----------
nosy: +amaury.forgeotdarc
resolution:  -> wont fix
status: open -> pending

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6988>
_______________________________________