[ python-Bugs-1548891 ] shlex (or perhaps cStringIO) and unicode strings

Tue Aug 29 23:16:22 CEST 2006

Bugs item #1548891, was opened at 2006-08-29 21:16
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1548891&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Erwin S. Andreasen (drylock)
Assigned to: Nobody/Anonymous (nobody)
Summary: shlex (or perhaps cStringIO) and unicode strings

Initial Comment:
Python 2.5c1 (r25c1:51305, Aug 19 2006, 18:23:29) 
[GCC 4.1.2 20060814 (prerelease) (Debian 4.1.1-11)] on linux2

(Also seen in 2.4)

shlex.split does not handle unicode strings correctly:

>>> shlex.split(u"foo")
['f\x00\x00\x00o\x00\x00\x00o\x00\x00\x00']

IMO the shlex code suggests that it should accept
unicode, since it checks whether its argument is an
instance of basestring.
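
Until shlex handles this itself, a caller could encode
unicode input explicitly before splitting. The following
is only a minimal sketch under that assumption: the
helper name split_ascii is hypothetical and not part of
shlex, and the version check is there only so the same
code also runs on Python 3, where str is already text.

```python
import shlex
import sys


def split_ascii(text):
    """Split text like shlex.split, encoding unicode input first.

    split_ascii is a hypothetical helper, not part of shlex.
    """
    # On Python 2, turn a unicode object into a plain byte string so
    # that shlex never hands a unicode object to cStringIO. A
    # UnicodeEncodeError is raised if the text is not pure ASCII.
    if sys.version_info[0] < 3 and not isinstance(text, str):
        text = text.encode("ascii")
    return shlex.split(text)


print(split_ascii(u"foo bar 'baz qux'"))
```

Failing loudly on non-ASCII input is a deliberate choice
here, since it mirrors the default encoding reported
for this interpreter.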

Digging slightly into this, it seems to come down to a
difference between StringIO and cStringIO. Although
cStringIO claims to accept unicode as long as it can be
encoded to ASCII, it gives invalid results:

>>> sys.getdefaultencoding()
'ascii'

>>> cStringIO.StringIO(u'foo').getvalue()
'f\x00\x00\x00o\x00\x00\x00o\x00\x00\x00'

Perhaps cStringIO should encode its input to ASCII
before consuming it, as I can't imagine anyone cares
about the above result (which I guess is the raw UCS-2
or UCS-4 representation of the characters).
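
The guess about the internal representation can be
checked against the reported bytes. The sketch below
assumes a little-endian machine and uses the "utf-32-le"
and "utf-16-le" codecs as stand-ins for UCS-4 and UCS-2.
The "utf-32-le" codec is not available in Python 2.5
itself, so this is meant to be run on a newer
interpreter.

```python
# Bytes reported by cStringIO.StringIO(u'foo').getvalue() above.
observed = b'f\x00\x00\x00o\x00\x00\x00o\x00\x00\x00'

# Four bytes per character (UCS-4) and two bytes per character (UCS-2),
# both little-endian and without a byte-order mark.
ucs4_le = u"foo".encode("utf-32-le")
ucs2_le = u"foo".encode("utf-16-le")

print(observed == ucs4_le)   # does the output match UCS-4?
print(observed == ucs2_le)   # does the output match UCS-2?

# Decoding the reported bytes recovers the original text.
print(observed.decode("utf-32-le"))
```

For the output shown in this report, the bytes match the
four-byte form, which is consistent with an interpreter
built with wide (UCS-4) unicode.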

----------------------------------------------------------------------
