cStringIO unicode weirdness
Paul Rubin
Mon Jun 18 18:56:03 EDT 2007
Python 2.5 (r25:51908, Oct 6 2006, 15:24:43)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO, cStringIO
>>> StringIO.StringIO('a').getvalue()
'a'
>>> cStringIO.StringIO('a').getvalue()
'a'
>>> StringIO.StringIO(u'a').getvalue()
u'a'
>>> cStringIO.StringIO(u'a').getvalue()
'a\x00\x00\x00'
>>>
I would have thought StringIO and cStringIO would return the
same result for this ASCII-encodable string. Worse:
>>> StringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
u'a'
does the right thing, but
>>> cStringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
u'a\x00\x00\x00'
looks bogus. Am I misunderstanding something?