cStringIO unicode weirdness

Paul Rubin
Mon Jun 18 18:56:03 EDT 2007


    Python 2.5 (r25:51908, Oct  6 2006, 15:24:43)
    [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import StringIO, cStringIO
    >>> StringIO.StringIO('a').getvalue()
    'a'
    >>> cStringIO.StringIO('a').getvalue()
    'a'
    >>> StringIO.StringIO(u'a').getvalue()
    u'a'
    >>> cStringIO.StringIO(u'a').getvalue()
    'a\x00\x00\x00'
    >>> 
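Those trailing NULs look suspiciously like the unicode object's internal representation (UCS-4 on this wide build) leaking straight through. That's only a guess, but cStringIO accepting anything that supports the buffer interface and copying its raw bytes would be consistent with it:

    >>> str(buffer(u'a'))
    'a\x00\x00\x00'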

I would have thought StringIO and cStringIO would return the
same result for this ASCII-encodable string.  Worse:

    >>> StringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
    u'a'

does the right thing, but

    >>> cStringIO.StringIO(u'a').getvalue().encode('utf-8').decode('utf-8')
    u'a\x00\x00\x00'

looks bogus.  Am I misunderstanding something?
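
For now, encoding to a byte string before handing it to cStringIO
sidesteps the problem; a minimal workaround, assuming plain bytes are
all that's needed:

    >>> cStringIO.StringIO(u'a'.encode('utf-8')).getvalue()
    'a'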


