cStringIO unicode weirdness

Josiah Carlson josiah.carlson at sbcglobal.net
Mon Jun 18 20:11:38 EDT 2007


Paul Rubin wrote:
>     Python 2.5 (r25:51908, Oct  6 2006, 15:24:43)
>     [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
>     Type "help", "copyright", "credits" or "license" for more information.
>     >>> import StringIO, cStringIO
>     >>> StringIO.StringIO('a').getvalue()
>     'a'
>     >>> cStringIO.StringIO('a').getvalue()
>     'a'
>     >>> StringIO.StringIO(u'a').getvalue()
>     u'a'
>     >>> cStringIO.StringIO(u'a').getvalue()
>     'a\x00\x00\x00'
>     >>> 
> 
> I would have thought StringIO and cStringIO would return the
> same result for this ascii-encodeable string.  Worse:

You would be wrong.  StringIO and cStringIO behave differently under 
certain circumstances, and those differences are intended.  One is how 
they handle unicode, as you saw.  Another is what happens when you 
provide an initializer...

     >>> cs = cStringIO.StringIO('a')
     >>> cs.write('b')
     Traceback (most recent call last):
       File "<stdin>", line 1, in ?
     AttributeError: 'cStringIO.StringI' object has no attribute 'write'
     >>> s = StringIO.StringIO('a')
     >>> s.write('b')
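
If you need a writable cStringIO object that starts out with some data, one workaround is to create it empty and write the initial data yourself, since cStringIO.StringIO() called with no arguments gives you a writable object. A minimal sketch, with some assumptions: make_writable is a made-up helper name, and the io.BytesIO fallback is only there so the snippet also runs on interpreters without cStringIO.

```python
try:
    from cStringIO import StringIO as _Buffer   # Python 2
except ImportError:
    from io import BytesIO as _Buffer           # assumption: fallback where cStringIO is missing


def make_writable(initial=b''):
    """Hypothetical helper: return a writable buffer preloaded with `initial`."""
    buf = _Buffer()          # no initializer, so the object accepts write()
    buf.write(initial)       # load the starting data by hand
    return buf               # position is left at the end, so later writes append


# b'' literals need Python 2.6+; on 2.5 drop the b prefix.
cs = make_writable(b'a')
cs.write(b'b')
result = cs.getvalue()
```

Note that this appends, whereas StringIO.StringIO('a') starts at position 0 and a write() there overwrites the initial data. Call cs.seek(0) first if you want that behavior.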

There is a Summer of Code project working towards making them behave 
the same, but the results will need to wait until Python 2.6 and/or 
3.0.  Note that there are a few "closed, won't fix" bug reports about 
these exact issues in the Python bug tracker at SourceForge.
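
As for the unicode case, the bytes in your cStringIO output look like the raw internal buffer of the unicode object on a UCS-4, little-endian build, which matches what a UTF-32-LE encoding produces. A rough sketch of that, and of encoding explicitly before handing data to cStringIO. The choice of UTF-8 is just my pick for the example, and the 'utf-32-le' codec needs Python 2.6 or later.

```python
text = u'a'

# Same four bytes as the cStringIO.StringIO(u'a').getvalue() result quoted above.
raw = text.encode('utf-32-le')

# Encode explicitly, so the buffer holds the bytes you chose rather than
# whatever the interpreter build happens to use internally.
encoded = text.encode('utf-8')
```

Passing encoded (rather than text) to cStringIO.StringIO should give you back 'a' from getvalue(), and you decode on the way out if you need unicode again.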

  - Josiah


