Any reason why cStringIO in 2.5 behaves different from 2.4?

Stefan Behnel stefan.behnel-n05pAM at web.de
Thu Jul 26 13:56:44 EDT 2007


Stefan Scholl wrote:
> Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
>> Stefan Scholl wrote:
>>> Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
>>>> Stefan Scholl wrote:
>>>>> Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
>>>>>> Stefan Scholl wrote:
>>>>>>> Well, http://docs.python.org/lib/module-xml.sax.html is missing
>>>>>>> the fact, that I can't use Unicode with parseString().
>>>>>>>
>>>>>>> This parseString() uses cStringIO.
>>>>>> Well, Python unicode is not a valid *byte* encoding for XML.
>>>>>>
>>>>>> lxml.etree can parse unicode, if you really want, but otherwise, you should
>>>>>> maybe stick to well-formed XML.
>>>>> The XML is well-formed. Works perfect in Python 2.4 with Python
>>>>> unicode and Python sax parser.
>>>> The XML is *not* well-formed if you pass Python unicode instead of a byte
>>>> encoded string. Read the XML spec.
>>>>
>>>> It would be well-formed if you added the proper XML declaration, but that is
>>>> system specific (UCS-4 or UTF-16, BE or LE). So don't even try.
>>> Who cares? I'm not calling any external tools.
>> XML cares. If you want to work with something that is not XML, do not expect
>> XML tools to help you do it. XML tools work with XML, and there is a spec that
>> says what XML is. Your string is not XML.
> 
> This isn't some sophisticated XML tool that tells me the string
> is wrong. It's a changed behavior of cStringIO that throws an
> exception. While I'm just using the method parseString() of
> xml.sax.

All I'm saying is that parseString() is perfectly right in using cStringIO, as
cStringIO supports every possible incarnation of serialised XML.

It was documented that cStringIO does not accept Unicode strings that cannot
be encoded as plain ASCII, and it doesn't:

  $ python2.4
  Python 2.4.4 (#2, Apr 12 2007, 21:03:11)
  [GCC 4.1.2 (Ubuntu 4.1.2-0ubuntu4)] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> from cStringIO import StringIO
  >>> s = StringIO()
  >>> s.write(u"\uf852")
  Traceback (most recent call last):
    File "<stdin>", line 1, in ?
  UnicodeEncodeError: 'ascii' codec can't encode character u'\uf852' in position 0: ordinal not in range(128)

What a surprise.
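
For what it's worth, here is a minimal sketch of the byte-string route: encode
the unicode object yourself and hand the resulting bytes to parseString().
The names TextCollector and parse_unicode_xml are made up for illustration,
and the sketch assumes the document carries no XML declaration that names an
encoding other than UTF-8.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data it sees."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(content)


def parse_unicode_xml(text):
    """Serialise a unicode string to UTF-8 bytes, then parse those bytes.

    Assumption: `text` has no XML declaration naming a different encoding.
    Without a declaration, a parser treats the bytes as UTF-8 by default.
    """
    data = text.encode("utf-8")
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return u"".join(handler.chunks)


result = parse_unicode_xml(u"<root>\uf852</root>")
```

The parser decodes the bytes again, so the handler still receives the
character as text; only the input to parseString() is a byte string.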

Stefan


