Any reason why cStringIO in 2.5 behaves differently from 2.4?

Thu Jul 26 17:07:40 EDT 2007

On 7/26/07, Stefan Scholl <stesch at no-spoon.de> wrote:
> Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
> > The XML is *not* well-formed if you pass Python unicode instead of a byte
> > encoded string. Read the XML spec.
>
> Pointers, please.
>
> Last time I read that part of the spec was when a customer's
> consulting company switched to ISO-8859-15 without saying
> anything beforehand. The old code (PHP) I have to maintain
> couldn't deal with it.
>
> It was wrong to switch encoding without telling somebody about
> it. And an XML processor isn't required to support ISO-8859-15.
> But I thought it was too embarrassing not to support this
> encoding. I fixed that part without making a fuss.
>
>
> A Python XML processor that can't handle its own encoding is
> embarrassing. It isn't required to support it. It would be OK if
> it wouldn't support UTF-7. But a parseString() method that
> doesn't want Python strings? No way!
>

Of course it can handle its own encoding. But you're passing incorrect
values to it, the same way that passing '10' to a function expecting
an int is going to fail.

cStringIO in Python 2.4 is buggy: when passed a unicode object, it
silently uses the (platform- and compilation-dependent) internal buffer
of the unicode object. In 2.5 this was corrected to be consistent with
all other unicode/str conversions: it encodes the object using the
default encoding and fails when that's not possible (as in your example).
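
To make the 2.5 behaviour concrete, here is a rough sketch of what an
implicit unicode-to-str conversion amounts to. It does not call
cStringIO itself; the helper name to_default_bytes is made up for
illustration, and it assumes the default encoding is ASCII (the usual
setting). The u'' literals also run on current Python versions.

```python
# Sample document text containing one non-ASCII character (a-umlaut).
text = u"<root>\xe4</root>"

def to_default_bytes(value, default_encoding="ascii"):
    # Hypothetical helper: mimics an implicit unicode -> str conversion
    # by encoding with the (assumed) ASCII default encoding.
    return value.encode(default_encoding)

try:
    to_default_bytes(text)
    failed = False
except UnicodeEncodeError:
    # Non-ASCII data cannot be encoded implicitly, so the call fails
    # instead of silently handing over an internal buffer.
    failed = True

# Choosing the encoding explicitly always gives well-defined bytes.
explicit = text.encode("utf-8")
```

The point is only that the failure is the ordinary encode error, not
something special to cStringIO.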

It's not that your code worked on 2.4, and 2.5 broke it - the 2.4 code
was subtly buggy and 2.5 is preventing you from having that bug.

XML is not a string. It's a specific type of bytestream. If you want
to work with XML, then generate well-formed XML in the correct
encoding. There's no reason you should have an XML document (as
opposed to values extracted from that document) in unicode objects at
all.
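
A minimal sketch of that workflow, assuming UTF-8 as the document
encoding and a made-up <greeting> element: build the document as
bytes in the encoding its declaration names, hand those bytes to the
parser, and only deal with unicode for the extracted values.

```python
from xml.dom import minidom

# The value we want to ship inside the document (unicode text).
value = u"gr\xfc\xdfe"

# Serialise to a byte string in the encoding named by the declaration.
template = u'<?xml version="1.0" encoding="UTF-8"?><greeting>%s</greeting>'
xml_bytes = (template % value).encode("utf-8")

# The parser receives bytes and decodes them itself, guided by the
# XML declaration.
doc = minidom.parseString(xml_bytes)

# Values extracted from the document come back as unicode text.
extracted = doc.documentElement.firstChild.data
```

Here xml_bytes is what goes over the wire or into a file; extracted
is what the rest of the program works with.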