Any reason why cStringIO in 2.5 behaves different from 2.4?

Chris Mellon arkanes at gmail.com
Thu Jul 26 17:07:40 EDT 2007


On 7/26/07, Stefan Scholl <stesch at no-spoon.de> wrote:
> Stefan Behnel <stefan.behnel-n05pAM at web.de> wrote:
> > The XML is *not* well-formed if you pass Python unicode instead of a byte
> > encoded string. Read the XML spec.
>
> Pointers, please.
>
> Last time I read that part of the spec was when a customer's
> consulting company switched to ISO-8859-15 without saying
> anything beforehand. The old code (PHP) I have to maintain
> couldn't deal with it.
>
> It was wrong to switch encoding without telling somebody about
> it. And an XML processor isn't required to support ISO-8859-15.
> But I thought it was too embarrassing not to support this
> encoding. I fixed that part without making a fuss.
>
>
> A Python XML processor that can't handle its own encoding is
> embarrassing. It isn't required to support it. It would be OK if
> it wouldn't support UTF-7. But a parseString() method that
> doesn't want Python strings? No way!
>

Of course it can handle its own encoding. But you're passing incorrect
values to it, the same way that passing '10' to a function expecting
an int is going to fail.

cStringIO in Python 2.4 is buggy: when passed a unicode object, it
silently uses the unicode object's internal buffer, whose layout depends
on the platform and on how Python was compiled. In 2.5 this was corrected
to be consistent with all other unicode/str conversions: the object is
encoded using the default encoding, and the call fails when that's not
possible (as in your example).
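
To make the failure mode concrete, here is a rough sketch of what an
implicit unicode-to-str conversion amounts to. It assumes the default
encoding is ASCII (the stock Python 2 setting), and to_default_bytes is
a hypothetical helper of mine, not anything cStringIO exposes:

```python
# Assumption: the default encoding is ASCII, as on a stock Python 2 install.
ASSUMED_DEFAULT_ENCODING = 'ascii'


def to_default_bytes(text):
    """Hypothetical helper: mimic an implicit unicode -> str conversion."""
    return text.encode(ASSUMED_DEFAULT_ENCODING)


# Pure-ASCII text converts without complaint.
ascii_only = to_default_bytes(u'<root>abc</root>')

# Anything outside ASCII cannot be encoded, so the conversion raises.
try:
    to_default_bytes(u'<root>\xe4</root>')
    non_ascii_failed = False
except UnicodeEncodeError:
    non_ascii_failed = True
```

The second call is the case that now raises instead of quietly handing
over whatever bytes happened to be in the internal buffer.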

It's not that your code worked on 2.4, and 2.5 broke it - the 2.4 code
was subtly buggy and 2.5 is preventing you from having that bug.

XML is not a string. It's a specific type of bytestream. If you want
to work with XML, then generate well-formed XML in the correct
encoding. There's no reason you should have an XML document (as
opposed to values extracted from that document) in unicode objects at
all.
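
A minimal sketch of what I mean, assuming you do end up holding the
document as a unicode object and that it declares UTF-8. The sample
document and variable names are made up for illustration; the point is
to encode explicitly and hand the parser bytes:

```python
from xml.dom import minidom

# Illustrative document; the declared encoding must match the one used below.
document = u'<?xml version="1.0" encoding="utf-8"?><root>\xe4</root>'

# Encode explicitly instead of relying on an implicit default-encoding conversion.
xml_bytes = document.encode('utf-8')

# The parser receives a byte stream and decodes it per the XML declaration.
dom = minidom.parseString(xml_bytes)

# Values extracted from the document come back as unicode text.
value = dom.documentElement.firstChild.data
```

The document itself travels as bytes; only the extracted value is
unicode.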


