[ANN] pyxser-1.2r --- Python-Object to XML serialization module

Stefan Behnel stefan_ml at behnel.de
Mon Aug 24 09:00:06 EDT 2009


Daniel Molina Wegener wrote:
> unicode objects are encoded into the
> encoding that the XML document encoding has, and as you say, the whole
> XML document has one encoding. There is no mixing of byte encoded strings
> with different encodings in the output document.

Ok, that's what I hoped anyway. It just wasn't clear from your description.


> When the object is restored, by using pyxser.unserialize:
> 
> pyobj = pyxser.unserialize(obj = xmldocstr, enc = "utf-8")

But this is XML, right? What do you need to pass the encoding for at this
point?
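
To illustrate the question: an XML parser normally takes the encoding from
the XML declaration of the document itself. The sketch below uses the
standard library's ElementTree rather than pyxser (whose internals I am not
assuming anything about), and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# A hypothetical ISO-8859-1 document; the declaration names its encoding.
doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<name>caf\u00e9</name>'
).encode("iso-8859-1")

# No encoding argument is passed: the parser reads it from the declaration.
elem = ET.fromstring(doc)
print(repr(elem.text))
```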


> Another issue is that if your object tree mixes byte strings in
> different encodings, such as iso-8859-1 and utf-8, and you try to
> serialize that object, pyxser will print serialization errors to stdout
> while trying to handle the byte strings that do not match the document
> encoding.

There shouldn't be any serialisation errors (unless you try to recode byte
strings on the way out, which is a no-no for arbitrary user input). All you
have to do is properly escape the byte string so that it passes the XML
encoding step.

One trick to do that is to decode the byte string as ISO-8859-1 and
serialise the result as a normal Unicode string. When you read the document
back in, you re-encode the Unicode string to ISO-8859-1 to get the original
bytes.

I choose ISO-8859-1 here because it has the well-defined side-effect of
mapping byte values directly to Unicode characters with an identical code
point value. So you do not risk any failures or data loss.
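
A minimal sketch of that round trip, assuming Python 3 and the standard
library's ElementTree instead of pyxser; the helper names bytes_to_xml and
xml_to_bytes and the <bytes> element are made up for the example. Note that
XML 1.0 itself does not allow most control characters below 0x20, so byte
strings containing those would need additional escaping that is not shown
here.

```python
import xml.etree.ElementTree as ET


def bytes_to_xml(raw: bytes) -> bytes:
    """Wrap an arbitrary byte string in a UTF-8 encoded XML document."""
    # ISO-8859-1 maps every byte value to the code point with the same
    # number, so this decode cannot fail whatever the bytes "really" are.
    text = raw.decode("iso-8859-1")
    elem = ET.Element("bytes")  # hypothetical element name
    elem.text = text
    return ET.tostring(elem, encoding="utf-8")


def xml_to_bytes(doc: bytes) -> bytes:
    """Recover the original byte string from the XML document."""
    elem = ET.fromstring(doc)
    # An empty element parses with text None, hence the fallback.
    return (elem.text or "").encode("iso-8859-1")


# Byte strings in two different encodings, as in the mixed object tree case.
latin1_bytes = b"caf\xe9"
utf8_bytes = "caf\u00e9".encode("utf-8")

for raw in (latin1_bytes, utf8_bytes):
    restored = xml_to_bytes(bytes_to_xml(raw))
    print(raw, restored, restored == raw)
```

The serialiser never needs to know which encoding each byte string was
meant to be in; it only has to hand back the same bytes.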

Stefan


