[XML-SIG] saxutils.XMLGenerator: Output encoding

Walter Dörwald walter@livinglogic.de
Wed, 25 Sep 2002 19:49:32 +0200


Carsten Oberscheid wrote:

> Hello everybody,
> 
> I have not followed this list for some time, so this may have been
> discussed before: to the XMLGenerator, an output encoding can be
> given. All output is then written through saxutils.escape() using this
> encoding. As a result, any character in the document that cannot be
> represented in the output encoding raises a UnicodeError. So one
> single special character in a file can force me to produce UTF-8
> encoding, although for further processing ISO 8859-1 or even ASCII
> would be much more handy.
> 
> An alternative would be to catch the UnicodeError and, as a
> reaction, encode the offending characters as character references
> (e.g. "&#8220;"). Shouldn't this be the XML way to do it?

That's exactly the purpose of PEP 293, which will go into Python 2.3.
With it you can write:
    u"x\u201cx".encode("ascii", "xmlcharrefreplace")
and you'll get:
    "x&#8220;x"
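
For comparison, here is a small sketch of the strict and the
"xmlcharrefreplace" behaviour side by side. It is written for a current
Python 3 interpreter (an assumption on my part; under 2.3 the literal
would carry the u prefix and the result would be a plain str rather
than bytes), and the variable names are only illustrative:

```python
# Python 3 sketch; in Python 2.3+ the literal would be written u"x\u201cx".
text = "x\u201cx"

# The default "strict" handler raises for U+201C, which ASCII cannot represent.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The PEP 293 handler substitutes a decimal character reference instead.
encoded = text.encode("ascii", "xmlcharrefreplace")

print(strict_failed, encoded)
```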

> I can provide a very primitive patch for saxutils.py, if anybody is
> interested. I even would try to make it less primitive, if there are
> no objections against taking this fix into the distribution :^)

Using the new functionality in PyXML is another matter, because of
backwards compatibility. If you'd like to provide a patch for PyXML
that works for versions prior to 2.3, go ahead.
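
A version-independent fallback could look roughly like the following.
This is only a sketch, not what PyXML does: the function name
encode_with_charrefs is hypothetical, the code is written for Python 3
(so it returns bytes), and it assumes a stateless, ASCII-compatible
target encoding such as ASCII or ISO 8859-1, because each character is
encoded on its own and the reference itself is emitted as ASCII.

```python
def encode_with_charrefs(text, encoding):
    """Encode text, replacing unencodable characters with &#N; references.

    Hypothetical helper: encodes one character at a time so that a single
    failure does not abort the whole string.
    """
    pieces = []
    for ch in text:
        try:
            pieces.append(ch.encode(encoding))
        except UnicodeError:
            # ord() yields the code point; XML accepts decimal references.
            pieces.append(("&#%d;" % ord(ch)).encode("ascii"))
    return b"".join(pieces)


ascii_result = encode_with_charrefs("x\u201cx", "ascii")
latin1_result = encode_with_charrefs("\u00e4\u201c", "iso-8859-1")
```

Encoding character by character is slow for large documents; trying the
whole string first and falling back to the loop only on failure would
be an obvious refinement. Note also that this is only valid for
character data and attribute values, since element and attribute names
cannot contain character references.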

Bye,
    Walter Dörwald