[XML-SIG] Handling of character entity references

Randall Nortman randall@wonderclown.com
Mon, 26 May 2003 10:51:08 -0500


On Mon, May 26, 2003 at 09:46:14AM -0400, Mark E. wrote:
> > So essentially what I'm asking is how do I get PyXML to preserve
> > "&eacute;" as-is and output it in the same manner when I
> > PrettyPrint() it?  (Or, equivalently, convert it to its Unicode
> > representation on input and back to an entity reference on output.)
> >
> It can be done with the expat parser. Below is an example:
[...]

This is a good suggestion, but unfortunately my code uses DOM
extensively, and your example is only useful for an application based
on SAX events. xml.dom.ext.reader.PyExpat creates its own parser, so
that's no help. I tried creating a parser as in your example, but for
the content handler I inherited from xml.dom.ext.reader.Sax2.XmlDomGenerator
instead. I then passed my parser as the parser argument to
xml.dom.ext.reader.Sax2.Reader.__init__() and MyContentHandler as the
saxHandlerClass.

Everything parsed fine, but "&eacute;" was translated to nothing (no
character in that spot at all) on output. I suspect the skippedEntity
method needs to actually *do* something other than just print that it
is skipping an entity (which it does, so I know my version is being
used), but I am completely ignorant about how XmlDomGenerator works
internally, and I've never really written SAX-based parsers anyway.
Can you give me a hint?
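
To make the "equivalently" option from my original question concrete,
here is a minimal sketch of what I mean: resolve the references to
Unicode on input, then turn non-ASCII characters back into entity
references on output. It is only an illustration under several
assumptions. It uses the standard library's xml.dom.minidom and
html.entities (Python 3) rather than PyXML, the helper names
add_entity_doctype and to_entity_refs are hypothetical, and the input
is assumed to carry no XML declaration or DOCTYPE of its own.

```python
from html.entities import codepoint2name, name2codepoint
from xml.dom import minidom


def add_entity_doctype(xml_text, root_name):
    """Prepend an internal DTD subset declaring the HTML named entities.

    Hypothetical helper. Only non-ASCII entities are declared, so the
    predefined XML entities (amp, lt, gt, quot) are left alone.
    """
    decls = "".join(
        '<!ENTITY %s "&#%d;">' % (name, cp)
        for name, cp in sorted(name2codepoint.items())
        if cp > 127
    )
    return "<!DOCTYPE %s [%s]>%s" % (root_name, decls, xml_text)


def to_entity_refs(text):
    """Replace each non-ASCII character with an entity reference.

    Hypothetical helper. Uses the HTML entity name when one exists and
    falls back to a numeric character reference otherwise.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


# Input side: the parser expands &eacute; into the Unicode character.
source = "<doc>caf&eacute;</doc>"
doc = minidom.parseString(add_entity_doctype(source, "doc"))

# Output side: serialize the root element, then re-escape non-ASCII text.
roundtrip = to_entity_refs(doc.documentElement.toxml())
print(roundtrip)
```

The obvious drawback is that the output refers to entities it does not
declare, so whatever reads it back needs the same declarations (or a
DTD that supplies them). That is why I would still prefer to have the
parser leave the reference alone in the first place.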

Thanks for your help,

Randall Nortman