[XML-SIG] Entity management question --

Dennis Allison allison@sumeru.stanford.EDU
Sat, 4 May 2002 20:35:30 -0700 (PDT)


I'm using the minidom for part of a project and the Zope ParsedXML for the
rest.  This question relates to my experience with the minidom (PyXML 0.7.1,
I think).

So, I have a bunch of XML, some of which uses HTML entities in its text,
both in the DATA sections and in the attributes.

The minidom parser converts these entities to Unicode, as it's supposed
to.

Processing ensues.  And then the XML, or portions of the information it
contains, need to be written out.

The problem is recapturing, on output, the HTML-ish entities that have been
converted to Unicode.  Does such a beast (something that maps Unicode
characters back to entity names) exist?  And where can it be found?
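
To make the question concrete, here is a minimal sketch of the reverse
mapping being asked for.  It assumes a current Python, where the HTML entity
table lives in html.entities (older Pythons ship it as htmlentitydefs).  The
function name to_named_entities is hypothetical, not part of minidom or
PyXML.  Note that named entities in the output are only well-formed XML if a
DTD declares them.

```python
from html.entities import codepoint2name
from xml.dom import minidom


def to_named_entities(text):
    """Replace each non-ASCII character with an HTML-style named entity
    when one is defined, otherwise with a numeric character reference."""
    pieces = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint < 128:
            # Plain ASCII (including markup already escaped by minidom).
            pieces.append(ch)
        elif codepoint in codepoint2name:
            pieces.append("&%s;" % codepoint2name[codepoint])
        else:
            pieces.append("&#%d;" % codepoint)
    return "".join(pieces)


# Apply the mapping to already-serialized XML, so that markup characters
# have been escaped by minidom before entity names are reintroduced.
doc = minidom.parseString('<p title="caf\u00e9">\u00a9 2002</p>')
serialized = to_named_entities(doc.documentElement.toxml())
```

Running the mapping over the serialized string, rather than over individual
text nodes, avoids having the "&" of each reinserted entity escaped a second
time when the DOM is written out.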

Alternatively, I could try to subvert the conversions in the first place.
I looked at the minidom code in some detail, but I've been unable to grok
where to apply pixie dust to have it not convert HTML-ish entities to
Unicode in the first place.

So, help.

I don't normally read this list so a direct response would be appreciated.

BTW, one nice approach would be to extend the current Unicode codec
translation mechanisms to have an HTML encoding built in.  Were I not on a
deadline with little time to play, I'd give that one a try.
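
The codec idea could look roughly like the sketch below.  It assumes a
Python version that provides codecs.register_error (the error-handler hook
for encoders) and html.entities.  The handler name "htmlentityreplace" and
the function named_entity_replace are made up for illustration.

```python
import codecs
from html.entities import codepoint2name


def named_entity_replace(error):
    """Encoding error handler: emit a named HTML entity for each
    unencodable character, or a numeric reference if none is defined."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    unencodable = error.object[error.start:error.end]
    parts = []
    for ch in unencodable:
        codepoint = ord(ch)
        if codepoint in codepoint2name:
            parts.append("&%s;" % codepoint2name[codepoint])
        else:
            parts.append("&#%d;" % codepoint)
    # Return the replacement text and the position at which to resume.
    return "".join(parts), error.end


# "htmlentityreplace" is a hypothetical handler name chosen for this sketch.
codecs.register_error("htmlentityreplace", named_entity_replace)

encoded = "caf\u00e9 \u00a9 2002".encode("ascii", "htmlentityreplace")
```

With the handler registered, any encoder that cannot represent a character
(ASCII here) falls back to the entity form, which is close to the built-in
"xmlcharrefreplace" handler but with names instead of numbers.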

Thanks for any help.

-dra