[XML-SIG] Problem with entities

Thomas B. Passin tpassin@comcast.net
Thu, 07 Mar 2002 08:56:25 -0500


[Patrick Gaherty]


> I'm using the XMLGenerator class from saxutils (PyXML 0.7) and I'm having
> problems with entities. To start off with I'm just trying to recreate the
> original XML input file. ie
>
> import re
> import sys
> from xml.sax import make_parser, saxutils
>
> class Generator(saxutils.XMLGenerator):
>
>          def startElement(self,name,attrs):
>                  saxutils.XMLGenerator.startElement(self,name,attrs)
>
>          def endElement(self,name):
>                  saxutils.XMLGenerator.endElement(self,name)
>
>          def characters(self,content):
>                  saxutils.XMLGenerator.characters(self,content)
>
>
> fout1 = re.sub(r'\.', '_tmp.', sys.argv[1])
> output = open(fout1, 'w')
> p = make_parser()
> p.setContentHandler(Generator(output))
> p.parse(sys.argv[1])
>
> This works a treat, except I'm losing any entities I've declared in my DTD
> (ie &oelig;). Ideally I'd like them to appear in the output untouched. Any
> help to point me in the right direction would be greatly appreciated.
>

This cannot be done with standard XML processing, because the XML
Recommendation requires the parser to expand entity references into the
strings they represent.  Once expanded, they are gone, and no record of the
original references remains.  No standard XML processing purpose needs them,
although they may make the output more readable for you.
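
A minimal sketch of that expansion, assuming a present-day Python 3 with the
standard-library xml.sax (expat) rather than PyXML 0.7.  The sample document,
the entity name "oelig", and the roundtrip() helper are made up for
illustration:

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

# Hypothetical input: the entity is declared in an internal DTD subset
# and referenced once in the element content.
SAMPLE = (
    b'<!DOCTYPE doc [<!ENTITY oelig "&#339;">]>'
    b'<doc>c&oelig;ur</doc>'
)

def roundtrip(data):
    """Parse data and re-serialize it through an unmodified XMLGenerator."""
    out = io.StringIO()
    xml.sax.parseString(data, XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()

result = roundtrip(SAMPLE)
print(result)
```

The handler only ever receives the replacement text through characters(), so
the output should carry the literal character and no &oelig; reference.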

If the parser provides a startEntity callback, you could perhaps capture the
entity references in your handler and restore them later.  Someone else on
the list could tell you about doing this with PyXML.  Or you could do some
postprocessing to restore them.
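
One way the postprocessing route might look, as a rough sketch only.  The
ENTITY_FOR_CHAR table and restore_entities() are hypothetical names, and the
table has to be filled in by hand from the entities declared in your DTD:

```python
# Hypothetical table: expanded character -> entity name from the DTD.
ENTITY_FOR_CHAR = {"\u0153": "oelig"}

def restore_entities(text, table=ENTITY_FOR_CHAR):
    """Replace each listed character with its entity reference."""
    # str.translate accepts a mapping from code points to replacement strings.
    mapping = {ord(char): "&%s;" % name for char, name in table.items()}
    return text.translate(mapping)

restored = restore_entities("<doc>c\u0153ur</doc>")
print(restored)
```

This only handles entities that stand for a single character, and it rewrites
every occurrence, including ones that were typed literally in the source.
The output also needs a DOCTYPE that declares the entities before another
parser will accept it.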

Cheers,

Tom P