nonstandard XML character entities?

Chuck Rhode CRhode at LacusVeris.com
Sat Apr 14 10:04:45 EDT 2007


Martin v. Löwis wrote this on Sat, 14 Apr 2007 09:10:44 +0200.  My
reply is below.

> Paul Rubin:

>> I'm new to xml mongering so forgive me if there's an obvious
>> well-known answer to this.  It's not real obvious from the library
>> documentation I've looked at so far.  Basically I have to munch
>> a bunch of xml files which contain character entities like &uacute;
>> which are apparently nonstandard.

-snip-

> In ElementTree, the XMLTreeBuilder has an attribute entity which is
> a dictionary used to map entity names in entity references to their
> definitions. Whether you can make the parser download the DTD
> itself, I don't know.

What he said....

Try this on your piano:

: import xml.etree.ElementTree  # or elementtree.ElementTree prior to 2.5
: ElementTree = xml.etree.ElementTree
: import htmlentitydefs


: class XmlFile(ElementTree.ElementTree):

:     def __init__(self, file=None, tag='global', **extra):
:         ElementTree.ElementTree.__init__(self)
:         # Build the tree from ordinary ElementTree.Element objects.
:         parser = ElementTree.XMLTreeBuilder(
:             target=ElementTree.TreeBuilder(ElementTree.Element))
:         # Look up otherwise undefined entity names in the HTML entity table.
:         parser.entity = htmlentitydefs.entitydefs
:         self.parse(source=file, parser=parser)


It looks goofy as can be, but it works for me.
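
For anyone on Python 3, where htmlentitydefs became html.entities and
XMLTreeBuilder is gone, roughly the same trick might look like the sketch
below.  The function name parse_with_html_entities and the sample document
are made up for illustration.  It assumes the document carries a DOCTYPE
that points at an external DTD; as far as I can tell, without one expat
reports the undefined entity itself before the entity table is consulted.

```python
import io
import xml.etree.ElementTree as ET
from html.entities import name2codepoint


def parse_with_html_entities(source):
    """Parse XML from a filename or file object, resolving HTML entity names.

    Hypothetical helper for illustration, not part of any library.
    """
    parser = ET.XMLParser()
    # Map each HTML entity name (e.g. 'uacute') to its Unicode character.
    # The dict is updated in place rather than replaced.
    parser.entity.update(
        (name, chr(codepoint)) for name, codepoint in name2codepoint.items()
    )
    return ET.parse(source, parser=parser)


# Made-up sample; the DOCTYPE with a SYSTEM identifier is the assumption
# that lets the parser hand undefined entities to the lookup table.
sample = (
    '<!DOCTYPE doc SYSTEM "doc.dtd">\n'
    '<doc>caf&eacute; &uacute;</doc>'
)
tree = parse_with_html_entities(io.StringIO(sample))
text = tree.getroot().text
```

Here the entity names are mapped to Unicode characters through
name2codepoint rather than entitydefs, so the values do not depend on any
particular byte encoding.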

-- 
.. Chuck Rhode, Sheboygan, WI, USA
.. Weather:  http://LacusVeris.com/WX
.. 32° — Wind Calm


