Convert xml symbol notation

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 7 04:52:03 EDT 2007


> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character.  Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?

If you have this given in an XML file (rather than an HTML file which
is not well-formed XML), you could use an XML parser for the entire
file. This would automatically unescape character references. Likewise,
you can parse it with HTMLParser, which will invoke the handle_charref
method for these.
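
A minimal sketch of both routes, assuming Python 3 (where HTMLParser
lives in html.parser) and a made-up input string. The TextCollector
class name is hypothetical. Note that current Python 3 versions convert
character references automatically unless convert_charrefs=False is
passed, so handle_charref is only invoked with that flag turned off.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Route 1: well-formed XML. The XML parser resolves character
# references itself, so the text comes back already unescaped.
root = ET.fromstring("<p>It&#39;s here</p>")
xml_text = root.text


# Route 2: HTML that may not be well-formed XML.
class TextCollector(HTMLParser):
    """Hypothetical helper: collects text, decoding numeric references."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref
        # instead of silently converting the references.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is the part between "&#" and ";", e.g. "39" or "x27".
        if name.lower().startswith("x"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))


collector = TextCollector()
collector.feed("<p>It&#39;s <b>here</b></p>")
collector.close()
html_text = "".join(collector.parts)

print(xml_text)
print(html_text)
```

Named entities such as &amp; would go through handle_entityref instead,
which this sketch does not override.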

If you just want to unescape references, you can use the code in

http://effbot.org/zone/re-sub.htm
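
For illustration, here is a sketch of the general technique (re.sub
with a callback function). It is written for Python 3 and is not a copy
of the code at that URL; the unescape function name and the regular
expression are my own assumptions.

```python
import re
from html.entities import name2codepoint


def unescape(text):
    """Replace numeric and named references in text with characters."""

    def replace(match):
        ref = match.group(0)
        if ref.startswith("&#"):
            # Numeric reference: decimal (&#39;) or hexadecimal (&#x27;).
            try:
                if ref[2:3] in ("x", "X"):
                    return chr(int(ref[3:-1], 16))
                return chr(int(ref[2:-1]))
            except (ValueError, OverflowError):
                return ref  # malformed or out of range: leave as-is
        # Named entity such as &amp; or &lt;.
        name = ref[1:-1]
        if name in name2codepoint:
            return chr(name2codepoint[name])
        return ref  # unknown name: leave as-is

    return re.sub(r"&#?\w+;", replace, text)


print(unescape("It&#39;s &lt;b&gt; &amp; &#x27;ok&#x27;"))
```

On Python 3.4 and later, the standard library's html.unescape() does
the same job and covers more named entities, so a hand-written version
like this is mainly useful for seeing how the substitution works.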

HTH,
Martin


