Convert xml symbol notation
"Martin v. Löwis"
martin at v.loewis.de
Sat Apr 7 04:52:03 EDT 2007
> I'm working on a script to download and parse a web page, and it
> includes xml symbol notation, such as &#39; for the ' character. Does
> anyone know of a pre-existing python script/lib to convert the xml
> notation back to the actual symbol it represents?
If you have this given in an XML file (rather than an HTML file which
is not well-formed XML), you could use an XML parser for the entire
file. This would automatically unescape character references. Likewise,
you can parse it with HTMLParser, which will invoke the handle_charref
method for these.
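
As a rough sketch of both approaches in current Python 3 (the class name
CharrefCollector and the helper html_text are made up for illustration;
note that Python 3's HTMLParser only calls handle_charref when it is
created with convert_charrefs=False):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An XML parser resolves character references while it parses.
root = ET.fromstring("<p>It&#39;s here</p>")
xml_text = root.text


class CharrefCollector(HTMLParser):
    """Hypothetical helper: collects text, decoding numeric references."""

    def __init__(self):
        # With the default convert_charrefs=True, handle_charref is never called.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_charref(self, name):
        # name is "39" for &#39; and "x27" for &#x27;
        if name.lower().startswith("x"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.parts.append(chr(code))


def html_text(markup):
    parser = CharrefCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

Named references such as &amp; would go through handle_entityref instead,
which this sketch does not handle.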
If you just want to unescape references, you can use the code in
http://effbot.org/zone/re-sub.htm
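
A minimal sketch of that kind of re.sub-based approach, written for
Python 3 (this is my own illustration, not a copy of the code on that
page; the function name unescape is just a convenient choice):

```python
import re
from html.entities import name2codepoint


def unescape(text):
    """Replace numeric and named references; leave unknown ones untouched."""

    def replace(match):
        ref = match.group(1)
        try:
            if ref.startswith(("#x", "#X")):
                return chr(int(ref[2:], 16))   # hexadecimal: &#x27;
            if ref.startswith("#"):
                return chr(int(ref[1:]))       # decimal: &#39;
            return chr(name2codepoint[ref])    # named: &amp;
        except (KeyError, ValueError, OverflowError):
            return match.group(0)              # not a valid reference

    return re.sub(r"&(#?\w+);", replace, text)
```

On Python 3.4 and later, the standard library's html.unescape function
does the same job and covers more cases.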
HTH,
Martin