escaping illegal characters in XML

Andrew Dalke adalke at mindspring.com
Sat Jan 11 12:49:09 EST 2003


Sandy Norton wrote:
> What's the most robust way to escape illegal characters when outputting
> XML using the python standard library (or PyXML)?
> 
> Context:
> 
> I've written a websucker that extracts links and urls from various
> news sites using "from xml.sax.saxutils import escape" to do my
> escaping and then writing out the xml file using xml.dom.minidom.
> Unfortunately, I still get illegal characters embedded in the results.

What kind of illegal characters?  'escape' only converts the '&', '<'
and '>' characters.  But if you need that escaping, you are probably
writing the raw input text to the XML file.  Are you handling the
character set correctly?  That is, are you converting the output text
to UTF-8, or declaring the XML character set so that it matches the
bytes you actually write?
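
To make the question concrete, here is a rough sketch of what I mean,
written for current Python 3.  It is only an illustration, not a
drop-in fix: the helper names, the <title> element and the guess that
the scraped bytes are Latin-1 are all my assumptions, and stripping
control characters is just one possible answer to "illegal characters".

```python
import re
from xml.dom.minidom import parseString
from xml.sax.saxutils import escape

# C0 control characters that XML 1.0 forbids (tab, LF and CR are allowed).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def to_xml_text(raw_bytes, source_encoding="latin-1"):
    """Decode scraped bytes, drop forbidden control characters, escape markup.

    source_encoding is an assumption; use whatever the site really sends.
    """
    text = raw_bytes.decode(source_encoding)
    text = CONTROL_CHARS.sub("", text)
    return escape(text)  # converts '&', '<' and '>'


def build_document(raw_bytes, source_encoding="latin-1"):
    """Return UTF-8 bytes whose XML declaration matches the real encoding."""
    body = to_xml_text(raw_bytes, source_encoding)
    xml = '<?xml version="1.0" encoding="UTF-8"?>\n<title>%s</title>\n' % body
    return xml.encode("utf-8")


# Hypothetical scraped headline: Latin-1 bytes, markup and a stray BEL.
doc = build_document(b"Caf\xe9 & <b>news</b>\x07")

# Parsing the result back should give the original text minus the BEL.
round_trip = "".join(
    node.data for node in parseString(doc).documentElement.childNodes
)
```

If the declared encoding and the bytes on disk disagree, a parser will
complain about illegal characters even though 'escape' did its job,
which is why I am asking about the character set first.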

					Andrew
					dalke at dalkescientific.com





