escaping illegal characters in XML
Andrew Dalke
adalke at mindspring.com
Sat Jan 11 12:49:09 EST 2003
Sandy Norton wrote:
> What's the most robust way to escape illegal characters when outputting
> XML using the python standard library (or PyXML)?
>
> Context:
>
> I've written a websucker that extracts links and urls from various
> news sites using "from xml.sax.saxutils import escape" to do my
> escaping and then writing out the xml file using xml.dom.minidom.
> Unfortunately, I still get illegal characters embedded in the results.
What kind of illegal characters? 'escape' converts only the '&', '<'
and '>' characters. If you are still getting illegal characters, that
likely means you are writing the raw input text to XML. Are you
handling the character set correctly? That is, are you converting the
output text to UTF-8 or specifying the XML character set correctly?
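
A minimal sketch of what handling the character set could look like,
written for current Python 3 rather than the interpreter of the
original thread. The function names (clean_for_xml, to_xml_document),
the <item> element and the latin-1 default are hypothetical choices
for illustration; the real source encoding depends on the sites being
scraped. It decodes the raw bytes, drops control characters that XML
1.0 does not allow even when escaped, applies 'escape', and then
writes UTF-8 with a matching declaration.

```python
import re
from xml.sax.saxutils import escape

# Characters XML 1.0 forbids outright: C0 controls other than tab,
# newline and carriage return, plus U+FFFE and U+FFFF.
_ILLEGAL_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def clean_for_xml(raw_bytes, source_encoding="latin-1"):
    """Decode scraped bytes and make the text safe for XML character data.

    source_encoding is an assumption; use whatever the page declares.
    """
    text = raw_bytes.decode(source_encoding, errors="replace")
    text = _ILLEGAL_XML_CHARS.sub("", text)  # escape() cannot fix these
    return escape(text)  # handles '&', '<' and '>'


def to_xml_document(raw_bytes, source_encoding="latin-1"):
    """Wrap the cleaned text in a hypothetical <item> element as UTF-8."""
    body = clean_for_xml(raw_bytes, source_encoding)
    doc = '<?xml version="1.0" encoding="UTF-8"?>\n<item>%s</item>' % body
    # The bytes written must match the encoding named in the declaration.
    return doc.encode("utf-8")


if __name__ == "__main__":
    sample = b"Caf\xe9 & <b>news</b>\x07"  # latin-1 bytes with a stray BEL
    print(to_xml_document(sample).decode("utf-8"))
```

The point of the sketch is the order of operations: decode first,
strip what XML cannot represent, escape the markup characters, and
encode once at the end so the declared and actual encodings agree.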
Andrew
dalke at dalkescientific.com