[XML-SIG] Re: SAX encoding and special characters

Fredrik Lundh fredrik at pythonware.com
Sat Apr 17 11:46:34 EDT 2004


"Thomas" wrote:
> I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
> (I don't want to generate them).
> My XML file starts with:
> <?xml version="1.0" encoding="iso-8859-2" ?>
> I would like to get the encoding before parsing (I would like to use
> it in ContentHandler class).

just curious, but why do you need the encoding to handle the content?

> My second problem/question is about special characters in XML.
> Sometimes I have spec. chars (with char code 0-31) in XML and the
> parser ends with:

> xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61:
> not well-formed (invalid token)

as the parser says, control characters are not allowed in XML files (except
for a few whitespace codes: tab, line feed, and carriage return).  if you
really need to parse those files, you have to fix them up before passing
them to the parser (you can simply read them into a python string, delete
all junk characters, and then use parseString to parse them)
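
a minimal sketch of that clean-up step, written for a current Python 3
rather than 2.3.  the names strip_control_bytes, TextCollector and
parse_dirty_xml are made up for this example; only xml.sax.parseString and
xml.sax.ContentHandler come from the standard library.  it assumes the file
uses a single-byte or UTF-8 encoding (such as iso-8859-2), so that deleting
raw bytes below 0x20 cannot damage a multi-byte character; it would not be
safe for UTF-16 input.

```python
import re
import xml.sax

# Bytes 0x00-0x1F, minus tab (0x09), line feed (0x0A) and carriage
# return (0x0D), which are the only control codes XML 1.0 allows.
_ILLEGAL_CONTROL_BYTES = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_bytes(data):
    """Return data with the disallowed control bytes deleted."""
    return _ILLEGAL_CONTROL_BYTES.sub(b"", data)


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just gathers character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_dirty_xml(data, handler):
    """Clean the raw bytes, then hand them to the SAX parser."""
    xml.sax.parseString(strip_control_bytes(data), handler)
```

to use it, read the file in binary mode (open(filename, "rb").read()) and
pass the bytes to parse_dirty_xml along with your own handler.  note that
the junk characters are dropped silently, so the parsed text will not match
the file byte for byte.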

</F>





