[XML-SIG] Re: SAX encoding and special characters

Thomas thomasj at eworld.hu
Sun Apr 18 03:04:56 EDT 2004


Saturday, April 17, 2004, 5:46:34 PM, Fredrik wrote:

FL> "Thomas" wrote:
>> I'm playing with SAX with Python-2.3.3. My goal is to parse XML files
>> (I don't want to generate them).
>> My XML file starts with:
>> <?xml version="1.0" encoding="iso-8859-2" ?>
>> I would like to get the encoding before parsing (I would like to use
>> it in ContentHandler class).

FL> just curious, but why do you need the encoding to handle the content?
I need the encoding information because later I have to convert the
unicode back to that encoding. Unfortunately I can't change the XML
format because it's not in my hands (I can't put it into another element).
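
For illustration, one way to get the declared encoding before the parser
runs is to look at the XML declaration in the raw bytes. This is only a
minimal sketch in present-day Python 3 syntax (not the Python 2.3 used in
this thread); the function name sniff_xml_encoding and the UTF-8 fallback
are assumptions made for the example, and it only works for
ASCII-compatible encodings such as iso-8859-2 (not UTF-16).

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="iso-8859-2" ?>
_XML_DECL_ENCODING = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data, default="utf-8"):
    """Return the encoding named in the XML declaration, or `default`.

    `data` is the raw document as bytes. Hypothetical helper: it does not
    handle byte order marks or UTF-16/UTF-32 input.
    """
    match = _XML_DECL_ENCODING.match(data[:200])
    if match:
        return match.group(1).decode("ascii")
    return default


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="iso-8859-2" ?><doc/>'
    print(sniff_xml_encoding(sample))
```

The result could then be handed to the ContentHandler (for example through
its constructor) and used to encode the unicode strings back again.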

>> My second problem/question is about special characters in XML.
>> Sometimes I have spec. chars (with char code 0-31) in XML and the
>> parser ends with:

>> xml.sax._exceptions.SAXParseException: spec_char.xml:68271:61:
>> not well-formed (invalid token)

FL> as the parser says, control characters are not allowed in XML files (except
FL> for a few whitespace codes).  if you really need to parse those files, you
FL> have to fix them up before passing them to the parser (you can simply read
FL> them into a python string, delete all junk characters, and then use
FL> parseString to parse them)

FL> </F>
Yes, this was the first thing I tried. Unfortunately I got:
Traceback (most recent call last):
  File "./xmlparser_new.py", line 210, in ?
    saxparser.parseString(document)
AttributeError: ExpatParser instance has no attribute 'parseString'

Do you have any idea how to fix it? (Yes, I understand that it's not
supported by expat; unfortunately I don't have any experience with it.)
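
A note on the traceback above: parseString is a function of the xml.sax
module, not a method of the parser object returned by make_parser(), which
would explain the AttributeError. The sketch below, again in present-day
Python 3 syntax, removes the forbidden control bytes and then parses the
result in both ways. The names strip_control_bytes, TextCollector,
parse_with_module_function and parse_with_reader are made up for the
example, and the byte-level cleanup assumes a single-byte or UTF-8
encoding (it would damage UTF-16 input).

```python
import io
import xml.sax

# Bytes that XML 1.0 does not allow: everything below 0x20 except
# tab (0x09), line feed (0x0A) and carriage return (0x0D).
_FORBIDDEN = bytes(b for b in range(0x20) if b not in (0x09, 0x0A, 0x0D))


def strip_control_bytes(data):
    """Return `data` (bytes) with the forbidden control bytes deleted."""
    return data.translate(None, _FORBIDDEN)


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that just collects character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_with_module_function(data, handler):
    # xml.sax.parseString is a module-level function that takes the
    # document and the handler.
    xml.sax.parseString(strip_control_bytes(data), handler)


def parse_with_reader(data, handler):
    # Alternative: keep an explicit parser object and feed it a
    # file-like object through its parse() method.
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(strip_control_bytes(data)))


if __name__ == "__main__":
    dirty = b"<doc>abc\x01def\x0bghi</doc>"
    collector = TextCollector()
    parse_with_module_function(dirty, collector)
    print("".join(collector.chunks))
```

Either variant should avoid the "not well-formed (invalid token)" error
for documents whose only problem is stray control characters.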

Thanks,
        Thomas

python-2.3.3, Debian Woody, libexpat1-1.95.2-6, libexpat1-dev-1.95.2-6



