Parsing an XML document with funky non-ASCII characters
andrew
ayinger1 at pacbell.net
Sun Feb 3 20:14:40 EST 2002
Hi.
I am using the SAX parser in Python 2.1.
How do I deal with xml documents with characters like 'ä'?
I have tried:
- setting encoding="ISO-8859-1" in the XML declaration of the document itself
- setting the InputSource encoding via:
    source.setEncoding('ISO-8859-1')
- escaping the character in the document: ('\x84')
- and, finally, encoding the parsed strings that contain this character:
    myString.encode("ISO-8859-1")
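For comparison, here is a minimal sketch of the first and third attempts run through the stock SAX parser. It uses Python 3 syntax rather than 2.1, and `TextCollector` and `parse_text` are hypothetical names introduced only for illustration. It feeds expat a raw Latin-1 byte with a matching declaration, and separately an ASCII-only document that uses the XML character reference `&#xE4;` in place of a raw byte, then prints what the parser hands back.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers all character data in a document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # expat passes character data to the handler as Unicode (str) objects,
        # possibly split across several calls, so collect and join them.
        self.chunks.append(content)


def parse_text(data):
    # Hypothetical helper: parse a bytes document and return its text content.
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return "".join(handler.chunks)


# Raw Latin-1 bytes: 0xE4 is 'ä' in ISO-8859-1, and the declaration says so.
latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><name>\xe4</name>'
# ASCII-only alternative: an XML character reference instead of a raw byte.
charref_doc = b"<name>&#xE4;</name>"

for doc in (latin1_doc, charref_doc):
    text = parse_text(doc)
    # Compare against U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).
    print(text == "\u00e4")
```

Both documents should decode to the same one-character Unicode string, U+00E4, which suggests the parser reports the code point rather than the byte from any particular code page.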
What I have found is that the default parser (it appears to be expat,
retrieved via sax.make_parser()) seems to return all character data as
Unicode strings. It appears to store them incorrectly: 'ä' shows up in
the Unicode string as '\xe4' instead of '\x84'. As a result, if I
encode the Unicode string that I get back from the parser, the
character in question incorrectly appears as 'Σ' (a summation sign).
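One way to see where '\xe4' and '\x84' each come from is to ask Python's codecs directly. The sketch below (Python 3 syntax, not 2.1) compares ISO-8859-1 with cp437, the original IBM PC/DOS code page. The assumption that the output is being viewed on a cp437 console is mine and is not stated in the post.

```python
import unicodedata

# 'ä' is Unicode code point U+00E4; its byte value depends on the codec.
a_umlaut = "\u00e4"

print(a_umlaut.encode("iso-8859-1"))  # b'\xe4' in ISO-8859-1 (Latin-1)
print(a_umlaut.encode("cp437"))       # b'\x84' in the DOS code page 437

# Assumption: the encoded Latin-1 byte is displayed on a cp437 console.
# Byte 0xE4 in cp437 is a different character altogether.
shown = b"\xe4".decode("cp437")
print(unicodedata.name(shown))        # GREEK CAPITAL LETTER SIGMA
```

If that assumption holds, encoding with "cp437" instead of "ISO-8859-1" before printing would be the matching choice for such a console.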
Any ideas? Am I doing something wrong here?
Thanks,
Andrew
More information about the Python-list mailing list