[XML-SIG] Re: PyXML and iso-latin-1
Andrew M. Kuchling
akuchlin@mems-exchange.org
Fri, 10 Mar 2000 14:17:43 -0500 (EST)
Naulleau, Elie writes:
>Whenever I load this data in the DOM tree, it reappears with messy accents
>and diacritics,
>although the header is declared as follows:
><?xml version='1.0' encoding='iso-8859-1'?>
If you wind up using the PyExpat parser, through SAX or directly,
it'll always produce UTF-8 output, whatever encoding the XML
declaration names, so you need to translate the output back to
Latin-1 yourself. The current CVS tree includes a wide-string module
to handle this; for Python 1.6, it'll use the built-in Unicode support
(once it's finished). For now, do something like this (untested):

    import wstring

    s = '... UTF-8 output from Expat ...'
    # Build a Unicode string
    unicode_str = wstring.from_utf8(s)
    # Convert to Latin1
    latin1 = unicode_str.encode('LATIN1')
    # ... do something with the output string ...

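
With built-in Unicode support in the interpreter, the same two steps
(decode the UTF-8, re-encode as Latin-1) need no extra module. A
minimal sketch, assuming a modern Python 3 interpreter and assuming the
UTF-8 data arrives as a bytes object; the helper name utf8_to_latin1 is
made up for illustration and is not part of PyXML or Expat:

```python
def utf8_to_latin1(data: bytes, errors: str = "strict") -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 (Latin-1) bytes.

    Hypothetical helper for illustration only. `errors` is passed to
    the Latin-1 encoder and controls what happens to characters that
    Latin-1 cannot represent ("strict" raises UnicodeEncodeError,
    "replace" substitutes "?").
    """
    # Step 1: build a Unicode string from the UTF-8 bytes.
    text = data.decode("utf-8")
    # Step 2: convert the Unicode string to Latin-1 bytes.
    return text.encode("latin-1", errors)


# Stand-in for the UTF-8 output of the parser.
utf8_bytes = "accentu\u00e9".encode("utf-8")
latin1_bytes = utf8_to_latin1(utf8_bytes)
```

Characters outside Latin-1 (the euro sign, say) can't be converted, so
decide up front whether you'd rather get an exception or a
replacement character.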
--
A.M. Kuchling http://starship.python.net/crew/amk/
"I remember the Hieromancer. I met him, when I was here before. He was a
sweet old guy. Kind of like my grandfather. What happened to him?"
"He's dead. I expect that he's dead. If he's *lucky* he's dead."
-- Barbie and Wilkinson, in SANDMAN #35: "Beginning to See the Light"