[XML-SIG] Re: PyXML and iso-latin-1

Andrew M. Kuchling akuchlin@mems-exchange.org
Fri, 10 Mar 2000 14:17:43 -0500 (EST)


Naulleau, Elie writes:
>Whenever I load this data in the DOM tree, it reappears with messy accents
>and diacritics,
>although the header is declared as follows:
><?xml version='1.0' encoding='iso-8859-1'?>

If you wind up using the PyExpat parser, through SAX or directly,
it'll always produce UTF-8 output, whatever encoding the document
declares, so the output needs to be translated back to Latin1
yourself.  The current CVS tree includes a wide-string module to handle
this; for Python 1.6, it'll use the built-in Unicode support (once
it's finished).  For now, do something like this (untested):

import wstring
s = '... UTF-8 output from Expat ...'

# Build a Unicode string
unicode_str = wstring.from_utf8(s)

# Convert to Latin1
latin1 = unicode_str.encode('LATIN1')

# ... do something with the output string ...
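
For comparison, here is a minimal sketch of the same two-step conversion
(UTF-8 to Unicode, then Unicode to Latin1) using only the built-in
codecs. It assumes a Python 3 interpreter, where Unicode support is
part of the language, and does not use the wstring module at all. The
helper name utf8_to_latin1 and the sample data are made up for
illustration; they stand in for whatever the parser actually hands you.

```python
# Sketch only: relies on Python 3's built-in codecs, not on wstring.

def utf8_to_latin1(utf8_bytes, errors="strict"):
    """Decode UTF-8 bytes to a Unicode string, then re-encode as Latin1.

    `errors` is passed to the encode step, because some Unicode
    characters have no Latin1 equivalent.
    """
    text = utf8_bytes.decode("utf-8")          # build a Unicode string
    return text.encode("iso-8859-1", errors)   # convert to Latin1

# Hypothetical stand-in for UTF-8 output from the parser
utf8_output = "caf\u00e9 na\u00efve".encode("utf-8")
latin1 = utf8_to_latin1(utf8_output)
```

Note that with the default errors="strict", a character outside Latin1
(the euro sign, say) raises UnicodeEncodeError; passing "replace"
substitutes a question mark instead.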

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
    "I remember the Hieromancer. I met him, when I was here before. He was a
sweet old guy. Kind of like my grandfather. What happened to him?"
    "He's dead. I expect that he's dead. If he's *lucky* he's dead."
    -- Barbie and Wilkinson, in SANDMAN #35: "Beginning to See the Light"