[XML-SIG] XML and Unicode
M.-A. Lemburg
mal@lemburg.com
Wed, 23 May 2001 09:38:14 +0200
Mark Nottingham wrote:
>
> OK, so I'm not getting something then. The attached test script (and
> data file) is the problem pared down - if u'string' is a neutral
> encoding, and .encode('utf-8') generates a utf-8 encoded string of
> that encoding, then the utf-8.html output file should display
> correctly; however, it doesn't, while the latin-1 output does
> (because the input is latin-1).
>
> It seems like the XML parser isn't converting the ISO-8859-1 to
> Unicode; does this make sense?
That's a possibility (even though I don't see any non-ASCII
characters in your example XML file). Looking through the pyexpat.c
code, it seems that the parser assumes the XML file is encoded as
UTF-8 -- at least, all of its Unicode conversions are done using UTF-8.
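
To make the kind of mismatch under discussion concrete, here is a
minimal sketch. It is not taken from the attached test script: the
sample string and all variable names are made up for illustration,
and it uses current Python 3 syntax rather than the u'...' literals
of the original report. It only shows how the same text differs as
Latin-1 and UTF-8 bytes, and what happens when bytes are decoded
with the wrong assumption.

```python
# Illustrative only: the sample text is an assumption, not data from the thread.
text = "caf\u00e9"  # a Unicode string; no byte encoding is involved yet

latin1_bytes = text.encode("latin-1")  # one byte for the accented letter
utf8_bytes = text.encode("utf-8")      # two bytes for the accented letter

# Decoding with the encoding the bytes were written in recovers the text.
round_trip = latin1_bytes.decode("latin-1")

# Treating Latin-1 bytes as UTF-8 fails here: a lone 0xE9 byte is not
# a complete UTF-8 sequence.
try:
    latin1_bytes.decode("utf-8")
    latin1_read_as_utf8_ok = True
except UnicodeDecodeError:
    latin1_read_as_utf8_ok = False

# Treating UTF-8 bytes as Latin-1 does not fail, but it yields two
# characters where one was intended.
misread = utf8_bytes.decode("latin-1")

print(repr(latin1_bytes), repr(utf8_bytes), repr(misread))
```

If a parser (or a browser) assumes one encoding while the bytes are
actually in the other, the result is either an error or visibly wrong
characters, which would match the symptoms described above.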
--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/