[XML-SIG] XML and Unicode

M.-A. Lemburg mal@lemburg.com
Wed, 23 May 2001 09:38:14 +0200


Mark Nottingham wrote:
> 
> OK, so I'm not getting something then. The attached test script (and
> data file) is the problem pared down - if u'string' is a neutral
> encoding, and .encode('utf-8') generates a utf-8 encoded string of
> that encoding, then the utf-8.html output file should display
> correctly; however, it doesn't, while the latin-1 output does
> (because the input is latin-1).
> 
> It seems like the XML parser isn't converting the ISO-8859-1 to
> Unicode; does this make sense?

That's a possibility (even though I don't see any non-ASCII characters
in your example XML file). Looking through the pyexpat.c code, it
seems as if the parser assumes that the XML file is encoded as UTF-8;
at least, all Unicode conversions are done using UTF-8.
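For anyone hitting the same problem today, here is a minimal sketch using the Python 3 xml.parsers.expat module. It may not match the 2001 pyexpat.c behaviour described above. It shows two things. First, character data from a Latin-1 document with an encoding declaration comes back as Unicode and can then be encoded to either UTF-8 or Latin-1. Second, undeclared Latin-1 bytes are read as UTF-8 by default, so the `encoding` argument to `ParserCreate` is needed to override that. The helper `collect_text` and the sample documents are hypothetical, made up for illustration.

```python
import xml.parsers.expat


def collect_text(data, encoding=None):
    """Hypothetical helper: parse bytes with expat, return all character data as str.

    If `encoding` is given, it overrides whatever encoding the document declares.
    """
    parser = xml.parsers.expat.ParserCreate(encoding=encoding)
    chunks = []
    parser.CharacterDataHandler = chunks.append  # receives already-decoded str
    parser.Parse(data, True)  # True marks this as the final chunk
    return "".join(chunks)


# Hypothetical sample: Latin-1 bytes with a matching XML declaration.
DECLARED = '<?xml version="1.0" encoding="ISO-8859-1"?><p>caf\xe9 na\xefve</p>'.encode("latin-1")

text = collect_text(DECLARED)          # Unicode text, independent of the input encoding
utf8_bytes = text.encode("utf-8")      # bytes for a page served as UTF-8
latin1_bytes = text.encode("latin-1")  # bytes for a page served as ISO-8859-1

# Same kind of bytes without a declaration: expat assumes UTF-8, and the
# lone Latin-1 byte 0xE9 is not valid UTF-8, so parsing fails.
UNDECLARED = b"<p>caf\xe9</p>"
try:
    collect_text(UNDECLARED)
    default_ok = True
except xml.parsers.expat.ExpatError:
    default_ok = False

# Telling the parser the real encoding makes the same bytes parse correctly.
overridden = collect_text(UNDECLARED, encoding="ISO-8859-1")
```

Even with correct UTF-8 bytes, a browser only displays the output file correctly if the page also declares that charset (for example in its HTTP header or a meta tag).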

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
______________________________________________________________________
Company & Consulting:                           http://www.egenix.com/
Python Software:                        http://www.lemburg.com/python/