[XML-SIG] Processing xml files with ISO 8859-1 chars

Martin v. Loewis martin@v.loewis.de
Wed, 7 Nov 2001 22:57:42 +0100


> It seems that this xml file should have caused an exception, since it is
> not well-formed: the actual encoding does not match the presumed
> encoding (namely, utf-8).  The fact that the parse partially
> succeeded is disturbing.

Indeed. IMO, Expat should detect the error, but it doesn't; instead, it
treats all bytes >= 128 as proper UTF-8 (remember that all markup is
ASCII). So Expat passes the data to the application (pyexpat), which
invokes the UTF-8 decoder, which fails. Due to a bug, this exception is
lost, and the entire chunk of data reported by Expat is never passed on
to the Python application, either.
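
To illustrate the mismatch, here is a minimal sketch, assuming a current
Python 3 with its bundled xml.parsers.expat module. The helper name
parse_text and the sample document are made up for this example. It
shows that ISO 8859-1 bytes are not valid UTF-8, and that the same bytes
parse fine once the encoding is declared. What happens for the
undeclared document depends on the Expat and pyexpat versions in use, so
the sketch only records the outcome instead of assuming one.

```python
import xml.parsers.expat


def parse_text(data: bytes) -> str:
    """Parse an XML document and return its concatenated character data.

    Hypothetical helper for this example, not part of pyexpat.
    """
    chunks = []
    parser = xml.parsers.expat.ParserCreate()
    # Expat may deliver character data in several pieces, so collect them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)
    return "".join(chunks)


# A document body containing U+00E9, stored as the single byte 0xE9.
latin1_body = "<doc>caf\u00e9</doc>".encode("iso-8859-1")

# Without an encoding declaration, XML processors assume UTF-8,
# but these bytes are not valid UTF-8.
try:
    latin1_body.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Record what the installed parser does with the undeclared document;
# the outcome is version-dependent, so nothing is assumed here.
try:
    undeclared_result = parse_text(latin1_body)
except (xml.parsers.expat.ExpatError, UnicodeDecodeError) as exc:
    undeclared_result = exc

# With the encoding declared, the document is well-formed.
declared = b'<?xml version="1.0" encoding="iso-8859-1"?>' + latin1_body
text = parse_text(declared)
```

So the robust fix on the document side is to declare the encoding in the
XML declaration (or to convert the file to UTF-8) rather than to rely on
the parser to notice the mismatch.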

This is now fixed in pyexpat.c 1.42; thanks for the report.

Regards,
Martin