[XML-SIG] UTF-8 and ISO-8859-1 problems again

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 10 Jan 2001 08:49:56 +0100


> If this is a bug, I will post it, but I'm not sure it is yet.
> Attached are two files, one a test xml with encoding ISO-8859-1 and
> the other a test python script.  The problem is that if one uses a
> pyexpat parser, and then renders in ISO-8859-1 then things are ok.
> If one uses the drv_xmllib driver, then an error occurs as it tries
> to translate back to ISO-8859-1.  My guess is that the ISO-8859-1
> transformation into UTF-8 for character data (which is what happens
> when the original document is parsed) is not being done properly in
> the drv_xmllib driver.

That's a good guess. drv_xmllib does not implement handle_xml at all,
so it does not know what the encoding is. However, what it *should*
do, at least in Python 2.0, is to produce Unicode objects, not UTF-8
encoded strings.
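
To illustrate the difference, here is a minimal sketch. It is not the
drv_xmllib code, and it uses modern Python 3 syntax rather than Python 2.0.
The helper names (to_unicode, render, misread_as_utf8) are made up for this
example. It assumes the failure arises because bytes from the ISO-8859-1
document reach the renderer undecoded and are then treated as UTF-8; the
actual cause in the driver may differ.

```python
# Illustrative sketch only; helper names are hypothetical, not PyXML API.

# Bytes as they would appear in an ISO-8859-1 encoded document ("cafe" with e-acute).
raw = "caf\u00e9".encode("iso-8859-1")

def to_unicode(data, encoding):
    """Decode input bytes using the encoding named in the XML declaration."""
    return data.decode(encoding)

def render(text, encoding):
    """Encode Unicode text into the requested output encoding."""
    return text.encode(encoding)

def misread_as_utf8(data):
    """Return the error raised when undecoded bytes are treated as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc
    return None

# Working with Unicode objects: the round trip is lossless.
text = to_unicode(raw, "iso-8859-1")
round_trip = render(text, "iso-8859-1")

# Ignoring the declared encoding: the same bytes are not valid UTF-8.
error = misread_as_utf8(raw)
```

If the driver hands Unicode objects to the application, the output
encoding becomes purely a concern of the renderer, and the declared input
encoding only matters at the point where the bytes are decoded.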

Would you like to look into correcting that?

> My only reason for using drv_xmllib is that pyexpat still has a
> memory leak in it.

Not that I know of, at least not in PyXML 0.6.3.

> I was using PyXML-1.2, but just tried PyXML-1.3 and the errors still occur.

I'm confused. Where did you get PyXML 1.2 from?

Regards,
Martin