[XML-SIG] Determining output encoding of a SAX parser

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Wed, 25 Oct 2000 08:25:43 +0200


> Is there any way to determine the encoding of the output from a SAX1
> parser driver?  It's clear if the callbacks are being passed Unicode
> strings, but with 8-bit strings you have no way of knowing if they're
> in Latin1 or UTF-8 or anything (unless you know what parser you're
> using).
>
> Given that SAX2 does seem to support this with
> XMLReader.{get,set}Encoding(), is this worth fixing in SAX1?


I don't think it is worth fixing anything in SAX1, unless documented
functionality is clearly broken.

From Python 1.6 on, I'd expect drivers to produce Unicode objects in
most cases (although only expat currently does), in which case the
encoding of the input would be irrelevant. Please note that
{get,set}Encoding() is on the InputSource, not on the XMLReader. I
don't know whether the reader is supposed to invoke setEncoding on the
source once it sees an encoding= attribute.
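
To illustrate where the encoding accessors live, here is a minimal
sketch. It assumes the xml.sax package as shipped with current Python
releases (not the drivers available when this was written) and its
default expat-based reader. TextCollector and parse_bytes are
hypothetical names made up for this example. The function returns the
text delivered to the handler together with whatever getEncoding()
reports on the source after parsing, so you can check for yourself
whether your reader updates it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import InputSource


class TextCollector(ContentHandler):
    """Hypothetical handler that records the character data it receives."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_bytes(data, encoding=None):
    """Parse XML bytes; return (text, encoding reported by the source)."""
    source = InputSource()
    source.setByteStream(io.BytesIO(data))
    if encoding is not None:
        # The accessor is on the InputSource, not on the reader.
        source.setEncoding(encoding)

    handler = TextCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return "".join(handler.chunks), source.getEncoding()


# A Latin-1 encoded document that declares its own encoding.
doc = '<?xml version="1.0" encoding="iso-8859-1"?><a>caf\u00e9</a>'.encode(
    "iso-8859-1"
)

text, reported = parse_bytes(doc)
explicit_text, explicit_encoding = parse_bytes(doc, "iso-8859-1")
```

In both calls the handler receives Unicode strings, so the byte
encoding of the input no longer matters to the application. The second
call merely shows that an encoding set by the caller can be read back
from the same InputSource.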

Regards,
Martin