Some questions about decode/encode

Sun Jan 27 15:47:04 EST 2008

>"John Machin" <sjmachin at lexicon.net> wrote in message 
>news:eeb3a05f-c122-4b8c-95d8-d13741263374 at h11g2000prf.googlegroups.com...
>On Jan 27, 9:17 pm, glacier <rong.x... at gmail.com> wrote:
>> On Jan 24, 3:29 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar> 
>> wrote:
>
>*IF* the file is well-formed GBK, then the codec will not mess up when
>decoding it to Unicode. The usual cause of mess is a combination of a
>human and a text editor :-)

Python's SAX support uses the expat parser by default, and that is where 
the limitation comes from.  From the pyexpat module docs:

"Expat doesn't support as many encodings as Python does, and its repertoire 
of encodings can't be extended; it supports UTF-8, UTF-16, ISO-8859-1 
(Latin1), and ASCII. If encoding is given it will override the implicit or 
explicit encoding of the document."
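
One way around this, as a rough sketch rather than a definitive recipe: decode 
the GBK bytes with Python's own codec, re-encode them as UTF-8, and pass the 
encoding override to expat so the "GBK" declaration in the document is ignored. 
This assumes Python 3; the function name parse_gbk_xml and the sample document 
are made up for illustration.

```python
import xml.parsers.expat


def parse_gbk_xml(raw_bytes):
    """Transcode GBK bytes to UTF-8, then collect character data via expat.

    parse_gbk_xml is a hypothetical helper, not part of any library.
    """
    # Strict decoding raises UnicodeDecodeError if the input is not
    # well-formed GBK, so a damaged file is caught before parsing.
    text = raw_bytes.decode("gbk")
    utf8_bytes = text.encode("utf-8")

    chunks = []
    # The encoding argument overrides the encoding declared in the document.
    parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
    # expat may deliver text in several pieces, so collect and join them.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(utf8_bytes, True)
    return "".join(chunks)


# Made-up sample: a GBK-encoded document that declares its encoding as GBK.
sample = '<?xml version="1.0" encoding="GBK"?><doc>中文</doc>'.encode("gbk")
result = parse_gbk_xml(sample)
print(result)
```

The same transcoding step should work in front of xml.sax as well, since the 
bytes handed to the parser are UTF-8 by then.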

--Mark