[XML-SIG] Parsing the XML file which has encoding 'gb2312' .

Mike Brown mike at skew.org
Sat Dec 13 08:14:13 EST 2003


Xinzhi Zhao wrote:
> Hi,
> My XML files have to use an encoding other than the default one, namely
> 'gb2312'. When I parsed them with DOM or SAX, errors occurred. Can't the
> Python xml packages handle this yet? Is there any way to get this done?
> How should I do it? Please help me.

Limitations of the underlying parser, Expat, prevent certain encodings from
being supported without an additional layer of code. GB2312 is among them.

I think you will have to transcode your document to one of the encodings that
are supported by Expat (UTF-16, UTF-16LE, UTF-16BE, UTF-8, ISO-8859-1, or
US-ASCII; you probably want UTF-8 or UTF-16), and then either rewrite the
encoding declaration in the XML to match, or find a way to declare the
encoding externally. Expat does support external declaration of encoding, but
I don't know offhand how to do it from Python.
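
A minimal sketch of both approaches, assuming a Python 3 interpreter that
ships the gb2312 codec and the standard xml package. The helper name
transcode_to_utf8, the regular expression, and the sample document are
illustrative assumptions, not part of any library. The second half relies on
the optional encoding argument of xml.parsers.expat.ParserCreate, which is
documented as overriding the encoding declared in the document.

```python
import re
import xml.dom.minidom
import xml.parsers.expat

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING = re.compile(
    r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[A-Za-z][A-Za-z0-9._-]*\2'
)


def transcode_to_utf8(raw_bytes, source_encoding="gb2312"):
    """Hypothetical helper: decode the document, rewrite its encoding
    declaration to UTF-8, and return UTF-8 bytes that Expat can parse."""
    text = raw_bytes.decode(source_encoding)
    text = _DECL_ENCODING.sub(r'\1"UTF-8"', text, count=1)
    return text.encode("utf-8")


# Sample GB2312 document, built in memory for illustration.
sample = (
    '<?xml version="1.0" encoding="gb2312"?><doc>\u4e2d\u6587</doc>'
).encode("gb2312")

# Approach 1: transcode and rewrite the declaration, then parse as usual.
utf8_bytes = transcode_to_utf8(sample)
doc = xml.dom.minidom.parseString(utf8_bytes)
content = doc.documentElement.firstChild.data

# Approach 2: transcode only, leave the declaration untouched, and declare
# the real encoding externally when creating the Expat parser.
undeclared = sample.decode("gb2312").encode("utf-8")
chunks = []
parser = xml.parsers.expat.ParserCreate(encoding="utf-8")
parser.CharacterDataHandler = chunks.append  # text may arrive in pieces
parser.Parse(undeclared, True)
external_content = "".join(chunks)
```

The first approach leaves you with a document that any XML parser can read
later. The second avoids touching the declaration, but the file on disk then
no longer matches its own declaration, so it is only suitable when the
transcoded bytes are parsed immediately and not stored.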


