What is wrong? The minidom or the XML file?

Andrew Clover and-google at doxdesk.com
Wed Mar 10 20:22:30 EST 2004


Anthony Liu <antonyliu2002 at yahoo.com> wrote:

> The problem remains even if I try encoding="UTF-16" or
> encoding="GB2312" or encoding="GBK" in the xml
> document.

Indeed, expat doesn't understand some of the more complex (DBCS)
encodings such as GB.
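To see the failure directly, here is a small sketch, assuming present-day Python 3 (not the Python of this thread). It encodes a tiny document as GB2312 and hands the raw bytes to minidom. Python's fallback handler for encodings that expat does not know only copes with single-byte codecs, so the parse is refused. The exact exception type may differ between versions, so both ValueError and ExpatError are caught. The helper name `expat_rejects_gb` is hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def expat_rejects_gb(xml_text: str) -> bool:
    """Return True if minidom (via expat) refuses the GB2312-encoded bytes."""
    raw = xml_text.encode("gb2312")  # double-byte encoded input
    try:
        minidom.parseString(raw)
    except (ValueError, ExpatError):
        # expat cannot decode multi-byte encodings such as GB2312 itself
        return True
    return False


sample = '<?xml version="1.0" encoding="GB2312"?><doc>中文</doc>'
print(expat_rejects_gb(sample))
```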

In any case, you'll need CJKCodecs to get GB support, if you haven't
installed them already. (They will be built in to a forthcoming Python
version.) See http://cjkpython.i18n.org/
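A quick way to check whether a GB codec is available is to ask the codec registry; `codecs.lookup` raises LookupError for names it cannot find. This is a minimal sketch; the helper `has_codec` is hypothetical. On a current Python the CJK codecs ship with the standard library, so all three names below should resolve.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in ("gb2312", "gbk", "gb18030"):
    print(name, has_codec(name))
```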

Then you'll need to either:

  - read in the file and transcode it to an encoding expat understands,
    such as UTF-8, before passing it to expat via minidom.parseString, or,

  - use a pure-Python parser such as xmlproc (a validating parser)
    or the one in pxdom.
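The first option can be sketched as follows, assuming present-day Python 3 where the GB codecs are built in. The bytes are decoded from GBK (a superset of GB2312), the encoding declaration is rewritten so it no longer claims a GB encoding, and the result is re-encoded as UTF-8, which expat handles natively. The function name `parse_transcoded` and the declaration-rewriting regex are illustrative choices, not part of minidom.

```python
import re
from xml.dom import minidom

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL_ENCODING = re.compile(r'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])[^"\']*\2')


def parse_transcoded(raw: bytes, source_encoding: str = "gbk"):
    """Decode GB-encoded XML bytes, re-encode as UTF-8, then parse with minidom."""
    text = raw.decode(source_encoding)
    # Once the bytes are UTF-8, the declaration must say so too.
    text = _DECL_ENCODING.sub(r'\1"UTF-8"', text, count=1)
    return minidom.parseString(text.encode("utf-8"))


if __name__ == "__main__":
    raw = '<?xml version="1.0" encoding="GB2312"?><doc>中文</doc>'.encode("gb2312")
    doc = parse_transcoded(raw)
    print(doc.documentElement.firstChild.data)
```

In practice `raw` would come from reading the file in binary mode. Rewriting the declaration matters: if it still said GB2312, expat would try (and fail) to honour it.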

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
