Processing XML files in CJK encodings

gs gshibaya at gmail.com
Thu Oct 21 17:39:27 EDT 2004


Python gurus,

I need to parse XML files in CJK encodings, for example Chinese in GB2312
and Japanese in UTF-8. I first tried xml.dom.minidom. It works with
Japanese in UTF-8, but not with GB2312. See this article:

http://mail.python.org/pipermail/xml-sig/2003-December/010034.html

Then I tried xml.parsers.xmlproc. It works fine with GB2312, but it fails
with Japanese in UTF-8. See this other article:

http://mail.python.org/pipermail/xml-sig/2003-September/009802.html

Is there any way to parse both of them correctly?
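
One workaround I can think of is to transcode every file to UTF-8 before
handing it to minidom, so that the parser only ever sees an encoding it
already handles. The sketch below is only that, a sketch: it assumes a
current Python 3 interpreter, assumes the failure comes from the parser
lacking a decoder for the declared encoding while Python's own codecs have
one, and assumes the encoding declaration (if any) sits at the very start of
the file. parse_any_encoding and its fallback parameter are names I made up
for illustration; they are not part of any library.

```python
import re
from xml.dom import minidom

# Matches the encoding pseudo-attribute of an XML declaration that starts
# at the very first byte of the document.
_DECL_ENCODING = re.compile(
    rb'^(<\?xml[^>]*?encoding\s*=\s*)(["\'])([A-Za-z][A-Za-z0-9._-]*)\2'
)


def parse_any_encoding(raw, fallback="utf-8"):
    """Hypothetical helper: transcode XML bytes to UTF-8, then parse them."""
    match = _DECL_ENCODING.match(raw)
    # Use the declared encoding if there is one, otherwise the fallback.
    declared = match.group(3).decode("ascii") if match else fallback

    # Decode with Python's own codec (it knows GB2312), then re-encode.
    utf8 = raw.decode(declared).encode("utf-8")

    # Rewrite the declaration so it agrees with the new byte encoding.
    utf8 = _DECL_ENCODING.sub(rb"\g<1>\g<2>UTF-8\g<2>", utf8, count=1)
    return minidom.parseString(utf8)
```

With this, a GB2312 file and a UTF-8 file would go through the same call,
e.g. parse_any_encoding(open(path, "rb").read()), where path is whatever
file is being read. It does not handle a byte order mark in front of a
non-UTF-8 declaration, and it trusts the declared encoding to be correct.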

Thanks,
-Gen



More information about the Python-list mailing list