Processing XML files in CJK encodings

Uche Ogbuji uche at ogbuji.net
Sat Oct 23 14:21:59 EDT 2004


gshibaya at gmail.com (gs) wrote in message news:<6ea3a4e5.0410211339.2395fa29 at posting.google.com>...
> Python gurus,
> 
> I need to parse XML files in CJK encodings like GB2312 and Japanese in UTF-8.
> I was using xml.dom.minidom first. It works with Japanese in UTF-8, but doesn't
> work with GB2312. An article says,
> 
> http://mail.python.org/pipermail/xml-sig/2003-December/010034.html
> 
> Then I tried xml.parsers.xmlproc. It works fine with GB2312, but now it
> doesn't work with Japanese in UTF-8. Another article says,
> 
> http://mail.python.org/pipermail/xml-sig/2003-September/009802.html
> 
> Is there any way to parse both of them correctly?

You say "doesn't work".  Can you be more specific?
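The following is not the approach proposed in this reply. It is a sketch of one common workaround, written for modern Python 3 and resting on several assumptions. Python's expat binding handles only single-byte encodings through its unknown-encoding hook, so a document declared as GB2312 can fail inside xml.dom.minidom. The idea is to decode the raw bytes with Python's own codec for the declared encoding, re-encode them as UTF-8, rewrite the declaration to match, and then parse. The helper name `parse_any_encoding` is hypothetical. The code assumes that the encoding is stated in the XML declaration and that an undeclared document is UTF-8.

```python
import re
import xml.dom.minidom

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document, e.g. <?xml version="1.0" encoding="GB2312"?>
_DECL_ENCODING = re.compile(
    rb"""^(<\?xml[^>]*?encoding\s*=\s*)(["'])([A-Za-z][A-Za-z0-9._-]*)\2"""
)


def parse_any_encoding(data: bytes):
    """Hypothetical helper: parse XML bytes whose declared encoding
    expat may not support, by transcoding them to UTF-8 first.

    Assumption: the encoding is given in the XML declaration; without a
    declaration the bytes are passed to minidom unchanged (UTF-8 default).
    """
    match = _DECL_ENCODING.match(data)
    if match is None:
        return xml.dom.minidom.parseString(data)

    declared = match.group(3).decode("ascii")
    # Decode with Python's codec for the declared name (e.g. 'gb2312'),
    # then re-encode the whole document as UTF-8.
    utf8 = data.decode(declared).encode("utf-8")
    # The declaration is ASCII, so it survives transcoding unchanged;
    # rewrite it so expat sees encoding="UTF-8".
    utf8 = _DECL_ENCODING.sub(rb"\1\2UTF-8\2", utf8, count=1)
    return xml.dom.minidom.parseString(utf8)
```

The same function then accepts both kinds of files from the question. The example below uses a GB2312 document:

```python
doc = parse_any_encoding(
    '<?xml version="1.0" encoding="GB2312"?><r>中文</r>'.encode("gb2312"))
print(doc.documentElement.firstChild.data)
```

A byte-order mark, a UTF-16 document, or an encoding name that Python's codecs do not recognize is outside what this sketch handles.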

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
A hands-on introduction to ISO Schematron -
http://www-106.ibm.com/developerworks/edu/x-dw-xschematron-i.html
Schematron abstract patterns -
http://www.ibm.com/developerworks/xml/library/x-stron.html
Wrestling HTML (using Python) -
http://www.xml.com/pub/a/2004/09/08/pyxml.html
Enterprise data goes high fashion -
http://www.adtmag.com/article.asp?id=10061
Principles of XML design: Considering container elements -
http://www-106.ibm.com/developerworks/xml/library/x-contain.html
Hacking XML Hacks - http://www-106.ibm.com/developerworks/xml/library/x-think26.html
A survey of XML standards -
http://www-106.ibm.com/developerworks/xml/library/x-stand4/


