Does python's minidom support Chinese?

Anthony Liu antonyliu2002 at yahoo.com
Wed Mar 10 16:55:07 EST 2004


The 4 lines of code below parse an XML document
very well if the XML document contains only English
words.

But when I insert one Chinese character into the XML
document, then Python starts to complain when it hits
the Chinese character, saying that it is an invalid
token and thus it is not well-formed.

This is Python's complaint:

ExpatError: not well-formed (invalid token): line 3, column 7

Line 3, column 7 is exactly the position of the first
Chinese character in the XML document.

The problem remains even if I try encoding="UTF-16" or
encoding="GB2312" or encoding="GBK" in the XML
declaration of the document.

Note that GB2312 and GBK are Chinese encodings.

Please give a hint.  Thanks a lot!

The 4 lines of code I used are here:

# -*- coding: cp936 -*-
from xml.dom import minidom
xmldoc = minidom.parse('test.xml')
print xmldoc.toxml()
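
Since test.xml itself is not shown, the cause can only be guessed at. One
possibility, offered purely as an assumption, is that the bytes saved on disk
use a different encoding than the one the parser expects. The following is a
minimal Python 3 sketch of that situation, not a diagnosis of the original
file: the element names, the sample character, and the helper
write_and_parse are all hypothetical and exist only for illustration.

```python
import os
import tempfile
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# One Chinese character used as sample content (hypothetical test data).
TEXT = "\u4e2d"


def write_and_parse(declared, actual):
    """Write a small XML file whose declaration names `declared` but whose
    bytes are really encoded as `actual`, then parse it with minidom.

    Returns the text of the <w> element, or a string describing the error.
    """
    xml = (
        '<?xml version="1.0" encoding="%s"?>\n'
        "<root>\n"
        "  <w>%s</w>\n"
        "</root>\n"
    ) % (declared, TEXT)

    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        # Write raw bytes so we control exactly what ends up on disk.
        with os.fdopen(fd, "wb") as f:
            f.write(xml.encode(actual))
        try:
            doc = minidom.parse(path)
        except ExpatError as exc:
            return "error: %s" % exc
        return doc.getElementsByTagName("w")[0].firstChild.data
    finally:
        os.remove(path)


# Declaration and bytes agree: the character survives the round trip.
matching = write_and_parse("UTF-8", "utf-8")

# Declaration says UTF-8 but the bytes are GBK: expat rejects the file.
mismatched = write_and_parse("UTF-8", "gbk")

print(repr(matching))
print(mismatched)
```

In this sketch the mismatched case fails with the same kind of "not
well-formed (invalid token)" error quoted above, while the matching case
parses cleanly. Note also that the "# -*- coding: cp936 -*-" line only tells
Python how to read the script's own source text; it has no effect on how
minidom decodes test.xml.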


