What is wrong? The minidom or the XML file?
Andrew Clover
and-google at doxdesk.com
Thu Mar 11 11:36:41 EST 2004
Anthony Liu <antonyliu2002 at yahoo.com> wrote:
> by "transcode", do you mean something like:
> theTranscodedString = unicode(theOriginalString)
> xmldoc = minidom.parseString(theTranscodedString)
I don't think minidom.parseString is happy accepting a Unicode string, so
you'd have to transcode over to something else, presumably UTF-8 or UTF-16:
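    from xml.dom import minidom
    # Decode the GB2312 bytes, then re-encode them as UTF-8 for the parser.
    utf8String = unicode(originalString, 'gb2312').encode('utf-8')
    document = minidom.parseString(utf8String)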
The problem is that you'd have to remove the <?xml?> declaration so that
the parser didn't still try to use the GB encoding declared there.
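
As a rough sketch of the transcode-and-strip idea above, something like the following might work. It is written for Python 3, where bytes.decode() stands in for the unicode() call, and it assumes the input is a GB2312 byte string. The names XML_DECL and parse_transcoded are made up for illustration and are not part of minidom.

```python
import re
from xml.dom import minidom

# Hypothetical helper pattern: matches a leading XML declaration such as
# <?xml version="1.0" encoding="gb2312"?>
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>')

def parse_transcoded(raw_bytes, source_encoding='gb2312'):
    """Decode raw_bytes, drop the XML declaration, and parse as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    # Remove the declaration so the parser does not see the old encoding name.
    text = XML_DECL.sub('', text, count=1)
    # With no declaration present, expat falls back to UTF-8.
    return minidom.parseString(text.encode('utf-8'))

# Assumed sample input: a small document encoded as GB2312.
sample = '<?xml version="1.0" encoding="gb2312"?><doc>中文</doc>'.encode('gb2312')
document = parse_transcoded(sample)
text = document.documentElement.firstChild.data
```

A regular expression is a blunt way to drop the declaration, but the declaration can only appear at the very start of the document, so it should be adequate for simple cases.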
> So I'll try using xmlproc or pxdom, which I am assuming understand the
> GB encoding according to what you say, right?
Both of these just use the encodings available to Python, so, yes, once
CJKC is installed, parsing shouldn't be a problem. The difference is that
expat (which minidom uses) is a separate library written in C, which makes
it much faster but means it has no access to Python's codec base.
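
Since the pure-Python parsers depend on whatever encodings Python itself knows about, one way to check whether a given encoding is available is to ask the codecs module. This is only a sketch, and python_has_codec is a made-up helper name:

```python
import codecs

def python_has_codec(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

has_gb2312 = python_has_codec('gb2312')
has_bogus = python_has_codec('no-such-encoding')
```

If the lookup for the GB encoding fails, the codec package is presumably not installed, and a parser that relies on Python's codecs would fail on the file for the same reason.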
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/