What is wrong? The minidom or the XML file?

Andrew Clover and-google at doxdesk.com
Thu Mar 11 11:36:41 EST 2004


Anthony Liu <antonyliu2002 at yahoo.com> wrote:

> by "transcode", do you mean something like:

> theTranscodedString = unicode(theOriginalString)
> xmldoc = minidom.parseString(theTranscodedString)

I don't think minidom.parseString is happy accepting a Unicode string, so
you'd have to transcode over to something else, presumably UTF-8 or UTF-16:

  utf8String= unicode(originalString, 'gb2312').encode('utf-8')
  document= minidom.parseString(utf8String)

The problem is that you'd also have to remove the <?xml?> declaration, so
that the parser doesn't still try to use the GB encoding declared there.
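
Putting the two steps together, a rough sketch might look like the following.
It is written in Python 3 syntax (the snippet above is Python 2), it assumes
the input is a byte string that really is GB2312-encoded, and the names
parse_gb2312 and XML_DECL are made up for illustration. The regular expression
is only a simple approximation of an XML declaration.

```python
import re
from xml.dom import minidom

# Approximate match for a leading declaration such as
# <?xml version="1.0" encoding="gb2312"?>
XML_DECL = re.compile(r'^\s*<\?xml[^>]*\?>')

def parse_gb2312(raw_bytes):
    """Hypothetical helper: transcode GB2312 bytes to UTF-8 and parse them."""
    # Decode with Python's own codec rather than relying on the parser.
    text = raw_bytes.decode('gb2312')
    # Drop the declaration so the stale encoding name is never seen.
    text = XML_DECL.sub('', text, count=1)
    # With no declaration present, the parser falls back to UTF-8.
    return minidom.parseString(text.encode('utf-8'))

# Example input, built here so the sketch is self-contained.
sample = '<?xml version="1.0" encoding="gb2312"?><doc>中文</doc>'.encode('gb2312')
document = parse_gb2312(sample)
text = document.documentElement.firstChild.data
```

Rewriting the encoding name inside the declaration instead of deleting it
would presumably work as well; deleting it is just the shorter route.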

> So I'll try using xmlproc or pxdom, which I am assuming understand the
> GB encoding according to what you say, right?

Both of these just use the encodings available to Python, so, yes, once
CJKC is installed, parsing shouldn't be a problem. The difference is that
expat (which minidom uses) is a separate library written in C, which makes
it much faster, but means it has no access to Python's set of codecs.
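
Since those parsers rely on whatever codecs Python itself can find, one quick
way to check whether a given encoding is available is to ask the codecs
module directly. This is only a sketch; codec_available is a made-up helper
name, and the result for 'gb2312' depends on the Python installation.

```python
import codecs

def codec_available(name):
    """Hypothetical helper: True if Python can find a codec called name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # Raised when no registered codec matches the name.
        return False

has_gb2312 = codec_available('gb2312')
```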

-- 
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/


