[XML-SIG] unicode

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Fri, 10 Aug 2001 08:41:05 +0200


> xml.dom.minidom.parseString() doesn't accept unicode input--is this
> a bug?

I'd say it is a well-known limitation. In addition, it is unclear
what the parser should do if you have, say

u"<?xml version='1.0' encoding='koi8-r'?><foo/>"

In this case, the document claims to use some encoding, but it is
actually represented as a Unicode string.

If you need to parse Unicode strings, I'd recommend encoding them
first. If the document has an encoding declaration, you should encode
it using the declared encoding; otherwise you should encode it as
UTF-8.
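
A minimal sketch of that advice, assuming present-day Python 3 (where
str is the Unicode string type) rather than the Python of this thread.
The names encode_for_parser and parse_unicode are hypothetical helpers
invented for illustration, not part of xml.dom.minidom, and the regular
expression is only a rough approximation of the XML declaration syntax:

```python
import re
from xml.dom import minidom

# Rough match for the encoding pseudo-attribute of an XML declaration
# at the very start of the text (assumption: no leading whitespace or BOM).
_DECL_ENCODING = re.compile(
    r"^<\?xml[^>]*?\bencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def encode_for_parser(text):
    """Encode a Unicode string to bytes before handing it to the parser.

    Uses the encoding named in the XML declaration if there is one,
    otherwise UTF-8.
    """
    match = _DECL_ENCODING.match(text)
    encoding = match.group(1) if match else "utf-8"
    return text.encode(encoding)


def parse_unicode(text):
    """Parse a Unicode string by encoding it first, then calling minidom."""
    return minidom.parseString(encode_for_parser(text))
```

With this, parse_unicode(u"<foo/>") hands the parser UTF-8 bytes, while
a string that declares encoding='koi8-r' is encoded as KOI8-R, so the
declaration and the actual bytes agree. Whether the parser can then
decode a given declared encoding still depends on the parser itself.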

If you can come up with a patch that gets this right, it would be much
appreciated.

Regards,
Martin