XML can't read Unicode shock horror. News at 11.

Dale Strickland-Clark dale at riverhall.NOTHANKS.co.uk
Thu Nov 1 05:51:49 EST 2001


Paul Prescod <paulp at ActiveState.com> wrote:

>Dale Strickland-Clark wrote:
>> 
>> ...
>> 
>> Is there any chance that this might be elevated?
>> 
>> Non-unicode XML is a bit restrictive. :-(
>
>I think Martin was trying to make the point that this works okay:
>
>dom = xml.dom.minidom.parseString(u'<node/>'.encode("utf-8"))
>
>I agree with you that minidom should probably do this automatically.
>
> Paul Prescod

That's not much good if my XML document happens to start with:

<?xml version="1.0" encoding="UTF-16"?>

To quote from the O'Reilly book, "XML In A Nutshell" p71: "An XML
parser is required to handle the UTF-16 and UTF-8 encodings of
Unicode." And I expect something similar is stated in the XML DOM
spec, if I had time to look for it.
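
To make the mismatch concrete, here is a rough sketch. It assumes a
current Python where parseString accepts encoded bytes; try_parse is
a helper name made up for this illustration, not part of minidom. The
idea is that bytes which agree with the declaration should parse,
while UTF-8 bytes that claim to be UTF-16 would be expected to be
rejected by the underlying expat parser.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

# A document whose declaration says UTF-16, held as a Unicode string.
DOC = u'<?xml version="1.0" encoding="UTF-16"?><node/>'


def try_parse(data):
    """Hypothetical helper: return the root tag name, or None on a parse error."""
    try:
        return xml.dom.minidom.parseString(data).documentElement.tagName
    except ExpatError:
        return None


# Bytes that match the declaration (the "utf-16" codec prepends a BOM).
matching = try_parse(DOC.encode("utf-16"))

# Bytes that contradict the declaration: UTF-8 data labelled as UTF-16.
mismatched = try_parse(DOC.encode("utf-8"))

print("utf-16 bytes:", matching)
print("utf-8 bytes: ", mismatched)
```

So the encode("utf-8") workaround only helps when the declaration
and the bytes handed to the parser agree with each other.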
--
Dale Strickland-Clark
Riverhall Systems Ltd


