XML can't read Unicode shock horror. News at 11.

Paul Prescod paulp at ActiveState.com
Thu Nov 1 16:21:35 EST 2001


Dale Strickland-Clark wrote:
> 
> ...
> 
> That's not much good if my XML document happens to start with:
> 
> <?xml version="1.0" encoding="UTF-16"?>
> 
> To quote from the O'Reilly book, "XML In A Nutshell" p71: "An XML
> parser is required to handle the UTF-16 and UTF-8 encodings or
> Unicode." And I expect similar is stated in the XML DOM spec if I had
> time to look for it.

As Martin says, you won't find anything like that in the DOM spec. And
the XML spec is not going to provide much support for your position
either, because it discusses the parsing of *byte sequences*. I showed
you how to construct a UTF-8 byte sequence. You can also construct a
UTF-16 byte sequence using the same technique (change the string "UTF-8"
to "UTF-16"). If you want to write a function that creates the right
byte sequence no matter what the encoding declaration says, you'd have
to sniff the encoding declaration out of the string first.
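
For illustration, here is a minimal sketch of that sniffing approach. It
assumes a current Python 3, where str is already Unicode, and the names
encode_xml and _DECL_ENCODING are hypothetical, not part of any library.
The regular expression only handles a declaration at the very start of
the string, so treat it as a rough starting point rather than a complete
implementation of the spec's rules.

```python
import re

# Hypothetical helper pattern (an assumption for this sketch): captures the
# value of the encoding pseudo-attribute in an XML declaration that sits at
# the very start of the text.
_DECL_ENCODING = re.compile(
    r'^<\?xml\s+version\s*=\s*["\'][^"\']*["\']'
    r'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def encode_xml(text, default="UTF-8"):
    """Encode a Unicode string to bytes using its own declared encoding.

    Falls back to `default` when no encoding declaration is found.
    """
    match = _DECL_ENCODING.match(text)
    encoding = match.group(1) if match else default
    # str.encode raises LookupError if Python has no codec by that name.
    return text.encode(encoding)


if __name__ == "__main__":
    doc = '<?xml version="1.0" encoding="UTF-16"?><greeting>caf\u00e9</greeting>'
    data = encode_xml(doc)
    # Python's "UTF-16" codec prepends a byte order mark, which is what a
    # parser uses to detect UTF-16 before it can read the declaration.
    print(data[:2])
```

The resulting bytes object is what you would hand to the parser, instead
of the Unicode string itself.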

 Paul Prescod



