Is this a bug? BOM decoded with UTF8

"Martin v. Löwis" martin at v.loewis.de
Fri Feb 11 19:33:01 EST 2005


> What are you talking about? The BOM and UTF-16 go hand in hand. Without
> a Byte Order Mark, you can't unambiguously determine whether big- or
> little-endian UTF-16 was used.

In the old days, UCS-2 was *implicitly* big-endian. It was only
when Microsoft got that wrong that a little-endian version of UCS-2
came along. So while the BOM is now part of all relevant specifications,
it is still "Microsoft crap".
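
To make the byte-order point concrete, here is a small sketch in current
Python 3 (an assumption: not the Python of this thread, and the utf-8-sig
codec in particular is newer). The sample string and variable names are
made up for illustration. It shows that without a BOM the same bytes decode
without error in either byte order, and that the plain utf-8 codec hands
the BOM back as U+FEFF instead of stripping it.

```python
import codecs

text = "hi"  # illustrative sample string

# Codecs with an explicit byte order do not write a BOM.
be = text.encode("utf-16-be")
le = text.encode("utf-16-le")
assert not be.startswith(codecs.BOM_UTF16_BE)
assert not le.startswith(codecs.BOM_UTF16_LE)

# Without a BOM the decoder has to be told the byte order. Guessing
# wrong raises no error here, it just yields different characters.
swapped = be.decode("utf-16-le")
assert swapped != text

# With a BOM in front, the generic utf-16 codec picks the byte order
# itself and drops the BOM from the result.
assert (codecs.BOM_UTF16_BE + be).decode("utf-16") == text
assert (codecs.BOM_UTF16_LE + le).decode("utf-16") == text

# UTF-8 has no byte order to signal. The plain utf-8 codec treats the
# BOM as an ordinary U+FEFF character and returns it to the caller.
utf8_bytes = codecs.BOM_UTF8 + text.encode("utf-8")
utf8_plain = utf8_bytes.decode("utf-8")
assert utf8_plain == "\ufeff" + text

# The utf-8-sig codec strips a leading BOM if one is present.
utf8_sig = utf8_bytes.decode("utf-8-sig")
assert utf8_sig == text
```

Whether a plain utf-8 decoder should keep or drop the leading U+FEFF is the
judgment call this thread is about; the sketch only shows what the codecs
do, not which behaviour is right.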

> For more details, see:
> http://www.unicode.org/faq/utf_bom.html#BOM

"some higher level protocols", "can be useful" - not
"is an inherent part of all byte-level encodings".

Regards,
Martin


