Is this a bug? BOM decoded with UTF8
Brian Quinlan
brian at sweetapp.com
Fri Feb 11 08:51:41 EST 2005
Diez B. Roggisch wrote:
>>I know its easy (string.replace()) but why does UTF-16 do
>>it on its own then? Is that according to Unicode standard or just
>>Python convention?
>
>
> BOM is microsoft-proprietary crap. UTF-16 is defined in the unicode
> standard.
What are you talking about? The BOM and UTF-16 go hand-in-hand.
Without a Byte Order Mark, you can't unambiguously determine whether big
or little endian UTF-16 was used. If, for example, you came across a
UTF-16 text file containing this hexadecimal data: 2200
what would you assume? That it is a quote character in little-endian
format or that it is a for-all symbol in big-endian format?
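
To make the ambiguity concrete, here is a minimal sketch, assuming a
Python 3 interpreter (the variable names are only for illustration). It
decodes the same two bytes under both byte orders, then shows the generic
"utf-16" codec using a leading BOM to choose the order by itself:

```python
import codecs

# The two bytes from the example above, with no BOM in front of them.
raw = bytes.fromhex("2200")

# Decoding with an explicit byte order gives two different characters.
as_le = raw.decode("utf-16-le")   # U+0022 QUOTATION MARK
as_be = raw.decode("utf-16-be")   # U+2200 FOR ALL

# With a BOM prepended, the generic "utf-16" codec picks the byte order
# itself and strips the BOM from the decoded result.
with_le_bom = (codecs.BOM_UTF16_LE + raw).decode("utf-16")
with_be_bom = (codecs.BOM_UTF16_BE + raw).decode("utf-16")

print(hex(ord(as_le)), hex(ord(as_be)))                # 0x22 0x2200
print(with_le_bom == as_le, with_be_bom == as_be)      # True True
```
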
For more details, see:
http://www.unicode.org/faq/utf_bom.html#BOM
Cheers,
Brian