[Python-3000] Pre-PEP: Easy Text File Decoding

"Martin v. Löwis" martin at v.loewis.de
Mon Oct 2 22:20:01 CEST 2006


John S. Yates, Jr. wrote:
> It is a mistake on Microsoft's part to fail to strip the BOM
> during conversion to UTF-8.  There is no MEANINGFUL definition
> of BOM in a UTF-8 string.

That's not true. See

http://unicode.org/faq/utf_bom.html#23
http://unicode.org/faq/utf_bom.html#29

The BOM can also serve as an encoding marker. I refer to the
BOM encoded in UTF-8 as the "UTF-8 signature". As such, it is
very meaningful: it tells a reader that the text is UTF-8.
Usage of the BOM in UTF-8-encoded text is deliberate, not a
conversion mistake.
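
To illustrate the signature idea, here is a minimal Python sketch, not
part of the original message. The helper name strip_utf8_signature and
the sample text are hypothetical; codecs.BOM_UTF8 and the "utf-8-sig"
codec are from the standard library. It assumes the data is already
known to be a byte string and only checks for the three signature bytes.

```python
import codecs
from typing import Tuple


def strip_utf8_signature(data: bytes) -> Tuple[bytes, bool]:
    """Return (payload, had_signature) for a byte string.

    Hypothetical helper: the signature is the BOM (U+FEFF) encoded
    in UTF-8, available as codecs.BOM_UTF8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# Sample input: signature followed by UTF-8 encoded text.
raw = codecs.BOM_UTF8 + "Löwis".encode("utf-8")

payload, had_signature = strip_utf8_signature(raw)

# The "utf-8-sig" codec drops a leading signature while decoding;
# plain "utf-8" keeps it as a leading U+FEFF character.
text = raw.decode("utf-8-sig")
text_with_bom = raw.decode("utf-8")
```

A reader that sees the signature can treat it as evidence that the
text is UTF-8; its absence says nothing either way.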

Regards,
Martin

