print UTF-8 file with BOM

"Martin v. Löwis" martin at v.loewis.de
Fri Dec 23 16:25:27 EST 2005


John Bauman wrote:
> UTF-8 shouldn't need a BOM, as it is designed for character streams, and 
> there is only one logical ordering of the bytes. Only UTF-16 and greater 
> should output a BOM, AFAIK. 

Yes and no. Yes, UTF-8 does not need a BOM to identify endianness. No,
usage of the BOM with UTF-8 is explicitly allowed in the Unicode specs
(so output of the BOM doesn't *have* to be restricted to UTF-16 and
greater), and the BOM has a well-defined meaning for UTF-8 (namely,
as the UTF-8 signature).
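The sketch below, which assumes Python 3, shows the difference in practice. The plain `utf-8` codec never writes a signature. The `utf-8-sig` codec prepends the BOM (the bytes EF BB BF, available as `codecs.BOM_UTF8`) when encoding, and strips a leading one, if present, when decoding. The sample text is arbitrary.

```python
import codecs

text = "print('hello')\n"  # arbitrary sample content

# "utf-8-sig" prepends the UTF-8 signature (BOM) EF BB BF to the output.
with_bom = text.encode("utf-8-sig")
# Plain "utf-8" never adds a BOM.
without_bom = text.encode("utf-8")

print(with_bom[:3] == codecs.BOM_UTF8)     # True
print(without_bom[:3] == codecs.BOM_UTF8)  # False

# Decoding with "utf-8-sig" removes a leading signature if there is one,
# and also accepts input that has none.
print(with_bom.decode("utf-8-sig") == text)     # True
print(without_bom.decode("utf-8-sig") == text)  # True

# Decoding with plain "utf-8" keeps the signature as the character U+FEFF.
print(with_bom.decode("utf-8")[0] == "\ufeff")  # True
```

For files, passing `encoding="utf-8-sig"` to `open()` gives the same behaviour: a signature is written on output and skipped on input.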

Regards,
Martin



More information about the Python-list mailing list