writing \feff at the beginning of a file

Terry Reedy tjreedy at udel.edu
Fri Aug 13 18:25:46 EDT 2010


A short background to MRAB's answer, which I will try to get right.

The byte-order mark (BOM) was invented for UTF-16 encodings so the reader 
could determine whether the pairs of bytes are in little- or big-endian 
order, depending on whether the first two bytes are fe and ff 
(big-endian) or ff and fe (little-endian). The concept is meaningless 
for utf-8, which consists only of bytes in a defined order. 
This is part of the Unicode standard.
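
To make the two byte orders concrete, here is a minimal sketch, assuming 
Python 3 and only the standard codecs module. The sample string "hi" and 
the variable names are mine, chosen for illustration.

```python
import codecs

text = "hi"

# The explicit-endianness codecs write no BOM at all.
big = text.encode("utf-16-be")       # high byte of each pair first
little = text.encode("utf-16-le")    # low byte of each pair first

# Prefixing the BOM is what lets a reader tell the two apart.
big_with_bom = codecs.BOM_UTF16_BE + big         # starts with fe ff
little_with_bom = codecs.BOM_UTF16_LE + little   # starts with ff fe

# The generic "utf-16" codec reads the BOM, picks the byte order
# accordingly, and drops the BOM from the decoded text.
decoded_big = big_with_bom.decode("utf-16")
decoded_little = little_with_bom.decode("utf-16")
```

Both decodes should give back the same two-character string even though 
the underlying bytes differ.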

However, Microsoft (or whoever) re-purposed (hijacked) that mark to 
serve as a non-standard indicator of utf-8 versus any non-unicode 
encoding: the same character, U+FEFF, which utf-8 encodes as the three 
bytes ef bb bf, is written at the start of the file. The result is a 
corrupted utf-8 stream that Python accommodates with the 
utf-8-sig(nature) codec (versus the standard utf-8 codec).
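
A minimal sketch of the difference between the two codecs, assuming 
Python 3. The sample string and variable names are mine, not from the 
thread.

```python
import codecs

text = "hello"

plain = text.encode("utf-8")        # no signature
signed = text.encode("utf-8-sig")   # signature followed by the same bytes

# The signature is U+FEFF encoded as utf-8: the bytes ef bb bf.
signature_matches = signed == codecs.BOM_UTF8 + plain

# The standard codec leaves U+FEFF in the decoded text;
# utf-8-sig strips it.
kept = signed.decode("utf-8")
stripped = signed.decode("utf-8-sig")

# utf-8-sig also accepts input that has no signature.
also_fine = plain.decode("utf-8-sig")
```

So reading with utf-8-sig should be safe whether or not the file starts 
with the signature, while writing with it adds the signature for programs 
that expect one.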
-- 
Terry Jan Reedy



