[Python-Dev] Unicode byte order mark decoding

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 7 23:38:39 CEST 2005


Nicholas Bastin wrote:
> It would be nice if you could optionally specify that the codec would
> assume UTF-16BE if no BOM was present, and not raise UnicodeError in
> that case, which would preserve the current behaviour as well as allow
> users' to ask for behaviour which conforms to the standard.

Alternatively, the UTF-16BE codec could support the BOM, and do
UTF-16LE if the "other" BOM is found.

This would also support your usecase, and in a better way. The
Unicode assertion that UTF-16 is BE by default is void these
days - there is *always* a higher layer protocol, and it more
often than not specifies (perhaps not in English words, but
only in the source code of the generator) that the default should
by LE.

Regards,
Martin


More information about the Python-Dev mailing list