[issue25325] UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings don't add/remove BOM on encode/decode

eryksun report at bugs.python.org
Tue Oct 6 14:01:42 EDT 2015


eryksun added the comment:

Yes, if you explicitly use big-endian or little-endian UTF, then you need to manually include a BOM if that's required. That said, if a file format or data field is specified with a particular byte order, then using a BOM is strictly incorrect. See the UTF BOM FAQ:

    http://www.unicode.org/faq/utf_bom.html#BOM
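
As a rough illustration of the behaviour described above (a Python 3 sketch; the sample string and variable names are only for demonstration), the explicit-endian codecs neither emit nor strip a BOM, so it has to be handled by hand when one is wanted:

```python
import codecs

text = "abc"  # arbitrary sample string, chosen only for illustration

# An explicit byte order produces no BOM.
le = text.encode("utf-16-le")

# If a BOM is required, prepend it manually.
with_bom = codecs.BOM_UTF16_LE + le

# The generic "utf-16" codec consumes the BOM when decoding...
decoded = with_bom.decode("utf-16")

# ...while the explicit-endian codec keeps it as the character U+FEFF.
kept = with_bom.decode("utf-16-le")
```

Here decoded is the original string, whereas kept starts with an extra U+FEFF that the caller would have to strip.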

For regular text documents, in which the byte order doesn't really matter, use the native byte order of your platform via UTF-16 or UTF-32. Also, instead of manually encoding strings, use the "encoding" parameter of the built-in open function, or io.open or codecs.open in Python 2. A file opened this way gets only a single BOM, even when it is written to multiple times.
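
A minimal sketch of that approach, assuming Python 3 and a throwaway temporary file (the path and the strings written are made up for the example):

```python
import codecs
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# The generic "utf-16" codec uses the platform's native byte order.
with open(path, "w", encoding="utf-16") as f:
    f.write("spam")
    f.write("eggs")  # a second write does not add a second BOM

# Inspect the raw bytes that ended up on disk.
with open(path, "rb") as f:
    raw = f.read()

# codecs.BOM_UTF16 is the BOM for the native byte order.
starts_with_bom = raw.startswith(codecs.BOM_UTF16)
bom_count = raw.count(codecs.BOM_UTF16)

# Reading back with the same encoding strips the BOM again.
with open(path, encoding="utf-16") as f:
    round_trip = f.read()
```

The raw bytes begin with exactly one native-order BOM, and reading the file back through the same encoding returns the text without it.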

----------
nosy: +eryksun
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue25325>
_______________________________________

