[issue1328] feature request: force BOM option

Adam Olsen report at bugs.python.org
Thu Nov 1 20:07:38 CET 2007


Adam Olsen added the comment:

The problem with "being tolerant", as you suggest, is that you lose the ability
to round-trip.  Read in a file using the UTF-8 signature, write it back
out, and suddenly nothing else can open it.
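
A minimal sketch of the round-trip problem, assuming Python's existing
"utf-8-sig" codec is used for the tolerant read. The sample bytes and the
variable names are made up for illustration:

```python
import codecs

# A file as written by a tool that emits the UTF-8 signature.
original = codecs.BOM_UTF8 + "hello".encode("utf-8")

# Tolerant read: the signature is silently stripped.
text = original.decode("utf-8-sig")

# Writing back with plain UTF-8 drops the signature.
rewritten_plain = text.encode("utf-8")

# Only an explicit choice of the signature-emitting codec restores it.
rewritten_sig = text.encode("utf-8-sig")

print(rewritten_plain == original)  # the plain rewrite differs from the input
print(rewritten_sig == original)
```

Nothing in the decoded text records that a signature was present, so the
writer has to be told separately.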

Conceptually, these signatures shouldn't even be part of the encoding;
they're a prefix in the file indicating which encoding to use.

Note that the BOM signature (ZWNBSP) is a valid code point.  Although it
seems unlikely for a file to start with ZWNBSP, if you were to chop a file
up into smaller chunks and decode them individually you'd be more likely
to run into it.  (However, it seems general use of ZWNBSP is being
discouraged precisely due to this potential for confusion[1]).
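
To illustrate the chunking case, here is a rough sketch, again assuming
the "utf-8-sig" codec stands in for a signature-stripping decoder. The
split point is chosen by hand so that the second chunk happens to begin
with an in-text ZWNBSP:

```python
import codecs

# U+FEFF appears in the middle of the text, not as a signature.
data = "a\ufeffb".encode("utf-8")

# Chop the byte stream so the second chunk starts with the ZWNBSP bytes.
chunks = [data[:1], data[1:]]

# Decoding each chunk independently treats the ZWNBSP as a signature
# and drops it.
per_chunk = "".join(chunk.decode("utf-8-sig") for chunk in chunks)

# An incremental decoder only checks for the signature at the very
# start of the stream, so the in-text ZWNBSP survives.
decoder = codecs.getincrementaldecoder("utf-8-sig")()
streamed = decoder.decode(chunks[0]) + decoder.decode(chunks[1], final=True)

print(repr(per_chunk))
print(repr(streamed))
```

The per-chunk result silently loses a code point, while the streamed
result preserves it.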

In summary, guessing the encoding should never be the default.  Although
it may be appropriate in some contexts, we must ensure we emit the right
encoding for those contexts as well. [2]

[1] http://unicode.org/faq/utf_bom.html#38
[2] http://unicode.org/faq/utf_bom.html#28

----------
nosy: +rhamphoryncus

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1328>
__________________________________

