[issue1328] feature request: force BOM option

Adam Olsen report at bugs.python.org
Thu Nov 1 23:21:34 CET 2007


Adam Olsen added the comment:

On 11/1/07, James G. sack (jim) <report at bugs.python.org> wrote:
>
> James G. sack (jim) added the comment:
>
> Adam Olsen wrote:
> > Adam Olsen added the comment:
> >
> > The problem with "being tolerant" as you suggest is you lose the ability
> > to round-trip.  Read in a file using the UTF-8 signature, write it back
> > out, and suddenly nothing else can open it.
>
> I'm sorry, I don't see the round-trip problem you describe.
>
> If codec utf_8 or utf_8_sig were to accept input with or without the
> 3-byte BOM, and write it as currently specified without/with the BOM
> respectively, then _I_ can reread again with either utf_8 or utf_8_sig.
>
> No round trip problem _for me_.
>
> Now if I need to exchange with someone else, that's a different matter. One
> way or another I need to know what format they need and create the
> output they require for their input.
>
> Am I missing something in your statement of a problem?

You don't seem to think it's important to interact with other
programs.  If you're importing with no intent to write out to a common
format, then yes, autodetecting the BOM is just fine.  Python needs a
more general default though, and not guessing is part of that.
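
A minimal sketch of the round-trip loss described above, using the
standard utf_8 and utf_8_sig codecs. It is written for Python 3, and the
sample text and variable names are made up for illustration:

```python
import codecs

text = "héllo"

# Bytes as written by a program that emits the UTF-8 signature.
with_sig = text.encode("utf-8-sig")
assert with_sig.startswith(codecs.BOM_UTF8)

# utf_8_sig strips the signature on decode (when one is present).
decoded = with_sig.decode("utf-8-sig")

# Writing back with plain utf_8 silently drops the signature, so the
# bytes on disk are no longer what the other program produced.
rewritten = decoded.encode("utf-8")
round_trip_preserved = (rewritten == with_sig)

# Plain utf_8 does not strip anything: the signature survives as a
# leading U+FEFF character in the decoded text.
decoded_plain = with_sig.decode("utf-8")
```

Here round_trip_preserved is False: the text is intact, but the 3-byte
signature that another program may rely on is gone.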

> > Conceptually, these signatures shouldn't even be part of the encoding;
> > they're a prefix in the file indicating which encoding to use.
>
> Yes, I'm aware of that, but you can't predict what you may find in dusty
> archives, or what someone may give to you. IMO, that's the basis of
> being tolerant in what you accept, is it not?

Garbage in, garbage out.  There are a lot of protocols with whitespace,
capitalization, etc. that you can fudge around while retaining the same
contents; character set encodings aren't one of them.
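
As a rough sketch of treating the signature as a prefix that names the
encoding rather than as part of the encoding itself, something like the
following could work. split_signature and _SIGNATURES are hypothetical
names, not anything in the standard library; only the codecs.BOM_*
constants are real. Python 3 is assumed.

```python
import codecs

# Longer signatures first: the UTF-32-LE BOM begins with the same two
# bytes as the UTF-16-LE BOM, so it has to be tested before it.
_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_signature(data):
    """Return (encoding, payload) for a bytes object.

    encoding is None when no signature is found; no guessing is done.
    """
    for bom, name in _SIGNATURES:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


encoding, payload = split_signature(codecs.BOM_UTF8 + "héllo".encode("utf-8"))
```

The caller keeps the detected encoding alongside the payload, so it can
decide explicitly whether to write the signature back out.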

__________________________________
Tracker <report at bugs.python.org>
<http://bugs.python.org/issue1328>
__________________________________

