[Python-Dev] Unicode byte order mark decoding

Wed Apr 6 08:06:08 CEST 2005

Stephen J. Turnbull wrote:
> Of course it must be supported.  My point is that many strings (in my
> applications, all but those strings that result from slurping in a
> file or process output in one go -- example, not a statistically valid
> sample!) are not the beginning of "what once was a stream".  It is
> error-prone (not to mention unaesthetic) to not make that distinction.
> 
> "Explicit is better than implicit."

I can't put these two paragraphs together. If you think that explicit
is better than implicit, why do you not want to make different calls
for the first chunk of a stream, and the subsequent chunks?
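
To make concrete what I mean by different calls, here is a minimal
sketch. It assumes a Python where the proposed utf-8-sig codec is
available (it is written in Python 3 syntax), and the chunk contents
are made up for illustration:

```python
import codecs

# Hypothetical stream contents, split into two chunks for illustration.
chunks = [codecs.BOM_UTF8 + b"Hal", b"lo"]

# First chunk: explicitly ask for the signature to be stripped.
first = chunks[0].decode("utf-8-sig")

# Subsequent chunks: plain utf-8, so a U+FEFF here would be kept as data.
rest = chunks[1].decode("utf-8")

text = first + rest
```

The caller states explicitly which chunk may carry a signature, and
nothing is stripped implicitly anywhere else.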

>  >>> s=cStringIO.StringIO()
>  >>> s1=codecs.getwriter("utf-8")(s)
>  >>> s1.write(u"Hallo")
>  >>> s.getvalue()
> 'Hallo'
> 
> Yes!  Exactly (except in reverse, we want to _read_ from the slurped
> stream-as-string, not write to one)!  ... and there's no need for a
> utf-8-sig codec for strings, since you can support the usage in
> exactly this way.

However, if there is a utf-8-sig codec for streams, there is currently
no way of *preventing* this codec from also being available for
strings. The very same code is used for streams and for strings, and
automatically so.
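
As a rough illustration (again assuming a Python in which utf-8-sig
exists, written in Python 3 syntax, with made-up sample data), the same
codec name is reachable through the stream interface and through the
string interface:

```python
import codecs
import io

# Sample data: a UTF-8 signature followed by some text.
data = codecs.BOM_UTF8 + "Hallo".encode("utf-8")

# Stream usage: wrap a byte stream in the codec's StreamReader.
reader = codecs.getreader("utf-8-sig")(io.BytesIO(data))
from_stream = reader.read()

# String usage: the very same codec is available for a one-shot decode.
from_string = data.decode("utf-8-sig")

# For comparison, plain utf-8 keeps the signature as U+FEFF.
plain = data.decode("utf-8")
```

Both from_stream and from_string come out without the signature, while
plain still starts with U+FEFF.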

Regards,
Martin