[Python-Dev] Unicode byte order mark decoding

"Martin v. Löwis" martin at v.loewis.de
Wed Apr 6 22:22:19 CEST 2005


Stephen J. Turnbull wrote:
> Because the signature/BOM is not a chunk, it's a header.  Handling the
> signature/BOM is part of stream initialization, not translation, to my
> mind.

I'm sorry, but I'm losing track of what precisely you are trying to
say. You seem to be using a mental model that is entirely different
from mine.

> The point is that explicitly using a stream shows that initialization
> (and finalization) matter.  The default can be BOM or not, as a
> pragmatic matter.  But then the stream data itself can be treated
> homogeneously, as implied by the notion of stream.

But what follows from that point? So it shows some kind of matter...
what does that mean for actual changes to the Python API?
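
For illustration only, one possible reading of the quoted proposal is
sketched below. It is not taken from the thread: the helper name
open_skipping_bom is hypothetical, the stream is assumed to be seekable,
and only the UTF-8 signature is considered. The stream setup inspects
the first bytes once, and everything after that is decoded by a plain
codec that knows nothing about signatures.

```python
import codecs
import io


def open_skipping_bom(raw, encoding="utf-8"):
    """Hypothetical helper: consume a UTF-8 signature during stream setup.

    Assumes `raw` is a seekable binary stream.
    """
    start = raw.tell()
    head = raw.read(len(codecs.BOM_UTF8))
    if head != codecs.BOM_UTF8:
        # No signature present: rewind so these bytes are decoded as data.
        raw.seek(start)
    # From here on the codec sees a homogeneous byte stream.
    return io.TextIOWrapper(raw, encoding=encoding)


with_bom = open_skipping_bom(io.BytesIO(codecs.BOM_UTF8 + b"abc"))
without_bom = open_skipping_bom(io.BytesIO(b"abc"))
text_with_bom = with_bom.read()
text_without_bom = without_bom.read()
```

Both streams yield the same text, so the caller never sees the
signature. The codec-side alternative keeps this logic inside the
decoder, which is how the utf-8-sig codec behaves.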

> I think it probably also would solve Walter's conundrum about
> buffering the signature/BOM if responsibility for that were moved out
> of the codecs and into the objects where signatures make sense.
> 
> I don't know whether that's really feasible in the short run---I
> suspect there may be a lot of stream-like modules that would need to
> be updated---but it would be saner in the long run.

What is "that" which might be really feasible? To "solve Walter's
conundrum"? That "signatures make sense"?

So I can't really respond to your message in a meaningful way;
I'll just let it rest.

Regards,
Martin

