[I18n-sig] UTF-8 and BOM

Guido van Rossum guido@digicool.com
Mon, 21 May 2001 12:55:21 -0400


> > Then the write function has an error. A BOM should only be
> > written at the start of the file and not on every call to
> > write().
> 
> That's hard to implement... how would the codec know where the
> stream starts -- it only interfaces to the underyling stream
> using .read() and .write() ?

To me this looks like it should be an application issue.  The
application should write an explicit BOM at the start of each file it
writes.  The codecs shouldn't do anything with BOMs -- just pass them
through.
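
A minimal sketch of that division of labor, assuming a current Python
in which the plain "utf-8" codec neither writes nor strips a BOM.  The
helper name write_with_bom is hypothetical, not part of the codecs
module; only codecs.BOM_UTF8 and codecs.getwriter() are real library
names here.

```python
import codecs
import io


def write_with_bom(stream, chunks):
    """Hypothetical helper: the application writes the BOM exactly once,
    then hands the text to a codec that knows nothing about BOMs."""
    # The application, not the codec, marks the start of the file.
    stream.write(codecs.BOM_UTF8)
    # The plain "utf-8" stream writer just encodes what it is given.
    writer = codecs.getwriter("utf-8")(stream)
    for chunk in chunks:
        writer.write(chunk)


# An in-memory byte stream stands in for a real file.
buf = io.BytesIO()
write_with_bom(buf, ["hello ", "world"])
data = buf.getvalue()
```

Because the codec is called once per chunk but the BOM is written only
by the application, repeated write() calls cannot produce extra BOMs.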

I'm pretty sure that was the intent of BOMs in the Unicode
standard, because it's the only reasonable approach -- if it
isn't, I'd like to see chapter and verse quoted. ;-)

--Guido van Rossum (home page: http://www.python.org/~guido/)