[I18n-sig] UTF-8 and BOM

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Mon, 21 May 2001 16:40:56 +0200


> That's true for Unicode strings.
> 
> However, a plain Python string containing an encoded Unicode string
> (in *any* character encoding) is no different to a file here - it's
> just a block-o-bytes.

The problem with that approach is that writing to a UTF-16-encoded
file (as obtained by codecs.open(filename, "w", encoding="utf-16"))
will put the BOM in front of every chunk of data as passed to .write().

That is an error, IMO: the stream writer should put the BOM only at
the beginning of the entire file.
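
To make the difference concrete, here is a minimal sketch. It assumes a
present-day Python 3 interpreter rather than the release under
discussion, and it uses codecs.getincrementalencoder() as a stand-in
for a stream writer that remembers it has already written the BOM. The
chunk values and the count_boms() helper are made up for illustration.

```python
import codecs

chunks = ["abc", "def"]

# Stateless encoding: every call is independent, so each chunk gets its own BOM.
naive = b"".join(chunk.encode("utf-16") for chunk in chunks)

# Stateful encoding: the encoder object remembers that the BOM is already out.
encoder = codecs.getincrementalencoder("utf-16")()
stream = b"".join(encoder.encode(chunk) for chunk in chunks)


def count_boms(data):
    """Count UTF-16 BOMs (either byte order) at 2-byte-aligned offsets."""
    boms = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    return sum(data[i:i + 2] in boms for i in range(0, len(data), 2))


print(count_boms(naive), count_boms(stream))
print(repr(naive.decode("utf-16")), repr(stream.decode("utf-16")))
```

In the per-chunk case the second BOM survives decoding as a stray
U+FEFF character in the middle of the text, which is why a BOM per
.write() call corrupts the file contents.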

Regards,
Martin