[I18n-sig] UTF-8 and BOM

Paul Prescod paulp@ActiveState.com
Wed, 16 May 2001 15:26:56 -0700


"M.-A. Lemburg" wrote:
> 
>...
> 
> You have to be careful here: UTF-16 prepends a BOM mark to
> every string pushed through the codec -- even small snippets.
> You certainly don't want to make that the default for the
> much more common UTF-8 which has no real requirement to include
> BOM marks at all... having the decoder automatically remove
> BOM marks is easy to implement and won't cause any harm,
> but carelessly adding them will get us into trouble.

Yes, I meant to say that the standard decoder should remove them. I
left it up to you whether we should have another codec whose encoder
adds them.
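
To make the split concrete, here is a rough sketch of the behaviour I
have in mind, not an actual codec implementation. The helper names
decode_utf8_strip_bom and encode_utf8_with_bom are made up for
illustration, and the code assumes a Python with separate bytes and str
types. Only codecs.BOM_UTF8 and codecs.BOM_UTF16 come from the standard
library.

```python
import codecs


def decode_utf8_strip_bom(data: bytes) -> str:
    """Decode UTF-8, dropping one leading BOM if present.

    Hypothetical helper: models the proposed default decoder behaviour.
    """
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def encode_utf8_with_bom(text: str) -> bytes:
    """Encode to UTF-8 with a leading BOM.

    Hypothetical helper: models the separate, opt-in codec. The plain
    UTF-8 encoder is left alone and never adds a BOM.
    """
    return codecs.BOM_UTF8 + text.encode("utf-8")


if __name__ == "__main__":
    # Decoding tolerates input with or without a BOM.
    print(decode_utf8_strip_bom(codecs.BOM_UTF8 + b"hello"))
    print(decode_utf8_strip_bom(b"hello"))

    # The default encoder stays BOM-free; the opt-in one adds the mark.
    print("hello".encode("utf-8"))
    print(encode_utf8_with_bom("hello"))

    # For contrast, the generic UTF-16 encoder prepends a BOM to every
    # string, even short snippets.
    print("a".encode("utf-16").startswith(codecs.BOM_UTF16))
```

Keeping the BOM-writing encoder separate means small snippets encoded
with the default codec can be concatenated without stray marks ending
up in the middle of the data.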

-- 
Take a recipe. Leave a recipe.  
Python Cookbook!  http://www.ActiveState.com/pythoncookbook