[I18n-sig] UTF-8 and BOM
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Thu, 17 May 2001 06:32:24 +0200
> Text data is different from binary data. Unicode text
> which uses combining characters (e.g. 'e' followed by a combining
> accent to produce 'é') is equivalent to text which uses the
> precomposed code point directly.
Are you saying that the BOM is removed under normalization? Which
normalization form?
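
One way to check this question empirically is a minimal sketch with the unicodedata module from a present-day Python 3 (an assumption; it is not the 2001 codebase under discussion, and the variable names are illustrative only). It normalizes a string that starts with U+FEFF under all four normalization forms and records whether the BOM character survives:

```python
import unicodedata

BOM = "\ufeff"                   # U+FEFF, the character used as a byte order mark
decomposed = BOM + "e\u0301"     # BOM, 'e', COMBINING ACUTE ACCENT
composed = BOM + "\u00e9"        # BOM, precomposed 'é'

# Record, for each normalization form, whether the leading U+FEFF is kept.
bom_survives = {}
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, decomposed)
    bom_survives[form] = normalized.startswith(BOM)

# NFC composes 'e' + accent into 'é' but leaves U+FEFF in place.
nfc_matches_composed = unicodedata.normalize("NFC", decomposed) == composed

print(bom_survives)
print(nfc_matches_composed)
```

In this sketch the two spellings of 'é' become equal after normalization, while the U+FEFF stays in the string, so any BOM stripping has to be done by the codec or by the application rather than by normalization.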
> You have to be careful here: UTF-16 prepends a BOM to
> every string pushed through the codec -- even small snippets.
That also seems like an error. When writing to a UTF-16 stream, I want
the BOM to appear only once, in the first bytes of the resulting file.
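
The difference between the two behaviours can be sketched as follows, assuming a present-day Python 3 (the incremental encoder API used here did not exist in 2001, and `count_boms` is a hypothetical helper written only for this illustration). A one-shot encode of each snippet yields one BOM per snippet, whereas a stateful encoder emits it only at the start:

```python
import codecs

snippets = ["abc", "def"]

# One-shot encoding: every call is independent, so every result starts with a BOM.
one_shot = b"".join(s.encode("utf-16") for s in snippets)

# Incremental encoding: the encoder keeps state and writes the BOM only once.
encoder = codecs.getincrementalencoder("utf-16")()
streamed = b"".join(encoder.encode(s) for s in snippets)

def count_boms(data):
    """Count 16-bit code units in data that are a UTF-16 BOM (hypothetical helper)."""
    boms = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
    return sum(1 for i in range(0, len(data), 2) if data[i:i + 2] in boms)

print(count_boms(one_shot))   # one BOM per snippet
print(count_boms(streamed))   # a single BOM at the start
```

Decoding `streamed` as UTF-16 gives back the plain concatenated text, while decoding `one_shot` leaves a stray U+FEFF in the middle of the string, which is the kind of result the stream case should avoid.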
Regards,
Martin