[I18n-sig] UTF-8 and BOM
Martin v. Loewis
martin@loewis.home.cs.tu-berlin.de
Thu, 17 May 2001 06:28:56 +0200
> "M.-A. Lemburg" wrote:
> >
> >...
> >
> > Note that BYTE ORDER MARK is only a comment for code point
> > '\ufeff'. The real name is: ZERO WIDTH NO-BREAK SPACE.
No, and yes. "BYTE ORDER MARK" is not in the comment field of the
Unicode character database; it is recorded in the "Unicode 1.0 name"
field. The current character name is indeed ZERO WIDTH NO-BREAK SPACE.
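To make the distinction between the two fields concrete, here is a
minimal sketch in present-day Python. The `record` string is the
UnicodeData.txt entry for U+FEFF as I understand it to appear in recent
versions of the database (older versions may differ in other fields);
the variable names are only illustrative.

```python
import unicodedata

# UnicodeData.txt entry for U+FEFF (assumed to match recent database versions).
record = "FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;"
fields = record.split(";")

character_name = fields[1]    # field 1: current character name
unicode_1_name = fields[10]   # field 10: "Unicode 1.0 name"
comment_field = fields[11]    # field 11: comment field (empty for U+FEFF)

# The standard library reports the current name, not the Unicode 1.0 name.
stdlib_name = unicodedata.name("\ufeff")

print(character_name)
print(unicode_1_name)
print(repr(comment_field))
print(stdlib_name)
```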
[Paul]
> I'm not sure I buy that, but one could argue that a Zero width no-break
> space character is a legitimate character whether you can see it on a
> computer screen or not...but I don't care enough to make that argument.
I do. A reader must not remove U+FEFF unless it is clearly meant to
indicate the encoding of the document. Anywhere else it is an ordinary
ZERO WIDTH NO-BREAK SPACE and belongs to the content.
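As a rough illustration of that rule, the sketch below uses the
"utf-8-sig" codec found in present-day Python, which is one possible
reading of it and not something this thread specifies. It treats only a
leading U+FEFF as an encoding signature and leaves any later occurrence
in the text. The sample bytes are made up for the example.

```python
import codecs

# Hypothetical input: a UTF-8 signature, then text containing U+FEFF as content.
data = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")

# Plain UTF-8 decoding removes nothing: both U+FEFF characters survive.
plain = data.decode("utf-8")

# "utf-8-sig" drops only the leading signature; the interior U+FEFF stays.
without_signature = data.decode("utf-8-sig")

print(repr(plain))
print(repr(without_signature))
```

A reader that instead deleted every U+FEFF would silently change the
content of the text.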
Regards,
Martin