Comment on PEP 263 - Defining Python Source Code Encodings

Martin v. Loewis martin at v.loewis.de
Sun May 12 11:17:16 EDT 2002


"Stephen J. Turnbull" <stephen at xemacs.org> writes:

> Which, sigh, actually violates the Unicode standard.  (The standard
> requires that for UTF-8 and UTF-{16,32}{LE,BE} a leading ZERO-WIDTH
> NO-BREAK SPACE be considered exactly that, and it may not be
> filtered.)

Can you cite chapter and verse where the standard says that? And in
light of that claim, how do you interpret

http://www.unicode.org/unicode/faq/utf_bom.html#25

which says

  "No, a BOM can be used as a signature no matter how the Unicode text
  is transformed: UTF-16, UTF-8, UTF-7, etc."

and the answer to question 29 adds

  "Yes, UTF-8 can contain a BOM. However, it makes no difference as to
  the endianness of the byte stream. UTF-8 always has the same byte
  order. An initial BOM is only used as a signature -- an indication
  that an otherwise unmarked text file is in UTF-8."
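
To make the "signature" reading concrete, here is a minimal sketch in
present-day Python. It is only an illustration, not something the PEP
or the FAQ prescribes, and the helper name strip_utf8_signature is
hypothetical. It relies on the standard codecs module constants
BOM_UTF8, BOM_UTF16_LE and BOM_UTF16_BE.

```python
import codecs

# U+FEFF encoded as UTF-8 is always the same three bytes, so it cannot
# say anything about byte order; it can only act as a signature.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# In UTF-16 the same character does distinguish the two byte orders.
assert codecs.BOM_UTF16_LE == b"\xff\xfe"
assert codecs.BOM_UTF16_BE == b"\xfe\xff"


def strip_utf8_signature(data):
    """Hypothetical helper: return (payload, had_signature).

    Treats a leading EF BB BF as a UTF-8 signature and removes it;
    any other input is returned unchanged.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


source = codecs.BOM_UTF8 + "print('hello')\n".encode("utf-8")
payload, had_signature = strip_utf8_signature(source)
text = payload.decode("utf-8")
```

Only a leading U+FEFF is touched here; one that appears later in the
data is left alone as an ordinary character.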

Regards,
Martin


