Comment on PEP 263 - Defining Python Source Code Encodings
Martin v. Loewis
martin at v.loewis.de
Sun May 12 11:17:16 EDT 2002
"Stephen J. Turnbull" <stephen at xemacs.org> writes:
> Which, sigh, actually violates the Unicode standard. (The standard
> requires that for UTF-8 and UTF-{16,32}{LE,BE} a leading ZERO-WIDTH
> NO-BREAK SPACE be considered exactly that, and it may not be
> filtered.)
Can you quote chapter and verse where it states that? If it does say
that, how do you interpret
http://www.unicode.org/unicode/faq/utf_bom.html#25
which says
"No, a BOM can be used as a signature no matter how the Unicode text
is transformed: UTF-16, UTF-8, UTF-7, etc."
and the answer to question 29 adds
"Yes, UTF-8 can contain a BOM. However, it makes no difference as to
the endianness of the byte stream. UTF-8 always has the same byte
order. An initial BOM is only used as a signature -- an indication
that an otherwise unmarked text file is in UTF-8."
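To make the "signature" reading concrete, here is a minimal sketch in
present-day Python (not anything PEP 263 itself prescribes). The helper
name strip_utf8_signature is hypothetical, made up for this illustration;
codecs.BOM_UTF8 and the "utf-8-sig" codec are standard-library names.

```python
import codecs


def strip_utf8_signature(data):
    """Return (payload, had_signature) for a bytes object.

    Hypothetical helper: treats a leading EF BB BF purely as a UTF-8
    signature and removes it; any later U+FEFF is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):], True
    return data, False


# A source file that begins with the UTF-8 signature.
sample = codecs.BOM_UTF8 + "print('hi')\n".encode("utf-8")

payload, had_sig = strip_utf8_signature(sample)

# The plain "utf-8" codec keeps the leading U+FEFF as a character,
# while "utf-8-sig" treats it as a signature and drops it.
kept = sample.decode("utf-8")
dropped = sample.decode("utf-8-sig")
```

The two decode calls show both interpretations side by side: one keeps
the leading code point as text, the other consumes it as a marker.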
Regards,
Martin