Multibyte Character Support for Python

Martin v. Loewis martin at v.loewis.de
Sat May 11 09:34:47 EDT 2002


"Stephen J. Turnbull" <stephen at xemacs.org> writes:

>     Martin> That's how UTF-16 is specified.
> 
> The Unicode standard permits, but does not require, a BOM.

In fact, the Unicode standard does not recognize UTF-16 as a byte
encoding; it recognizes it only as a CEF (character encoding form),
not as a CES (character encoding scheme); see TR#17.

UTF-16 as a CES is defined in RFC 2781, whose section 3.3 says that
the BOM SHOULD be inserted when the UTF-16 CES is used.
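To see the BOM distinction from the Python side, here is a minimal sketch, assuming a Python 3 interpreter. The codec names are those of Python's standard codecs module, not labels defined by RFC 2781, and the helper utf16_encodings is hypothetical. Roughly, Python's "utf-16" codec writes a BOM, while "utf-16-be" and "utf-16-le" fix the byte order and write none.

```python
import codecs


def utf16_encodings(text: str) -> dict:
    """Hypothetical helper: encode text with Python's three UTF-16 codecs."""
    return {
        # "utf-16": a BOM first, then code units in the platform's native order.
        "utf-16": text.encode("utf-16"),
        # "utf-16-be" / "utf-16-le": explicit byte order, no BOM written.
        "utf-16-be": text.encode("utf-16-be"),
        "utf-16-le": text.encode("utf-16-le"),
    }


encoded = utf16_encodings("A")
for name, data in encoded.items():
    print(name, data)

# On decoding, "utf-16" uses a leading BOM to choose the byte order and
# strips it; "utf-16-be" keeps the same two bytes as the character U+FEFF.
data = codecs.BOM_UTF16_BE + b"\x00A"
print(repr(data.decode("utf-16")))
print(repr(data.decode("utf-16-be")))
```

Only the "utf-16" output depends on the machine: its first two bytes are either codecs.BOM_UTF16_BE or codecs.BOM_UTF16_LE.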

Regards,
Martin


