[Python-Dev] Internationalization Toolkit

Tim Peters tim_one@email.msn.com
Tue, 16 Nov 1999 01:41:44 -0500


[MAL]
>   BOM_BE: '\376\377'
>     (corresponds to Unicode 0x0000FEFF in UTF-16
>      == ZERO WIDTH NO-BREAK SPACE)

[Greg Stein]
> Are you sure about that interpretation? I thought the BOM characters
> (0xFEFF and 0xFFFE) were *reserved* in the UCS-2 space.

I can't speak to MAL's degree of certainty <wink>, but he's right about this
stuff.  There is only one BOM character, U+FEFF, which is the zero-width
no-break space.  The byte-swapped form, U+FFFE, is not only reserved, it's
guaranteed never to be assigned to a character.
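
To make the byte-order point concrete, here is a minimal sketch in present-day
Python, which postdates this thread.  The helper name detect_utf16_byte_order
is hypothetical and was not proposed in the discussion; the sketch assumes only
the standard codecs constants BOM_UTF16_BE and BOM_UTF16_LE.  Because U+FFFE
can never be a character, a leading FF FE pair can only be U+FEFF read in the
opposite byte order:

```python
import codecs

# U+FEFF is the single BOM character; its two-byte form depends on byte order.
BOM = "\ufeff"

def detect_utf16_byte_order(data):
    """Return 'big', 'little', or None based on a leading 2-byte BOM.

    Hypothetical helper, for illustration only.
    """
    prefix = data[:2]
    if prefix == codecs.BOM_UTF16_BE:   # b'\xfe\xff', i.e. '\376\377'
        return "big"
    if prefix == codecs.BOM_UTF16_LE:   # b'\xff\xfe', U+FEFF seen byte-swapped
        return "little"
    return None                         # no BOM present; byte order unknown

# Encode the BOM plus some text in each byte order and check the detection.
for codec in ("utf-16-be", "utf-16-le"):
    payload = (BOM + "hi").encode(codec)
    print(codec, detect_utf16_byte_order(payload))
```

The sketch only looks at the first two bytes, so it says nothing about data
that carries no BOM at all; in that case the byte order has to come from
somewhere else.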