[Python-Dev] Internationalization Toolkit

Fred L. Drake, Jr. fdrake@acm.org
Fri, 12 Nov 1999 11:22:24 -0500 (EST)


M.-A. Lemburg writes:
 > The abbreviation BOM is quite common w/r to Unicode.

  Yes: "w/r to Unicode".  In sys, the abbreviation is out of context
and should receive a more descriptive name.  I think using BOM in
unicodec is good.

 >   BOM_BE: '\376\377' 
 >     (corresponds to Unicode 0x0000FEFF in UTF-16 
 >      == ZERO WIDTH NO-BREAK SPACE)

  I'd also add BOM as an alias for sys.byte_order_mark, perhaps even
in place of sys.byte_order_mark (just to localize the areas of code
that are affected).
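
  A rough sketch of the idea, not a proposal for the final API: the
constants live in one module, and BOM is picked from the native byte
order.  BOM_LE and detect_byte_order() are hypothetical names used only
for illustration, and sys.byteorder is assumed here as the way to ask
for the platform's byte order (standing in for the sys.byte_order_mark
under discussion).

```python
import sys

# Hypothetical module-level constants; the names are not settled.
BOM_BE = b'\xfe\xff'   # same bytes as the octal escape '\376\377'
BOM_LE = b'\xff\xfe'   # byte-swapped form, assumed as the counterpart

# Native-order mark, chosen once at import time.
# Assumes sys.byteorder reports 'big' or 'little'.
BOM = BOM_BE if sys.byteorder == 'big' else BOM_LE


def detect_byte_order(data):
    """Return 'big', 'little', or None if data has no UTF-16 BOM."""
    if data.startswith(BOM_BE):
        return 'big'
    if data.startswith(BOM_LE):
        return 'little'
    return None
```

  Code that cares about the mark then only needs to import these names
from one place, which is the localization I have in mind.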

 > Note that Unicode sees big endian byte order as being "correct". The

  A lot of us do.  ;-)


  -Fred

--
Fred L. Drake, Jr.	     <fdrake@acm.org>
Corporation for National Research Initiatives