[Python-Dev] New Unicode Snapshot

M.-A. Lemburg mal@lemburg.com
Mon, 07 Feb 2000 16:19:55 +0100


Hi everybody,

I've just uploaded a new Unicode snapshot.

It includes a brand-new UTF-16 codec which is BOM aware, meaning
that it recognizes byte order marks (BOMs) on input and adjusts the
byte order accordingly. On output you can choose to have a BOM
written or explicitly specify the byte order to use.
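
To illustrate the BOM behaviour described above, here is a minimal
sketch. It assumes the codec names found in current Python releases
("utf-16", "utf-16-le", "utf-16-be") and the BOM constants in the
codecs module; the names used in the snapshot itself may differ.

```python
import codecs

text = "snapshot"

# The generic "utf-16" codec writes a BOM in the machine's native byte order.
with_bom = text.encode("utf-16")
assert with_bom[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Naming the byte order explicitly produces output without a BOM.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")

# On input, the BOM tells the generic decoder which byte order to use.
from_le = (codecs.BOM_UTF16_LE + little).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + big).decode("utf-16")
assert from_le == from_be == text
```

The same text round-trips from either byte order because the decoder
reads the BOM first and then interprets the remaining bytes accordingly.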

Also new in this snapshot is configuration code which figures
out the byte order of the installation machine... I looked
everywhere in the Python source code but couldn't find any
hint that this is already done somewhere, so I simply
added some autoconf magic to have two new symbols defined:

   BYTEORDER_IS_LITTLE_ENDIAN and BYTEORDER_IS_BIG_ENDIAN

(mutually exclusive of course).
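
The kind of check involved can be sketched in Python as follows. This
is not the autoconf test from the snapshot, only an illustration of
the idea: store a 16-bit integer in native byte order and look at which
byte comes first. The function name detect_byte_order is hypothetical,
and the two variables merely mirror the C symbols above.

```python
import struct
import sys


def detect_byte_order():
    """Return 'little' or 'big' by checking how a 16-bit integer is stored."""
    # "=H" packs an unsigned 16-bit integer in the machine's native byte order.
    native = struct.pack("=H", 1)
    return "little" if native == b"\x01\x00" else "big"


# Python-level stand-ins for the two mutually exclusive C symbols.
BYTEORDER_IS_LITTLE_ENDIAN = detect_byte_order() == "little"
BYTEORDER_IS_BIG_ENDIAN = not BYTEORDER_IS_LITTLE_ENDIAN

# Current Python releases expose the same information as sys.byteorder.
assert detect_byte_order() == sys.byteorder
```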

BTW, I changed the hash method of Unicode objects to use the
UTF-8 string as the basis for the hash code. This means that
u'abc' and 'abc' will now be treated as the same dictionary
key!
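
A rough sketch of the hashing idea, written for current Python where
text and byte strings are separate types. The helper utf8_based_hash
is hypothetical and is not the C code in the snapshot; it only shows
why hashing the UTF-8 form makes ASCII text and the equivalent byte
string agree.

```python
def utf8_based_hash(value):
    """Hash text via its UTF-8 encoding; hash byte strings directly."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    return hash(value)


# ASCII text encodes to the same bytes as the plain byte string,
# so both produce the same hash value.
assert utf8_based_hash("abc") == utf8_based_hash(b"abc")

# Non-ASCII text only matches a byte string that is itself UTF-8 encoded.
assert utf8_based_hash("\u00e9") == utf8_based_hash(b"\xc3\xa9")
```

Equal hash values are only half of what a dictionary needs; the two
objects must also compare equal to be treated as the same key.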

Some documentation also made it into the snapshot. See the
file Misc/unicode.txt for all the interesting details about
the implementation.

Note that the web page provides a prepatched version of the
interpreter for your convenience... just download, run
./configure and make and you're done. Could someone with
access to an MS VC compiler please update the project files
and perhaps send me some feedback about any glitches? I have
never compiled Python on Windows myself and don't have the time
to figure it out just now :-/. Thanks :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/