[Python-Dev] UTF-16 code point comparison

M.-A. Lemburg mal@lemburg.com
Mon, 31 Jul 2000 11:16:56 +0200


Guido van Rossum wrote:
> 
> > Predicting the future can be difficult, but here is my take:
> > javasoft will never change the way String.compareTo works.
> > String.compareTo is documented as:
> > """
> >   Compares two strings lexicographically. The comparison is based on
> >   the Unicode value of each character in the strings. ...
> > """
> 
> (Noting that their definition of "character" is probably "a 16-bit
> value of type char", and has only fleeting resemblance to what is or
> is not defined as a character by the Unicode standard.)
> 
> > Instead they will mark it as a very naive string comparison and suggest
> > that users use the Collator classes for anything but the simplest cases.
> 
> Without having digested the entire discussion, this sounds like a good
> solution for Python too.  The "==" operator should compare strings
> based on a simple-minded representation-oriented definition, and all
> the other stuff gets pushed into separate methods or classes.

This would probably be the best way to go: we'll need
collation routines sooner or later anyway. Bill's "true UCS-4"
compare could then become part of that collation library.
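
To illustrate why the two orderings differ, here is a minimal sketch
in present-day Python 3. It is not the code in CVS, and the helper
names utf16_units, compare_code_units and compare_code_points are
made up for this example. It assumes that str compares by code point
(as Python 3 does) and that the inputs contain no lone surrogates:

```python
import struct


def utf16_units(s):
    """Return the UTF-16 code units of s as a tuple of ints.

    Assumes s contains no lone surrogates; encoding would fail otherwise.
    """
    data = s.encode("utf-16-be")
    return struct.unpack(">%dH" % (len(data) // 2), data)


def compare_code_units(a, b):
    """Representation-oriented compare: order by 16-bit UTF-16 units."""
    ua, ub = utf16_units(a), utf16_units(b)
    return (ua > ub) - (ua < ub)


def compare_code_points(a, b):
    """Compare by Unicode code point (what a "true UCS-4" compare orders by)."""
    return (a > b) - (a < b)


bmp = "\uff5e"        # U+FF5E, a single 16-bit unit 0xFF5E
astral = "\U00010000"  # U+10000, surrogate pair 0xD800 0xDC00

print(compare_code_points(bmp, astral))  # code points: U+FF5E < U+10000
print(compare_code_units(bmp, astral))   # code units:  0xFF5E > 0xD800
```

The two functions only disagree when, at the first differing position,
one string has a character in U+E000..U+FFFF and the other has a
character outside the BMP. Everywhere else the simple unit-by-unit
compare gives the same answer as the code point compare.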

Should I #if 0 the current implementation of the UCS-4 compare
in CVS?

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business:                                      http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/