[Python-Dev] UTF-16 code point comparison

Finn Bock bckfnn@worldonline.dk
Wed, 26 Jul 2000 19:42:29 GMT


CPython's unicode compare function contains some code to compare surrogate
characters in code-point order (I think). This is probably a very neat
feature, but it differs from Java's way of comparing strings.

  Python 2.0b1 (#0, Jul 26 2000, 21:29:11) [MSC 32 bit (Intel)] on win32
  Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
  Copyright 1995-2000 Corporation for National Research Initiatives (CNRI)
  >>> print u'\ue000' < u'\ud800'
  1
  >>> print ord(u'\ue000') < ord(u'\ud800')
  0
  >>>


Java (and JPython) compares the 16-bit characters numerically, which results in:

  JPython 1.1+08 on java1.3.0 (JIT: null)
  Copyright (C) 1997-1999 Corporation for National Research Initiatives
  >>> print u'\ue000' < u'\ud800'
  0
  >>> print ord(u'\ue000') < ord(u'\ud800')
  0
  >>>
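
To make the difference concrete, here is a minimal sketch of the two
orderings, written for a current Python 3 rather than the interpreters
shown above. The function names (utf16_units, fixup, code_unit_less,
code_point_less) are made up for illustration. The fix-up is the usual
trick for getting code-point order out of UTF-16 code units; I am
assuming CPython's compare does something of this kind, and this is not
its actual code.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints.

    'surrogatepass' lets lone surrogates such as u'\\ud800' through.
    """
    data = s.encode("utf-16-be", "surrogatepass")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))

def fixup(unit):
    """Remap a code unit so surrogates sort above U+E000..U+FFFF."""
    if unit >= 0xE000:
        return unit - 0x800    # E000..FFFF -> D800..F7FF
    if unit >= 0xD800:
        return unit + 0x2000   # D800..DFFF -> F800..FFFF
    return unit

def code_unit_less(a, b):
    """Plain numeric comparison of 16-bit units (the Java/JPython result)."""
    return utf16_units(a) < utf16_units(b)

def code_point_less(a, b):
    """Comparison after the fix-up (matches the CPython result above)."""
    return ([fixup(u) for u in utf16_units(a)]
            < [fixup(u) for u in utf16_units(b)])

print(code_unit_less('\ue000', '\ud800'))    # False
print(code_point_less('\ue000', '\ud800'))   # True
```

With a real supplementary character the same thing happens:
code_point_less('\ue000', '\U00010000') is true, while the plain
code-unit comparison puts U+10000 (encoded as D800 DC00) before U+E000.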

I don't think I can come up with a solution that allows JPython to emulate
CPython on this type of comparison.

regards,
finn