case-insensitive and internationalized sort

Kevin Altis altis at semi-retired.com
Thu Dec 19 16:01:53 EST 2002


The fact that the built-in list sort method is case-sensitive seems to be a
recurring topic. If a named parameter is going to be added to the sort
method, it would probably require a PEP and discussion on python-dev before
it was accepted. But since sort() doesn't do what a lot of people expect, I
would like to discuss the issues here first.

This topic came out of an off-list discussion I've been having with Jarno J
Virtanen, who supplied the following case-insensitive function and test
code:

  def compare(a, b):
      return cmp(a.upper(), b.upper())

Now, if I test it with the following list:

  s = [u'ö', u'ä', u'Ä', 'b', 'a', 'B', u'a', 'A']
  s.sort(compare)
  for c in s:
      print c.encode('latin-1'),
  print

it yields:

  a a A b B ä Ä ö

You sometimes see people use a lambda instead of the compare function:

  s.sort(lambda a, b: cmp(a.upper(), b.upper()))

Having a separate function may be easier to read, but are there speed
differences or other trade-offs?
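
On the speed question, here is a minimal sketch of one trade-off. It assumes
a later Python (Python 3) in which sort() and sorted() accept a key argument;
that is not something the examples above rely on. A comparison function is
called once per pairwise comparison, while a key function is called once per
element. The sample list `words` and the names `by_cmp` and `by_key` are only
for illustration.

```python
from functools import cmp_to_key


def compare(a, b):
    # Python 3 has no built-in cmp(); emulate it on the upper-cased strings.
    ua, ub = a.upper(), b.upper()
    return (ua > ub) - (ua < ub)


words = ['b', 'a', 'B', 'A']

# Comparison-function style: compare() runs once per pairwise comparison.
by_cmp = sorted(words, key=cmp_to_key(compare))

# Key-function style: str.upper runs once per element.
by_key = sorted(words, key=str.upper)

print(by_cmp)
print(by_key)
```

Both calls give the same order here; whether the difference in call counts
matters in practice would need to be measured on real data.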

The next point is that it would be nice to get a case-insensitive sort
without having to write a comparison function, since that seems to be the
most likely thing you want when sorting strings. If an optional
casesensitive argument was added to sort(), then without breaking any old
code you could do:

  s.sort(casesensitive=False)

By default, sort() would have to be case-sensitive to remain backwards
compatible.
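
To make the proposed behaviour concrete, here is a rough sketch of it as a
plain helper function. The name `sort_strings` is hypothetical, the
casesensitive flag does not exist on the real sort() method, and the code
assumes Python 3 with its key argument.

```python
def sort_strings(items, casesensitive=True):
    """Sort a list of strings in place (hypothetical stand-in for the flag)."""
    if casesensitive:
        # Default: same behaviour as a plain items.sort().
        items.sort()
    else:
        # Compare upper-cased copies, so 'a' and 'A' sort together.
        items.sort(key=str.upper)


default_order = ['b', 'a', 'B', 'A']
sort_strings(default_order)

folded_order = ['b', 'a', 'B', 'A']
sort_strings(folded_order, casesensitive=False)

print(default_order)
print(folded_order)
```

Because casesensitive defaults to True, existing calls would behave exactly
as before.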

The final point is that the solution above still doesn't handle the
characters ä and Ä correctly. Some extra work is required for a
case-insensitive, "internationalized" sort that can handle both ASCII and
Unicode strings. I would appreciate suggestions for how to do it.
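
One possible direction, as a sketch only: build a sort key that strips
accents and case before comparing. This assumes Python 3 (where all strings
are Unicode) and assumes that an accented letter should sort next to its base
letter, which is not true for every language; Finnish and Swedish, for
example, place ä and ö after z. The function name `fold_key` is made up for
this example.

```python
import unicodedata


def fold_key(text):
    # Decompose accented characters into base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # Drop the combining marks so that 'ä' compares like 'a'.
    stripped = ''.join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() is a Unicode-aware, more aggressive form of lower().
    return stripped.casefold()


s = ['ö', 'ä', 'Ä', 'b', 'a', 'B', 'a', 'A']
s.sort(key=fold_key)
print(' '.join(s))
```

Strings with equal keys keep their original relative order, so this does not
decide whether 'a' comes before 'ä'. For ordering that follows a particular
language's rules, a locale-aware key such as locale.strxfrm (after setting a
suitable locale) may be a better fit.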

Anyone up for a casesensitive flag addition to the sort() method?

ka




