case-insensitive and internationalized sort

Thu Dec 19 18:26:06 EST 2002

"Kevin Altis" <altis at semi-retired.com> writes:

> That looks good, but I'm a bit slow. I'm not seeing how to create just a
> compare function so I can sort in place. This is the closest I could I do
> and it requires a standalone function and the temporary creation of another
> list:

Correct. Proper sorting of strings often requires that you transform
them, both for correctness and speed. As a result, sorting won't be
in-place, unless you make the transformation in-place:

def caseinsensitive_inplace(list):
     for index, value in enumerate(list): # using 2.3's enumerate
         list[index] = value.upper(),value
     list.sort()
     for index, (_, value) in enumerate(list): # using 2.3's enumerate
         list[index] = value

> Is there a simpler way I'm just not seeing one that avoids a lot of
> extra upper() calls?

It appears you are not aware of what "a lot" is, here. This routine
uses n .upper calls, if len(list) is n. Doing the upper in the compare
function causes n*log(n) .upper calls on average, and, depending on
the Python version, n*n .upper calls in the worst case.

> The list creation overhead may be worse than extra upper
> calls. I didn't have any succes trying to incorporate locale.strcoll or
> locale.strxfrm. I kept getting an exception
> 
> UnicodeError: ASCII encoding error: ordinal not in range(128)

Ah, so you *are* trying to compare Unicode strings? You should encode
them to the locale's encoding first. How to find out what the locale's
encoding is depends a lot on the operating system you are using (and,
to a lesser degree, on the Python version).

Regards,
Martin