python 3's adoption

Alf P. Steinbach alfps at start.no
Fri Jan 29 01:10:01 EST 2010


* Steve Holden:
>>
> While I am fully aware that premature optimization, etc., I cannot
> resist an appeal to efficiency if it finally kills off this idea that
> "they took 'cmp()' away" is a bad thing.
> 
> Passing a cmp= argument to sort provides the interpreter with a function
> that will be called each time any pair of items have to be compared. The
> key= argument, however, specifies a transformation from [x0, x1, ...,
> xN] to [(key(x0), x0), (key(x1), x1), ..., (key(xN), xN)] (which calls
> the key function precisely once per sortable item).
> 
> From a C routine like sort() [in CPython, anyway] calling out from C to
> a Python function to make a low-level decision like "is A less than B?"
> turns out to be disastrous for execution efficiency (unlike the built-in
> default comparison, which can be called directly from C in CPython).
> 
> If your data structures have a few hundred items in them it isn't going
> to make a huge difference. If they have a few million then it is already
> starting to affect performance ;-)
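
To put rough numbers on the difference described above, here is a small sketch
that counts Python-level calls for both styles. Python 3 has no cmp= argument,
so the comparison style is emulated here with functools.cmp_to_key (available
from Python 3.2 and 2.7, so not in a 3.1 interpreter). The names count_calls,
key_func and cmp_func are made up for illustration.

```python
import functools
import random


def count_calls(n=1000, seed=0):
    """Sort n random integers both ways and count Python-level callbacks."""
    rng = random.Random(seed)
    data = [rng.randrange(10**6) for _ in range(n)]
    calls = {"key": 0, "cmp": 0}

    def key_func(x):
        # Called once per item, before any comparing starts.
        calls["key"] += 1
        return x

    def cmp_func(a, b):
        # Called once per comparison the sort has to make.
        calls["cmp"] += 1
        return (a > b) - (a < b)

    by_key = sorted(data, key=key_func)
    by_cmp = sorted(data, key=functools.cmp_to_key(cmp_func))
    assert by_key == by_cmp  # both styles give the same order
    return calls


calls = count_calls()
print(calls)
```

The key count equals the number of items, while the cmp count grows roughly as
n log n for unordered input.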

It's not either/or; the question is whether programmers still need the cmp
functionality.

Consider that *correctness* is a bit more important than efficiency, and that 
sorting strings is quite common...

Possibly you can show me the way forward towards sorting these strings (shown
below) correctly for a Norwegian locale. Possibly you can't. But one thing is
for sure: if there were a cmp functionality, it would not be a problem.


<example>
   Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
   Type "help", "copyright", "credits" or "license" for more information.
   >>> L = ["æ", "ø", "å"]   # This is in SORTED ORDER in Norwegian
   >>> L
   ['æ', 'ø', 'å']
   >>> L.sort()
   >>> L
   ['å', 'æ', 'ø']
   >>>
   >>> import locale
   >>> locale.getdefaultlocale()
   ('nb_NO', 'cp1252')
   >>> locale.setlocale( locale.LC_ALL )  # Just checking...
   'C'
   >>> locale.setlocale( locale.LC_ALL, "" )  # Setting default locale, Norwegian.
   'Norwegian (Bokmål)_Norway.1252'
   >>> locale.strxfrm( "æøå" )
   'æøå'
   >>> L.sort( key = locale.strxfrm )
   >>> L
   ['å', 'æ', 'ø']
   >>> locale.strcoll( "å", "æ" )
   1
   >>> locale.strcoll( "æ", "ø" )
   -1
   >>>
</example>


Note that strcoll orders these strings correctly, as ["æ", "ø", "å"]. That is,
the sort would have come out right if strcoll could have been passed as a cmp
function to sort (or better, to a separate routine named e.g. custom_sort).

And strcoll can be so used in 2.x:


<example>
C:\Documents and Settings\Alf\test> py2
Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
 >>> def show( list ):
...     print "[" + ", ".join( list ) + "]"
...
 >>> L = [u"æ", u"ø", u"å"]
 >>> show( L )
[æ, ø, å]
 >>> L.sort()
 >>> show( L )
[å, æ, ø]
 >>> import locale
 >>> locale.setlocale( locale.LC_ALL, "" )
'Norwegian (Bokm\xe5l)_Norway.1252'
 >>> L.sort( cmp = locale.strcoll )
 >>> show( L )
[æ, ø, å]
 >>> L
[u'\xe6', u'\xf8', u'\xe5']
 >>> _
</example>


The above may just be a bug in the 3.x strxfrm. But it illustrates that sometimes
you have your sort order defined by a comparison function. Transforming that
into a key can be practically impossible (it can also be quite inefficient).
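
For what it's worth, one generic workaround is to wrap each item in an object
whose __lt__ calls the comparison function, and hand that wrapper class to
key=. Below is a minimal sketch of the idea. toy_collate is a hypothetical
stand-in for locale.strcoll (it only knows the three letters from the example,
so the result does not depend on the machine's locale), and cmp_key is a
made-up name. Later Pythons (3.2 and 2.7 onwards) provide the same thing as
functools.cmp_to_key.

```python
import functools

# Hypothetical stand-in for locale.strcoll: it encodes only the relative
# order of the three letters used in the example.
_ORDER = {"æ": 0, "ø": 1, "å": 2}


def toy_collate(a, b):
    """Return a negative, zero or positive number, like a cmp function."""
    ra, rb = _ORDER[a], _ORDER[b]
    return (ra > rb) - (ra < rb)


def cmp_key(cmp):
    """Build a key class whose instances compare via the given cmp function."""

    class Key:
        __slots__ = ("obj",)

        def __init__(self, obj):
            self.obj = obj

        def __lt__(self, other):
            # list.sort only ever asks "is self less than other?"
            return cmp(self.obj, other.obj) < 0

    return Key


L = ["å", "æ", "ø"]
L.sort(key=cmp_key(toy_collate))
print(L)  # prints ['æ', 'ø', 'å']

# The standard-library wrapper should give the same order.
same = sorted(["å", "æ", "ø"], key=functools.cmp_to_key(toy_collate))
```

Every comparison still goes through a Python-level call, so this gets the
order right without gaining any of the efficiency that key= normally offers.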


Cheers & hth.,

- Alf


