identity = lambda x: x -- a Pythonic idiom?

Kragen Sitaker kragen at canonical.org
Tue Nov 20 16:24:49 EST 2001


"Andrew Dalke" <dalke at dalkescientific.com> writes:
> Then in the "sort <sequence> <predictate> &key (key #'identity)"
> line the '&key' says that field is 1) optional and 2) a keyword
> argument, and that if not given the identity function is used.
> ...
> Python only uses one function for this, which means the field
> extraction and data comparison must be composed into one function,

Well, not really; the advantage of having a separate key function is
that it only gets called once for each input item, rather than lg N
times, which can be very significant if extracting the key is
expensive and the comparison is comparatively fast.

You *can* do almost this in Python, using the following code:

tmplist = [(x.lower(), x) for x in mylist]
tmplist.sort()
mylist = [x[1] for x in tmplist]

On a 112-item list consisting of eight copies of a short list of
words, on my machine, the above method takes about 2.3 ms to sort,
while the method relying on an explicit comparison function takes
about 9.4 ms to sort, four times slower:

list = list[:]
list.sort(lambda x, y: cmp(x.lower(), y.lower()))
return list

(As an aside, what's the recommended way to get timings for things
like this?  I hacked up a Speed module, but I expect that someone else
had done a better one already.)

This trick of computing the keys first, once, is what the Perlites
call the Schwartzian Transform, but because Perl's references are
second-class citizens, you have to provide an explicit comparison
function anyway.

In both Perl and Python, using the built-in C comparison function is a
big win; adding lambda x, y: cmp(x, y) to the 2.3-ms function above
made it take 6.7 ms.  In Perl, you have to turn your sortable values
into strings (the Guttman-Rosler Transform) to sort them with the
builtin function; in Python, you can just use a tuple.

> The sort function is (was) also defined in terms of C-style
> three-way sorting.  It's been changed so that only '<' comparison
> is needed, but I don't know how that change affects the comparison
> function used.

In which Python?  My 2.1.1 documentation still claims the comparator
should return -1, 0, or 1.  (Also, is that really a win?  You'll have
to call the function twice to tell the difference between > and == ---
won't that add 50% to your sort times in the common case where the
Python comparator is the bottleneck?)





More information about the Python-list mailing list