Fast attribute/list item extraction

Raymond Hettinger vze4rx4y at verizon.net
Mon Dec 1 05:11:16 EST 2003


[Peter Otten]
> > extract[1]
> > extract["key"]

[Robert Brewer]
> I'm having a hard time seeing the use cases, given that I find
> most of them more readable if done with list comprehensions
> or good ol' for loops.

Peter's post focused on implementation instead of the context.

In Py2.4, list.sort() will have an optional key argument that encapsulates
the decorate/sort/undecorate pattern.  For example, here is the new fastest way
to do a case-insensitive sort that leaves the original case intact:

>>> words = 'The quick BROWN fox JumPed OVER the LAzy dog'.split()
>>> words.sort(key=str.lower)
>>> words
['BROWN', 'dog', 'fox', 'JumPed', 'LAzy', 'OVER', 'quick', 'The', 'the']
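
For comparison, here is a rough sketch of the decorate/sort/undecorate
pattern that the key argument replaces.  The index in the middle of each
tuple is my own addition; it keeps equal keys in their original order and
stops the sort from ever comparing the payloads themselves:

```python
words = 'The quick BROWN fox JumPed OVER the LAzy dog'.split()

# Decorate: pair each word with its sort key and its original position
decorated = [(w.lower(), i, w) for i, w in enumerate(words)]

# Sort: tuples compare on the lowercased key first, then on position
decorated.sort()

# Undecorate: throw away the key and position, keep the original word
undecorated = [w for _, _, w in decorated]
```

The result should match what words.sort(key=str.lower) gives above, with
three steps written out by hand instead of one argument.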

Though case insensitivity is straightforward, the two most common cases need to
be coded with lambda:

records.sort(key=lambda r: r.someattr)     # sort on an attribute
records.sort(key=lambda r: r[2])           # sort on a field number

Not only is the lambda unattractive, it is slow.  This situation is not unique
to sort(); it comes up with other functionals like map() and filter().  A new
itertool called groupby() also faces the same situation.

So, the idea was born to add a fast extract function maker to the operator
module:

from operator import extract
records.sort(key=extract('someattr'))
records.sort(key=extract(2))
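
To make the behavior concrete, here is a pure-Python stand-in for the
proposed function.  It is only a sketch: the dispatch on argument type
(string means attribute, anything else means index or key) is my guess at
how the overloading could work, the Record class and sample data are made
up, and a lambda-based version like this would not deliver the speed that
is the point of the proposal:

```python
def extract(spec):
    """Hypothetical stand-in for the proposed extract() function maker."""
    if isinstance(spec, str):
        # Assumption: a string names an attribute to fetch with getattr
        return lambda obj: getattr(obj, spec)
    # Assumption: anything else is an index or key to fetch with getitem
    return lambda obj: obj[spec]


class Record(object):
    """Made-up record type with a single attribute."""
    def __init__(self, someattr):
        self.someattr = someattr


# Sort objects on an attribute
records = [Record(3), Record(1), Record(2)]
records.sort(key=extract('someattr'))
attr_order = [r.someattr for r in records]

# Sort tuples on a field number
rows = [('a', 'x', 30), ('b', 'y', 10), ('c', 'z', 20)]
rows.sort(key=extract(2))
```

One wrinkle with this guess: a string dictionary key, as in Peter's
extract["key"], would be read as an attribute name.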

The same form is usable in other contexts as well:

from itertools import groupby
students.sort(key=extract('age'))
for age, agegroup in groupby(students, extract('age')):
    print 'Students of age ', age
    for student in agegroup:
        print '    ', student.name

The latest incarnation of the extract() idea is to have two separate extract
functions (one using getitem and the other using getattr) rather than one
overloaded function.
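
As a sketch of what the two-function form looks like in use: the operator
module in Python 2.4 and later provides attrgetter() and itemgetter(), which
split the job along the same getattr/getitem line.  The Student class and
the sample data below are made up for illustration:

```python
from itertools import groupby
from operator import attrgetter, itemgetter


class Student(object):
    """Made-up record type for the example."""
    def __init__(self, name, age):
        self.name = name
        self.age = age


students = [Student('Ann', 20), Student('Bob', 19), Student('Cy', 20)]

# getattr flavor: sort and group on an attribute
students.sort(key=attrgetter('age'))
by_age = {}
for age, agegroup in groupby(students, attrgetter('age')):
    by_age[age] = [s.name for s in agegroup]

# getitem flavor: sort tuples on a field number
rows = [('Ann', 20), ('Bob', 19), ('Cy', 20)]
rows.sort(key=itemgetter(1))
```

With two functions there is no guessing about whether a string argument
means an attribute name or a dictionary key.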


Raymond








