Fast attribute/list item extraction

Mon Dec 1 05:11:16 EST 2003

[Peter Otten]
> > extract[1]
> > extract["key"]

[Robert Brewer]
> I'm having a hard time seeing the use cases, given that I find
> most of them more readable if done with list comprehensions
> or good ol' for loops.

Peter's post focused on implementation instead of the context.

For Py2.4, list.sort() will have an optional key argument that encapsulates
the decorate/sort/undecorate pattern.  For example, here is the new fastest way
to do a case-insensitive sort while leaving the original case intact:

>>> words = 'The quick BROWN fox JumPed OVER the LAzy dog'.split()
>>> words.sort(key=str.lower)
>>> words
['BROWN', 'dog', 'fox', 'JumPed', 'LAzy', 'OVER', 'quick', 'The', 'the']
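
For comparison, the decorate/sort/undecorate pattern that the key argument
replaces can be sketched by hand roughly as follows.  This is only an
illustration: carrying the original index in each tuple is one common way to
keep equal keys in their original order, and is an assumption of this sketch
rather than a description of how list.sort() works internally.

```python
words = 'The quick BROWN fox JumPed OVER the LAzy dog'.split()

# Decorate: pair each word with its sort key and its original position.
decorated = [(w.lower(), i, w) for i, w in enumerate(words)]

# Sort: tuples compare on the lowercased key first, then on position,
# so words with equal keys ('The' and 'the') keep their original order.
decorated.sort()

# Undecorate: drop the key and position, keeping the original words.
dsu_sorted = [w for _, _, w in decorated]
```

The result in dsu_sorted should match what words.sort(key=str.lower) leaves in
words.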

Though case insensitivity is straightforward, the two most common cases need to
be coded with lambda:

records.sort(key=lambda r: r.someattr)     # sort on an attribute
records.sort(key=lambda r: r[2])           # sort on a field number

Not only is the lambda unattractive, it is slow.  This situation is not unique
to sort(); it comes up with other functionals like map() and filter().  A new
itertool called groupby() faces the same situation.

So, the idea was born to add a fast extract function maker to the operator
module:

from operator import extract
records.sort(key=extract('someattr'))
records.sort(key=extract(2))

The same form is usable in other contexts as well:

students.sort(key=extract('age'))
for age, agegroup in groupby(students, extract('age')):
    print 'Students of age ', age
    for student in agegroup:
        print '    ', student.name

The latest incarnation of the extract() idea is to have two separate extract
functions (one using getitem and the other using getattr) rather than one
overloaded function.
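
To make the two-function form concrete, here is a minimal pure-Python sketch
of its semantics.  The names extract_attr and extract_item, the Student class,
and the sample data are all hypothetical, chosen only for illustration; they
are not the proposed names.  A closure like this is presumably no faster than
a lambda, so the speed benefit would have to come from a C implementation in
the operator module.

```python
def extract_attr(name):
    """Return a callable that fetches attribute `name` (hypothetical name)."""
    def getter(obj):
        return getattr(obj, name)
    return getter

def extract_item(index):
    """Return a callable that fetches obj[index] (hypothetical name)."""
    def getter(obj):
        return obj[index]
    return getter

# Hypothetical record type and data, for illustration only.
class Student(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

students = [Student('Ann', 21), Student('Bob', 19), Student('Cy', 20)]
students.sort(key=extract_attr('age'))      # sort on an attribute
names_by_age = [s.name for s in students]

rows = [('x', 3), ('y', 1), ('z', 2)]
rows.sort(key=extract_item(1))              # sort on a field number
```

Splitting the function in two means a string argument is never ambiguous
between an attribute name and a dictionary key.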

Raymond