heapq "key" arguments

Joshua Bronson jabronson at gmail.com
Thu Aug 6 21:10:39 EDT 2009


On Aug 3, 1:36 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [Joshua Bronson]:
>
> > According tohttp://docs.python.org/library/heapq.html, Python 2.5
> > added an optional "key" argument to heapq.nsmallest and
> > heapq.nlargest. I could never understand why they didn't also add a
> > "key" argument to the other relevant functions (heapify, heappush,
> > etc).
>
> The problem is that heapq acts on regular lists, so it does not
> have exclusive access to the structure.  So, there is no reliable
> way for it to maintain a separate list of keys.  Since the keys
> can't be saved in the structure (without possibly breaking other
> code), the fine grained heapq functions (like heappop and heappush)
> would need to call key functions every time they are invoked.
> This is at odds with the implicit guarantee of the key function
> that it will be called no more than once per key.
>
> The overall problem is one of granularity.  A key function should
> be applied once in an initial pass, not on every call to a push/pop
> function.  The everyday solution that most people use is to operate
> on a list of (key, record) tuples and let tuple comparison do the
> work for you.  Another solution is to build a Heap class that does
> have exclusive access to the structure, but the API sugar often
> isn't worth the slightly weaker performance.
>
> One other thought.  Heaps are a lazy evaluation structure, so their
> fined-grained mutation functions only work well with just a single
> ordering function, so there is not need to have (and every reason
> to avoid) changing key functions in mid-stream.  IOW, the key
> function needs to be constant across all accesses.  Contrast this
> with other uses of key functions where it makes perfect sense
> to run minage=min(data, key=attrgetter('age')) and then running
> minsal=min(data, key=attrgetter('salary')).  The flexibility to
> change key functions just doesn't make sense in the context of
> the fine-grained heap functions.
>
> Accordingly, this is why I put key functions in nlargest() and
> nsmallest() but not in heappush() and friends.  The former can
> guarantee no more than one key function call per entry and they
> evaluate immediately instead of lazily.
>
> Raymond


I see, that makes sense. Thanks for the great explanation.

Josh



More information about the Python-list mailing list