[issue41220] add optional make_key argument to lru_cache

Fri Jul 17 22:11:25 EDT 2020

Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:

I've left this open for a week to collect comments.  I concur with Felipe that this should be the responsibility of the caller.  And I concur with Jim that the use cases are likely too rare to warrant shoehorning in the extra functionality.

For my part, I'm concerned that this would be bug bait.  The result of the function would need to be uniquely associated with a particular output, independent of the contents of the mutable arguments.  I don't think it is good design for function calls to be disregarding the part of the input that was originally used to compute the output — in a way that notion conflict with the core idea of the of the lru_cache giving the same outputs for the same inputs.

Likewise, debugging and exception handing would be complicated by having two underlying user functions.  Exceptions could arise from three sources: the make_key() function, the cached function, and the lru cache internals.

Further, if the make_key() function turns out not to be unique, the ensuring bugs would be hard to find.  In the OP's later example, if we get two distinct inputs which happen to have the same timestamp, the error would pass without warning and would be hard to reproduce.  Similarly, the make_key() function has to assume that the mutable portion of the data hasn't mutated, but a key feature of mutable data is that it can mutate.  That too would result in a hard to find bug.

Also, while the OP has a specific use case in mind, we can't know in advance how others will use make_key().  It could be misused to convert mutable data to immutable data, possibly resulting in a cached function that is slower than the original (caching mean() for example).  Or the make_key() function could had side-effects.  The OP's original "my_list[0]" example showed yet another way that bugs could arise. It is problematic that that particular bug might not have been caught by a test suite that didn't know to call the function twice with a common first argument but with differing subsequent arguments.

Another possible bug trap is that it would make it easier to accidentally cache mutable function results without getting a warning.  For example:
      @lru_cache(128, make_key = lambda target, size, uniq_id, data: uniq_id)
      def search(target, size, uniq_id, data):
           i = data.index(target)
           return data[i : i+size]    # <== bug because this is mutable if data is mutable

I think the suggestion was a neat idea (and thought provoking), but I think we should decline because 1) the use case is uncommon 2) because it makes the API harder to understand 3) because it is bug bait and 4) because the responsibility should probably be with the caller.

Thank you for the suggestion.  It was inspired.  Don't be discouraged — many of my proposals don't get accepted as well.  Please continue to submit suggestions.

P.S. make_key() shouldn't be called a key-function so that we don't make that term ambiguous.  Elsewhere key-functions are all about comparison logic rather than hashing.  They can be created by functools.cmp_to_key(), they tend to be used only once per data element, and the same function tends to be inoperable with all the tools that accept key-functions.  So the term make_key() was likely the better terminology.  Perhaps get_unique_id() would have been even less ambiguous.

----------
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41220>
_______________________________________