[issue41220] add optional make_key argument to lru_cache
Raymond Hettinger
report at bugs.python.org
Tue Jul 7 05:23:27 EDT 2020
Raymond Hettinger <raymond.hettinger at gmail.com> added the comment:
Thanks, I see what you're trying to do now:
1) Given a slow function
2) that takes a complex argument
2a) that includes a hashable unique identifier
2b) and some unhashable data
3) Cache the function result using only the unique identifier
The lru_cache() currently can't be used directly because
all the function arguments must be hashable.
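A minimal illustration of that limitation, assuming a hypothetical summarize() whose single argument is a dict with made-up "unique_id" and "values" fields. lru_cache() hashes the arguments to build its cache key, so the call fails before the function body ever runs:

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def summarize(items):
    # Hypothetical stand-in for a slow computation over the payload.
    return sum(items["values"])


try:
    summarize({"unique_id": 1.5, "values": [1, 2, 3]})
    error_message = None
except TypeError as exc:
    # lru_cache() hashes its arguments to build the key; a dict is unhashable.
    error_message = str(exc)

print(error_message)
```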
The proposed solution:
1) Write a helper function
1a) that has the same signature as the original function
1b) that returns only the hashable unique identifier
2) With a single @decorator application, connect
2a) the original function
2b) the helper function
2c) and the lru_cache logic
A few areas of concern come to mind:
* People have come to expect cached calls to be very cheap, but it is easy to write input transformations that aren't cheap (e.g. looping over all the inputs as in your example, or converting entire mutable structures to immutable ones).
* While key-functions are relatively well understood, when we use them elsewhere, key-functions are called only once per element. Here, lru_cache() would call the key function on every call, even if the arguments are identical. This will be surprising to some users.
* The helper function's signature needs to exactly match the wrapped function's. Any change would have to be made in both places.
* It would be hard to debug if the helper function's return values ever stopped being unique. For example, if the timestamps started getting rounded to the nearest second, they would sporadically become non-unique.
* The lru_cache signature makes it awkward to add more arguments. That is why your examples had to explicitly specify a maxsize of 128 even though 128 is the default.
* API simplicity was an early design goal. Already, I made a mistake by accepting the "typed" argument, which is almost never used but regularly causes confusion and affects learnability.
* The use case is predicated on having a large unhashable dataset accompanied by a hashable identifier that is assumed to be unique. This probably isn't common enough to warrant an API extension.
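Two of these concerns can be seen in a small plain-dict sketch of key-function caching. This is not lru_cache() itself, and make_key(), summarize() and the "timestamp" field are hypothetical. The key helper runs on every call, and once its keys stop being unique (here, timestamps rounded to whole seconds) a different input silently receives a stale result:

```python
cache = {}
key_calls = 0


def make_key(items):
    # Hypothetical helper: the timestamp identifier, rounded to whole seconds.
    global key_calls
    key_calls += 1
    return round(items["timestamp"])


def summarize(items):
    key = make_key(items)  # runs on every call, even for repeated arguments
    if key not in cache:
        cache[key] = sum(items["values"])
    return cache[key]


a = {"timestamp": 1000.2, "values": [1, 2, 3]}
b = {"timestamp": 1000.4, "values": [10, 20, 30]}

print(summarize(a), summarize(a), key_calls)  # 6 6 2
print(summarize(b))  # 6, not 60: b collides with a's key after rounding
```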
Out of curiosity, what are you doing now without the proposed extension?
As a first try, I would likely write a dataclass to be explicit about the types and about which fields are used in hashing and equality testing:
from dataclasses import dataclass, field

@dataclass(unsafe_hash=True)
class ItemsList:
    unique_id: float  # used for hashing and equality
    data: dict = field(hash=False, compare=False)  # excluded from both
I expect that dataclasses like this will emerge as the standard solution whenever people need a mapping or dict to work with keys that have a mix of hashable and unhashable components. This will work with the lru_cache(), dict(), defaultdict(), ChainMap(), set(), frozenset(), etc.
----------
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue41220>
_______________________________________