Best practice for caching hash

Marco Sulla Marco.Sulla.Python at gmail.com
Sat Mar 12 15:45:56 EST 2022


I have a custom immutable object, and I added a cache for its hash
value. The problem is the object can be composed of mutable or
immutable objects, so the hash can raise TypeError.

In this case I currently cache the value -1. Subsequent calls to
__hash__() check whether the cached value is -1 and, if so, raise
TypeError immediately.
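A minimal Python sketch of this scheme (the class and attribute names are hypothetical; my real code is the C extension linked below). -1 is safe as a "hash failed" sentinel because CPython never lets __hash__ return -1 (a -1 result is remapped to -2 internally):

```python
class Frozen:
    """Sketch: cache the hash, and cache -1 when hashing failed."""

    def __init__(self, *items):
        self._items = items
        self._hash = None  # None: not yet computed

    def __hash__(self):
        if self._hash == -1:
            # A failure was cached: raise a generic, detail-free error.
            raise TypeError
        if self._hash is None:
            try:
                self._hash = hash(self._items)
            except TypeError:
                self._hash = -1  # remember the failure
                raise  # first call keeps the detailed message
        return self._hash
```

The first hash() call propagates the original "unhashable type: 'list'" message; every later call hits the cached -1 and raises a bare TypeError.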

The problem is that only the first time do I get an error with details, for example:

TypeError: unhashable type: 'list'

On subsequent calls I simply raise a generic error:

TypeError

Ok, I can improve this by raising, for example, "TypeError: not all
values are hashable". But do you think this is acceptable? Now that
I think about it, it seems a little hacky to me.

Furthermore, in the C extension I have to define an extra field in
the struct, ma_hash_calculated, to track whether the hash value is
cached, since there's no bogus value I can store in the cache field,
ma_hash, to signal this. If I don't cache unhashable values, -1 can be
used to signal that ma_hash contains no cached value.

So if I do not cache when the object is unhashable, I save a little
memory per object (one int) and I get a better error message every time.
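A sketch of this alternative (again with hypothetical Python names): -1 only means "no cached value", no extra flag field is needed, and an unhashable object raises the full error message on every call, at the cost of recomputing the hash each time:

```python
class FrozenNoCache:
    """Sketch: -1 is the "nothing cached" sentinel; failures aren't cached."""

    def __init__(self, *items):
        self._items = items
        self._hash = -1  # -1: no cached value (hash() never returns -1)

    def __hash__(self):
        if self._hash != -1:
            return self._hash
        # May raise the detailed TypeError on every call; on success,
        # the assignment below caches the result.
        self._hash = hash(self._items)
        return self._hash
```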

On the other hand, if I leave things as they are, testing the
unhashability of the object multiple times is faster. The code:

try:
    hash(o)
except TypeError:
    pass  # o is not hashable; discard the error

executes in nanoseconds when called more than once, even if o is not
hashable. I'm not sure whether this is a big advantage.
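A rough timeit sketch of that comparison (class names are hypothetical, and the actual numbers depend on the machine), pitting the cached-failure path against recomputing the hash on every probe:

```python
import timeit

class CachedFailure:
    """Sketch: remember that hashing failed; later calls fail cheaply."""
    def __init__(self, items):
        self._items = items
        self._failed = False
    def __hash__(self):
        if self._failed:
            raise TypeError  # cheap, but generic
        try:
            return hash(self._items)
        except TypeError:
            self._failed = True
            raise

class Recompute:
    """Sketch: recompute the hash (and the detailed error) every time."""
    def __init__(self, items):
        self._items = items
    def __hash__(self):
        return hash(self._items)

def probe(o):
    try:
        hash(o)
    except TypeError:
        pass

a = CachedFailure(([1],))  # contains a list, so unhashable
b = Recompute(([1],))
probe(a)  # first call caches the failure
t_cached = timeit.timeit(lambda: probe(a), number=10_000)
t_fresh = timeit.timeit(lambda: probe(b), number=10_000)
print(f"cached failure: {t_cached:.4f}s  recomputed: {t_fresh:.4f}s")
```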

What do you think? Here is the code:
https://github.com/Marco-Sulla/python-frozendict/blob/35611f4cd869383678104dc94f82aa636c20eb24/frozendict/src/3_10/frozendictobject.c#L652-L697
