Best practice for caching hash

Cameron Simpson cs at cskk.id.au
Tue Mar 15 17:32:37 EDT 2022


On 12Mar2022 21:45, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>I have a custom immutable object, and I added a cache for its hash
>value. The problem is the object can be composed of mutable or
>immutable objects, so the hash can raise TypeError.

Is it sensible to compute the hash only from the immutable parts?  
Bearing in mind that usually you need an equality function as well and 
it may have the same stability issues.

>In this case I currently cache the value -1. The subsequent calls to
>__hash__() will check if the value is -1. If so, a TypeError is
>immediately raised.

This will also make these values behave badly in dicts/sets, as they all 
hash to the same bucket. The performance of a hash index is pretty 
dependent on having roughly even distribution of objects amongst the 
hash slots.

>The problem is the first time I get an error with details, for example:
>
>TypeError: unhashable type: 'list'
>
>The subsequent times I simply raise a generic error:
>
>TypeError

You could, you know, cache the original exception. That does keep links 
to the traceback and therefore all the associates stack frames, so that 
isn't cheap (it is peerfectly fast - just the reference, t just prevents 
those things from being reclaimed).

>Ok, I can improve it by raising, for example, TypeError: not all
>values are hashable. But do you think this is acceptable? Now I'm
>thinking about it and it seems a little hacky to me.

That's a policy issue. "Acceptable" depends on the imagined use cases.  
Im' just having a read of your intro: 
https://github.com/Marco-Sulla/python-frozendict/blob/35611f4cd869383678104dc94f82aa636c20eb24/README.md

So your objective is that a frozendict can itself be hashable, allowing, 
say, sets of frozendicts?

In that case I would be inclined to never raise TypeError at all. I'd 
compute the hash entirely from the keys of the dict and compute equality 
in the normal fashion: identical keys and then equal corresponding 
values. That removes the requirement that values be immutable and/or 
hashable.

It that workable?

Cheers,
Cameron Simpson <cs at cskk.id.au>


More information about the Python-list mailing list