Best practice for caching hash

Marco Sulla Marco.Sulla.Python at gmail.com
Wed Mar 16 15:02:27 EDT 2022


On Wed, 16 Mar 2022 at 09:11, Chris Angelico <rosuav at gmail.com> wrote:
> Caching the hash of a
> string is very useful; caching the hash of a tuple, not so much; again
> quoting from the CPython source code:
>
> /* Tests have shown that it's not worth to cache the hash value, see
>    https://bugs.python.org/issue9685 */

This is really interesting. Unfortunately I can't use the pyperformance
benchmarks: I would need code that actually uses frozendict, and I
suppose that's really hard to find...
Anyway, this discourages me from continuing to store unhashable values,
since I would also have to store the error message. Storing the hash
only when the object is hashable is much cheaper, and maybe the extra
field is not such a problem, since a dict consumes more space than a
tuple:

>>> sys.getsizeof({})
64
>>> sys.getsizeof(())
40
>>> sys.getsizeof({1:1})
232
>>> sys.getsizeof((1,))
48

> I don't know what use-cases frozendicts have, but I would
> suspect that if they are used at all, they'll often be used in cases
> where their keys are identical (for instance, the __dict__ of an
> immutable object type, where the keys are extremely stable across
> objects of the same type).

Well, I tried to implement them as dicts with shared keys, but I
abandoned that when Inada optimized dict(another_dict), where
another_dict is a compact dict. Since it's compact, you "only" have to
memcpy the entries (an oversimplification).

I tried the same trick for the sparse dict structure, but memcpying
the keys and the values was not enough: I had to incref every value
*two* times, and this slowed down creation a lot. So I decided to
move to the compact structure.
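The incref bookkeeping that dominated the copy cost is visible even at
the Python level through sys.getrefcount (a rough, CPython-specific
illustration of the reference counting described above, not the
C-level memcpy details): every dict copy must take a new reference to
each shared key and value.

```python
import sys

value = object()
d = {"k": value}

before = sys.getrefcount(value)
# Each copy shares the same value object, so each one increfs it.
copies = [dict(d) for _ in range(5)]
after = sys.getrefcount(value)

assert after - before == 5  # one extra reference per copy (CPython)
```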


More information about the Python-list mailing list