hashing an array - howto

bearophileHUGS at lycos.com bearophileHUGS at lycos.com
Fri Sep 5 12:45:37 EDT 2008


Michael Palmer:
> why can't it be tuple already?

Because if the input list L has tuples and lists, they end having the
same hash value:

>>> L = [[1,2,3], (1,2,3)]
>>> hash(tuple(L[0])), hash(tuple(L[1]))
(-378539185, -378539185)

But it's a not much common situation, and few hash collision pairs
can't damage much, so I agree with you that my assert was useless.
This may solve that problem anyway:

hash(type(L)) ^ hash(tuple(L))

Generally a good hashing functions uses all the input information. If
you use tuple() you ignore part of the input information, that is the
type of L. So xor-ing hash(type(L)) you use that information too.


> you can discard the tuple, so the memory requirement is transient.

Right, but there's lot of GC action, it may slow down the code. So you
can start using hash(tuple(L)), but if later the code performance
comes out bad, you may try a different version that creates less
intermediate garbage.

Bye,
bearophile



More information about the Python-list mailing list