hashing an array - howto

John Machin sjmachin at lexicon.net
Fri Sep 5 16:58:00 EDT 2008


On Sep 6, 2:45 am, bearophileH... at lycos.com wrote:
> Michael Palmer:
>
> > why can't it be tuple already?
>
> Because if the input list L has tuples and lists, they end having the
> same hash value:

Perhaps the OP shouldn't be constructing the hash of a mutable object.
Perhaps he would be grateful if his hash function raised an exception
instead of laboriously masking the problem.

>
> >>> L = [[1,2,3], (1,2,3)]
> >>> hash(tuple(L[0])), hash(tuple(L[1]))
>
> (-378539185, -378539185)
>
> But it's a not much common situation, and few hash collision pairs
> can't damage much, so I agree with you that my assert was useless.
> This may solve that problem anyway:
>
> hash(type(L)) ^ hash(tuple(L))

Consider this:
>>> hash(123) == hash(123.0) == hash(123L)
True

Perhaps the OP (who hasn't stated what he is going to use the hash
results for) needs to use only the values in his hash, and would be if
not highly delighted then at least blithely unconcerned if it turned
out that [1, 2, 3] and (1, 2, 3) had the same hash.

>
> Generally a good hashing functions uses all the input information. If
> you use tuple() you ignore part of the input information, that is the
> type of L. So xor-ing hash(type(L)) you use that information too.
>

Try "uses all the information that is relevant to the task".

Your alternative solution using reduce and xor may have suboptimal
characteristics ... xor_hash((1, 2, 3)) == xor_hash((1, 3, 2)) ==
xor_hash((2, 1, 3)) etc. While the docs for __hash__ say "it is
advised to somehow mix together (e.g., using exclusive or) the hash
values for the components of the object", in practice "somehow" is
rather more elaborate than xor. Have a look at the tuplehash function
in .../Objects/tupleobject.c. If the order of the values in the tuple
doesn't matter, then perhaps the OP really should be using a set (or a
bag).

Cheers,
John



More information about the Python-list mailing list