[Tutor] hash value input
Dave Angel
davea at ieee.org
Sat Jan 30 12:31:33 CET 2010
spir wrote:
> On Fri, 29 Jan 2010 08:23:37 -0800
> Emile van Sebille <emile at fenx.com> wrote:
>
>
>>> So, how does python do this?
>>>
>>>
>> Start here...
>>
>> http://effbot.org/zone/python-hash.htm
>>
>
> Great, thank you!
> From the above pointed page:
>
> ====For ordinary integers, the hash value is simply the integer itself (unless it’s -1).
>
> class int:
> def __hash__(self):
> value =elf
> if value =-1:
> value =-2
> return value
> ====
> I'm surprised of this, for this should create as many indexes (in the underlying array actually holding the values) as there are integer keys. With possibly huge holes in the array. Actually, there will certainly be a predefined number of indexes N, and the integers be further "modulo-ed" N. Or what?
> I would love to know how to sensibly chose the number of indexes. Pointers welcome (my searches did not bring any clues on the topic).
>
>
>> Emile
>>
>
> Denis
> ________________________________
>
> la vita e estrany
>
> http://spir.wikidot.com/
>
>
I haven't seen the sources, so I'm making an educated guess based on
things I have seen. The index size grows as the size of the dictionary
grows, and the formula is not linear. Neither are the sizes obvious
powers of two or suchlike. I doubt if you have any control over it,
however. The hash() function returns an int (32 bits on 32bit python),
which is then converted to the bucket number, probably by a simple
modulo function.
In the case of integers, it's the modulo which distributes the integers
among the buckets. If all the integer keys are consecutive, then modulo
distributes them perfectly. If they're random, then it'll usually work
pretty well, but you could hit a pattern which puts lots of values in
one bucket, and not many in the others. If the index size is 22, and all
your numbers are multiple of 22, then it might degenerate to effectively
one bucket.
BTW, the referenced article does have a contradiction. For a long int
whose value is between 16 and 31 bits, the described approach will not
generate the same hash as the int of the same value. So that 15 bit
shift algorithm must have some other subtlety to it, perhaps only
starting with bit 31 or so.
DaveA
More information about the Tutor
mailing list