[Tutor] hash value input

Dave Angel davea at ieee.org
Sat Jan 30 12:31:33 CET 2010


spir wrote:
> On Fri, 29 Jan 2010 08:23:37 -0800
> Emile van Sebille <emile at fenx.com> wrote:
>
>   
>>> So, how does python do this?
>>>  
>>>       
>> Start here...
>>
>> http://effbot.org/zone/python-hash.htm
>>     
>
> Great, thank you!
> From the above pointed page:
>
> ====For ordinary integers, the hash value is simply the integer itself (unless it’s -1).
>
> class int:
>     def __hash__(self):
>         value =elf
>         if value =-1:
>             value =-2
>         return value
> ====
> I'm surprised of this, for this should create as many indexes (in the underlying array actually holding the values) as there are integer keys. With possibly huge holes in the array. Actually, there will certainly be a predefined number of indexes N, and the integers be further "modulo-ed" N. Or what?
> I would love to know how to sensibly chose the number of indexes. Pointers welcome (my searches did not bring any clues on the topic).
>
>   
>> Emile
>>     
>
> Denis
> ________________________________
>
> la vita e estrany
>
> http://spir.wikidot.com/
>
>   
I haven't seen the sources, so I'm making an educated guess based on 
things I have seen. The index size grows as the size of the dictionary 
grows, and the formula is not linear. Neither are the sizes obvious 
powers of two or suchlike. I doubt if you have any control over it, 
however. The hash() function returns an int (32 bits on 32bit python), 
which is then converted to the bucket number, probably by a simple 
modulo function.

In the case of integers, it's the modulo which distributes the integers 
among the buckets. If all the integer keys are consecutive, then modulo 
distributes them perfectly. If they're random, then it'll usually work 
pretty well, but you could hit a pattern which puts lots of values in 
one bucket, and not many in the others. If the index size is 22, and all 
your numbers are multiple of 22, then it might degenerate to effectively 
one bucket.

BTW, the referenced article does have a contradiction. For a long int 
whose value is between 16 and 31 bits, the described approach will not 
generate the same hash as the int of the same value. So that 15 bit 
shift algorithm must have some other subtlety to it, perhaps only 
starting with bit 31 or so.

DaveA




More information about the Tutor mailing list