don't need dictionary's keys - hash table?

Nick Vatamaniuc vatamane at gmail.com
Wed Jul 12 17:33:37 EDT 2006


Fred,

I never said there will be no collisions. For clarity, can you quote
the exact phrase where I mentioned that?

To say that there will be no collision is nonsense because the # of
possible long url strings is certainly larger than 2^32, applying the
pigeon hole principle you could easily show that there will be
collisions.

The point is to make collision as unlikely as possible for the human
readable text (that is make them as uniform as possible), that is why
creating good cryptographic hashes is not easy.  Not that the Python
hash function work as hard as an MD5 (it is probably a multiplication
and a modulo if I had to guess). But this is a topic beyond scope of
this thread.

The original post just said that he wanted to index some values by
their urls and didn't really care about the urls themselves, so I
suggested that he just use the hash of the key as the key. The
dictionary will then take a hash of that [note that:
hash(string)=hash(hash(string)) ]  then  the dictionary will not keep
the reference to the urls and GC will collect them. So instead of
dic[urlstring']=value he will do dic[hash(urlstring)]=value. But then
to retrieve he will have to do old_value=dic[hash(urlstring]].

Hopefully this make my point more clear,
Nick V.

Fredrik Lundh wrote:
> Nick Vatamaniuc wrote:
>
> > If you don't really want to store the keys, just use
> > dic[hash(key)]=value, this way the dictionary will have the same shape
> > and distribution of keys as dic[key]=value because
> > hash('abc')=hash(hash('abc'))  but the long string of actual keys are
> > not referenced by the dictionary.
>
> how come you're so sure that there will never be any collisions ?
> 
> </F>




More information about the Python-list mailing list