don't need dictionary's keys - hash table?

Terry Hancock hancock at anansispaceworks.com
Thu Jul 13 00:29:34 EDT 2006


Nick Vatamaniuc wrote:
>  The original post just said that he wanted to index some values by
>  their urls and didn't really care about the urls themselves, so I
>  suggested that he just use the hash of the key as the key. The
>  dictionary will then take a hash of that [note that:
>  hash(string)=hash(hash(string)) ] then the dictionary will not keep
>  the reference to the urls and GC will collect them. So instead of
>  dic[urlstring']=value he will do dic[hash(urlstring)]=value. But then
>  to retrieve he will have to do old_value=dic[hash(urlstring]].

Note that it is trivial to catch collisions on entry and correct them:

key = hash(url_string)
i      = 0
if d.has_key(key):
   while d.has_key(key):
       i += 1
   d[key + repr(i)] = val
else:
   d[key] = val

or something along those lines. No need to keep the original string.

Obviously this is only efficient if collisions are rare.
   
I am a little surprised that hash(hash(s)) == hash(s), is that actually
true?

Cheers,
Terry

-- 
Terry Hancock (hancock at AnansiSpaceworks.com)
Anansi Spaceworks http://www.AnansiSpaceworks.com




More information about the Python-list mailing list