Determining number of dict key collisions in a dictionary

Roger Binns rogerb at rogerbinns.com
Tue Dec 2 14:38:49 EST 2008


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

python at bdurham.com wrote:
> Background: I'm working on a project using very large dictionaries (64
> bit Python) and question from my client is how effective is Python's
> default hash technique for our data set? 

Python hash functions return a long which in a 64 bit process is 32 bits
on Windows and 64 bits on pretty much every other 64 bit environment.

> Their concern is based on the
> belief that Python's default dictionary hash scheme is optimized for 32
> bit vs. 64 bit environments and may not have anticipated the additional
> range of keys that can be generated in a 64 bit environment. Our keys
> are based on 20 to 44 byte ASCII (7-bit) alpha-numeric strings.

Why not have them look at the source code?  It is well commented and
there is another file with various notes.  Look at Objects/dictobject.c
and Objects/dictnotes.txt

A teaser comment for you:

   Most hash schemes depend on having a "good" hash function, in
   the sense of simulating randomness.  Python doesn't.

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkk1jsUACgkQmOOfHg372QTeEQCeJwkRphiPeDefkANg1IdG3HH1
oocAoICJk6NGxVmtZTZtLOL4Sv4aCw1n
=IqsO
-----END PGP SIGNATURE-----




More information about the Python-list mailing list