crc32 to be used as hash

Weiguang Shi wgshi at namao.cs.ualberta.ca
Fri Nov 12 02:23:11 EST 2004


Tim Peters wrote:
>...
>together did very well -- which implies that they weren't facing a
>very demanding problem.
I don't know about that. But all the experiments were trace-driven and
showed CRC32 is an excellent hash function (for the problem). Since
there haven't been counter-results reported, we might be able to
reject the null hypothesis: CRC32 is _not_ good for hashing address
traces.

>
>There's no reason to suppose, a priori, that's got much to do with the
>32-bit hash distribution of strings-in-email PythonLabs tested.
> ...
>
And I didn't say it did. Different workloads.

It should be apparent by now I was trying to do the similar things the
cited works did. As I gladly found there is a CRC32 function, upon my
first program in Python, I was alerted by the reference page saying
it's not a good hash function and wondered for what particular
reasons.

> ...
>
>This isn't 100% reliable either, though, and you can fiddle the above
>to find subfields and inputs where crc32 does better than the builtin
>hash.  For example, in the above, if the hash is shifted right by 11
>before taking the last 21 bits, the builtin hash does horribly,
>filling only 8K bins, but crc32 doesn't do any worse then than it did
>before.

It's good to know. According to Jain's paper, CRC32 _is_ relatively
bit-wise random (again, for the network workload). That's the property
I'm after.

Thanks for the discussion and the program.

Weiguang



More information about the Python-list mailing list