Perfect hashing for Py

Casey Caseyweb at gmail.com
Fri Jul 11 19:46:41 EDT 2008


On Jul 11, 8:01 am, bearophileH... at lycos.com wrote:
> Following links from this thread:http://groups.google.com/group/comp.lang.python/browse_thread/thread/...
>
> I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html
>
> I have already translated part of it to D, and it seems to work well
> enough. As discussed in the PyConDue, I think this may be used in
> frozenset (and frozendict) to build a (minimal too?) perfect hash on
> the fly, to allow (hopefully) faster retrieval of items that don't
> change.
> That code is C and I think it's public domain, so if experiments show
> it gives enough speed up, it may be added to CPython 2.6/3.
>
> Bye,
> bearophile

It would be interesting to see if such an algorithm could actually
provide any significant performance improvements for the size of sets
that I suspect are most often used in practice. The chance of a hash
collision for a good 32-bit general is fairly low even for a set of
1,000,000 unique elements, which seems to me to be a pretty large
memory-based set.  Compare that with the cost of determining a perfect
hash (O(n**2)?).

From my perspective, a perfect hash would certainly be a welcome
addition to the Python library or even as an optional algorithm
supporting
hash-based collections.



More information about the Python-list mailing list