[Cython] Hash-based vtables

Stefan Behnel stefan_ml at behnel.de
Tue Jun 5 10:07:10 CEST 2012


Dag Sverre Seljebotn, 05.06.2012 00:07:
> The C FAQ says 'if you know the contents of your hash table up front you can devise a perfect hash', but no details, probably just hand-waving.
> 
> 128 bits gives more entropy for perfect hashing: some but not much since each shift r is hardwired to one 64 bit subset.

Perfect hashing can be done with any fixed size data set (although it's not
guaranteed to always be the most efficient solution). It doesn't matter if
you use 64 bits or 128 bits. If 4 bits is enough, go with that. The
advantage of perfect hashing of a fixed size data set is that the hash
table has no free space and a match is guaranteed to be exact.

However, the problem in this specific case is that the caller and the
callee do not agree on the same set of entries, so there may be collisions
during the lookup (of a potentially very large set of signatures) that were
not anticipated in the perfect hash table layout (of the much smaller set
of provided signatures). Perfect hashing works here as well, but it looses
one of its main advantage over other hashing schemes. You then have to
compare the entries exactly after the lookup in order to make sure that you
didn't run into a collision, thus loosing time again that you just won with
the hashing.

But at least you only have to do exactly one such comparison, so that's an
advantage over a hashing scheme that allows collisions also in the layout.
Maybe you can even handle mismatches more quickly by adding a dedicated
"empty" entry for them that most (all?) anticipated mismatches would hash to.

Stefan


More information about the cython-devel mailing list