Perfect hashing for Py

Raymond Hettinger python at rcn.com
Sat Jul 12 10:11:35 EDT 2008


On Jul 12, 10:13 am, Raymond Hettinger <pyt... at rcn.com> wrote:
> On Jul 11, 3:01 pm, bearophileH... at lycos.com wrote:
>
> > I have found this perfect hash (minimal too) implementation:http://burtleburtle.net/bob/hash/perfect.html
>
> > I have already translated part of it to D, and it seems to work well
> > enough. As discussed in the PyConDue, I think this may be used in
> > frozenset (and frozendict) to build a (minimal too?) perfect hash on
> > the fly, to allow (hopefully) faster retrieval of items that don't
> > change.

I played around with the idea and have a rough implementation sketch:

class PerfectSet(collections.Set):
    __slots__ = ['len', 'hashvalue', 'hashfunc', 'hashtable']
    DUMMY = object()
    def __init__(self, keys, sparseness=0.5):
        keys = frozenset(keys)
        self.len = len(keys)
        self.hashvalue = hash(keys)
        n = (len(keys) / sparseness) + 1
        assert n > self.len
        self.hashfunc = make_perfect_hash_function(keys, n)
        self.hashtable = [self.DUMMY] * n
        for k in keys:
            self.hashtable[self.hashfunc(k)] = k
    del __len__(self, k):
        return self.len
    def __iter__(self, k)
        return (k for k in self.hashtable if k is not self.DUMMY)
    def __contains__(self, k):
        return self.table[self.hashfunc(k)] is not self.DUMMY

The perfect hash function will need to use the regular hash(obj) as a
starting point.  This helps with non-string types and it preserves
existing relationships between __hash__ and __eq__.

The construction is more expensive than with regular frozensets so it
is only useful when lookups are very common.

Playing around with the idea suggests that a sparseness variable for
frozensets would largely accomplish the same goal without the overhead
of creating and applying a perfect hash function.  Sparse hash tables
rarely collide.


Raymond



More information about the Python-list mailing list