Today's fun and educational Python recipe

Raymond Hettinger python at rcn.com
Wed May 4 17:53:35 EDT 2011


On May 4, 12:27 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> Raymond Hettinger <pyt... at rcn.com> writes:
> > Here's a 22-line beauty for a classic and amazing algorithm:
> >http://bit.ly/bloom_filter
>
> The use of pickle to serialize the keys is a little bit suspicious if
> there might be a reason to dump the filter to disk and re-use it in
> another run of the program.  Pickle representation might change between
> Python releases, for example.  It's just supposed to stay interoperable
> between versions, not necessarily bitwise-identical.
>
> Otherwise it's quite nice.  I'd suggest adding a .update() operation
> that adds keys from a user-supplied iterator.

I chose pickle because it solved the problem of turning arbitrary
objects into bytes which are needed as inputs to sha512.

It seems that this particular choice of hash function is distracting
some readers away from the interesting part of how a Bloom filter
works.

Since the choice of hash functions is completely arbitrary, I'm
thinking of substituting something a little more basic.  What do you
think of this alternative as a way to make it clearer that each
successive probe is uncorrelated, yet fully dependent on the key?

    def get_probes(self, key):
        hasher = Random(key).randrange
        num_words = len(self.arr)
        for _ in range(self.num_probes):
            array_index = hasher(num_words)
            bit_index = hasher(32)
            yield array_index, 1 << bit_index


Raymond
-------

follow my other python tips and recipes on twitter:  @raymondh



More information about the Python-list mailing list