don't need dictionary's keys - hash table?

Fredrik Lundh fredrik at pythonware.com
Thu Jul 13 04:54:43 EDT 2006


kdotsky at gmail.com wrote:

>> depending on your application, a bloom filter might be a good enough:
>>
>>      http://en.wikipedia.org/wiki/Bloom_filter
>>
>
> Thanks (everyone) for the comments.  I like the idea of the bloom
> filter or using an md5 hash, since a rare collision will not be a
> show-stopper in my case.

btw, since my brain is on vacation, could anyone who already has all the necessary
background information in their head (Paul?) tell me if using a chopped-up MD5 or
SHA-1 (or whatever) digest as the hash functions for a bloom filter is a good idea...

i.e. something like:

    import sha

    try:
        all
    except NameError:
        def all(S):
             for x in S:
                 if not x:
                    return False
             return True

    def digest(data):
        d = sha.sha(data).digest()
        return [d[i:i+4] for i in range(0, len(d), 4)]

    class bloom(object):

        def __init__(self):
            self.filter = [set() for i in range(len(digest("")))]

        def add(self, data):
            for s, p in zip(self.filter, digest(data)):
                s.add(p)
        def __contains__(self, data):

            return all(p in s for s, p in zip(self.filter, digest(data)))

    b = bloom()
    b.add("showbiz")
    b.add("absolution")
    print "hullaballo" in b
    print "showbiz" in b
    print "origin of symmetry" in b

(and if this isn't a good idea, why it isn't).

</F> 






More information about the Python-list mailing list