an element from a set

Bryan bryanjugglercryptographer at yahoo.com
Mon May 17 06:42:43 EDT 2010


Carl Banks wrote:
[...]
> Random element from a set is such a natural idea.
>
> Even if we were to modify the set type at the C level to support it, I
> can't think of an easy way to get a random element without selection
> bias.  For instance, if you start from a random hash code, some
> elements are more likely to be selected than others, depending on
> distance between entries in the hash table.  You'd probably have to
> choose an number in range(len(s)) and count off that many, but then
> might as well have just converted it to a list or used an iterator.  A
> little disappointing, actually.
>
> Probably the "right" way is to use a binary-tree-based set.

There's a reasonably straightforward solution supporting O(1) time
random choice that also preserves the O(1) average-case time of set's
add(), remove(), and 'in'. The trick is to keep each element in both a
list and a hash table. Implementing Python's entire set interface is a
bit of project, so the code below just supports enough for a demo.

-Bryan Olson

# --------

from random import choice

class SetWithRandom:

    def __init__(self, *args):
        self.lst = list(*args)
        self.dct = dict((elem, i) for (i, elem) in
                enumerate(self.lst))

    def audit(self):
        # Test consistency of dict and list.
        assert len(self.lst) == len(self.dct)
        for elem in self.dct:
            assert self.lst[self.dct[elem]] == elem

    def random(self):
        return choice(self.lst)

    def add(self, elem):
        if elem not in self.dct:
            self.dct[elem] = len(self.lst)
            self.lst.append(elem)

    def remove(self, elem):
        index = self.dct[elem]
        # Move the last item into vacated index.
        mover = self.lst[-1]
        self.lst[index] = mover
        self.dct[mover] = index
        del self.dct[elem]
        self.lst.pop()

    def __len__(self):
        return len(self.lst)

    def __contains__(self, elem):
        return elem in self.dct

    def __iter__(self):
        return self.dct.__iter__()


# ---------
# Test against Python's built-in set.
for i in range(0, 100, 20):
    trange = list(range(1000, 1000 + i))
    pyset = set(trange)
    testset = SetWithRandom(pyset)
    testset.audit()
    assert pyset == set(testset)
    rands = set(testset.random() for _ in trange)
    assert len(rands) >= i / 4
    while pyset:
        randx = testset.random()
        pyset.remove(randx)
        testset.remove(randx)
        testset.audit()
        for _ in range(choice([0, 0, 1, 2])):
            randx = choice(trange)
            pyset.add(randx)
            testset.add(randx)
        assert pyset == set(testset)
    assert not testset




More information about the Python-list mailing list