implement random selection in Python

Mon Nov 19 08:31:39 EST 2007

On 2007-11-17, Bruza <benruza at gmail.com> wrote:
> OOPS. I pressed the Send too fast.
>
> The problem w/ Boris's solution is that after repeated calling
> of randomPick(3,items), 'Jane' is not the most "frequent
> appearing" member in all the out list of 3 member lists...

How does this solution fair against your spec?

def sample_bp(seq, probabilities, k):
    """ Return k distinct random items from seq, with probabilities specified
    as weighted integers in a list.

    >>> random.seed(0)

    >>> sample_bp(['a', 'b'], [1, 5], 2)
    ['b', 'a']

    >>> sample_bp(['a', 'b', 'c'], [1, 5, 2], 3)
    ['b', 'a', 'c']

    """

    if k > len(seq):
        raise ValueError('sample size greater than population')
    probs = build_probs(probabilities)
    rv = []
    while k > 0:
        j = random_prob(probs)
        rv.append(probs[j][2])
        remove_prob(probs, j)
        k -= 1
    return [seq[i] for i in rv]

def build_probs(probabilities):
    """ Receives a list of integers, and returns list of ranges and original
    indices.

    >>> build_probs([8, 10, 7])
    [(0, 8, 0), (8, 18, 1), (18, 25, 2)]
    >>> build_probs([1, 5, 8, 2, 3, 7])
    [(0, 1, 0), (1, 6, 1), (6, 14, 2), (14, 16, 3), (16, 19, 4), (19, 26, 5)]

    """
    k = 0
    probs = []
    for i, p in enumerate(probabilities):
        if p < 0:
            raise ValueError('negative probability')
        probs.append((k, k+p, i))
        k = k+p
    return probs

def random_prob(probs):
    """ Return the index of a weighted random element of prob.

    >>> random.seed(0)

    >>> for x in xrange(20):
    ...     print random_prob(build_probs([1, 5, 8, 2, 3, 7])),
    5 5 2 2 2 2 5 2 2 3 5 2 2 5 4 2 5 5 5 5

    >>> random_prob([(0, 15, 0)])
    0

    """
    i = random.randrange(probs[-1][1])
    # Binary search for the element whose range contains i
    hi = len(probs)
    lo = 0
    while lo < hi:
        mid = (lo+hi)//2
        begin, end, _ = probs[mid]
        if i >= begin and i < end: return mid
        elif i >= end: lo = mid+1
        else: hi = mid

def remove_prob(probs, i):
    """ Remove element j from the probability list, adjusting ranges as needed.

    >>> prob = [(0, 12, 0), (12, 15, 1), (15, 25, 2)]
    >>> remove_prob(prob, 1)
    >>> prob
    [(0, 12, 0), (12, 22, 2)]

    """
    begin, end, _ = probs[i]
    diff = end - begin
    j = i+1
    while j < len(probs):
        begin, end, index = probs[j]
        probs[j] = (begin-diff, end-diff, index)
        j += 1
    del probs[i]

This was the most simple-minded approach I could think of, so it
might serve as a reference against a more efficient approach.
Although thorough testing of such a bizarre function boggles my
mind.

I toyed with sorting the probability list from greatest to
lowest, which would make a linear search fast, but unfortunately
it would slow down removing probabilities.

-- 
Neil Cerutti
I make love to pressure. --Stephen Jackson