pick randomly a certain fraction of numbers from a list

Alex Martelli aleaxit at yahoo.com
Sat Feb 15 12:49:08 EST 2003


Yuval wrote:

> Does anybody know a quick way of picking randomly x% of elements from a
> list. e.g. If I have a list of 50 elements, how do I randomly choose 2% of
> it? The trivial way would be to iterate with the random.choice() function.
> Is there any better way?

Yes, there's a surprisingly simple algorithm to "pick K items out of N"
where the N>=K items come in a sequence.  No "fractions", of course --
you always pick an *integer* number of items (there IS no way at all to
"pick a fractional number of items").

The basic idea is: if at any stage you must pick K more items and you
know there are N more coming, then pick the next one with a probability
of K/N.  If you do pick it also update the "numbers still to pick" of
course (decrement it by 1), and whether you pick it or not, decrement
the "number still to come".

It may not be intuitive, seems too simple.  But basically all you need
to show is that each item is INDEPENDENTLY picked with a probability
of K/N.  Now, that's trivially true for the first one.  What about the
second?  Well, if you HAVE picked the first one (K/N times), then the
second one will also be picked with probability (K-1)/(N-1); if you
haven't picked the first one (1-K/N times), the second one will be
picked with probability K/(N-1).  So, in all:

K/N * (K-1)/(N-1) + (N-K)/N * K/(N-1) =

[K(K-1) + (N-K)K ] / N(N-1) =

[K**2 - K + NK - K**2 ] / N(N-1) =

K (N-1) / N (N-1) =

K / N


You could formalize this into a proof by induction, but I'll leave it
at this "handwaving" level.  Point is, the Python solution is pretty
trivial given this algorithm, e.g (warning, untested code):


import random

def pickK(somelist, K):
    N = len(somelist)
    result = []

    for item in somelist:
        x = random.randrange(N)
        if x < K:
            result.append(item)
            K -= 1
        N -= 1

    return result


Alex





More information about the Python-list mailing list