implement random selection in Python

jehugaleahsa at gmail.com jehugaleahsa at gmail.com
Sat Nov 17 16:51:16 EST 2007


Knuth says to pick N distinct records from a collection where the
probability is equal you should:

first fill up N records by chosing the first seen.

if less than N were in the collection, quit.

otherwise, t = (the number of items looked at) or N to start.

while your not at the end of the collection:
   increment t
   generate a random number between 0 and t => K
   if K < N:
      add it to your collection at position K (replacing the previous
item).

Now, to incorporate probability is another question . . .

Find the largest probability. Assume it has 100% probability.
Calculate the probability of each item accordingly. Take the randomly
generated number and multiply it by the probability. Take the random
number minus the (random number times to probability). If it falls in
range, then you replace like usual. You should find that the max
probability will always appear in the zeroth position if it is not
replaced by another value. Now, this does not guarantee that the
highest probability value will show up first in the list, since that
is the same as sorting by the probability. It is just a way of
increasing the probability of making the value fall in the range as
the probability varies.

I am not guaranteeing this even works. I am seeing that there is some
collision among the numbers, but it will work for the most part.



More information about the Python-list mailing list