implement random selection in Python

Fri Nov 16 21:02:21 EST 2007

Maybe it would help to make your problem statement a litte rigorous so
we can get a clearer idea of whats required.

One possible formulation:

Given a list L of pairs of values, weightings: [ (v_0, w_0), (v_1,
w_1), ....], and some N between 1 and length(L)

you would like to randomly select a set of N (distinct) values, V,
such that for any ints i and j,

Prob (v_i is in V) / Prob (v_j is in V) = w_i / w_j

This matches your expectations for N = 1. Intuitively though, without
having put much thought into it, I suspect this might not be possible
in the general case.

You might then want to (substantially) relax thec ondition to

Prob (v_i is in V) >= Prob (v_j is in V) iff w_i >= w_j

but in that case its more an ordering of likelihoods rather than a
weighting, and doesn't guarantee the right behaviour for N = 1, so i
don't think thats really what you want.

I can't think of any other obvious way of generalising the behaviour
of the N = 1 case.

- Jordan

On Nov 17, 10:50 am, Bruza <benr... at gmail.com> wrote:
> On Nov 16, 4:47 pm, Bruza <benr... at gmail.com> wrote:
>
>
>
>
>
> > On Nov 16, 6:58 am, duncan smith <buzz... at urubu.freeserve.co.uk>
> > wrote:
>
> > > Bruza wrote:
> > > > I need to implement a "random selection" algorithm which takes a list
> > > > of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> > > > likely an object, "obj", should be selected based on its probability
> > > > of
> > > > "prob".To simplify the problem, assuming "prob" are integers, and the
> > > > sum of all "prob" equals 100. For example,
>
> > > >    items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> > > > The algorithm will take a number "N", and a [(obj, prob),...] list as
> > > > inputs, and randomly pick "N" objects based on the probabilities of
> > > > the
> > > > objects in the list.
>
> > > > For N=1 this is pretty simply; the following code is sufficient to do
> > > > the job.
>
> > > > def foo(items):
> > > >     index = random.randint(0, 99)
> > > >     currentP = 0
> > > >     for (obj, p) in items:
> > > >        currentP += w
> > > >        if currentP > index:
> > > >           return obj
>
> > > > But how about the general case, for N > 1 and N < len(items)? Is there
> > > > some clever algorithm using Python standard "random" package to do
> > > > the trick?
>
> > > I think you need to clarify what you want to do.  The "probs" are
> > > clearly not probabilities.  Are they counts of items?  Are you then
> > > sampling without replacement?  When you say N < len(items) do you mean N
> > > <= sum of the "probs"?
>
> > > Duncabn
>
> > I think I need to explain on the probability part: the "prob" is a
> > relative likelihood that the object will be included in the output
> > list. So, in my example input of
>
> >   items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> > So, for any size of N, 'Tom' (with prob of 45) will be more likely to
> > be included in the output list of N distinct member than 'Mary' (prob
> > of 30) and much more likely than that of 'John' (with prob of 10).
>
> > I know "prob" is not exactly the "probability" in the context of
> > returning a multiple member list. But what I want is a way to "favor"
> > some member in a selection process.
>
> > So far, only Boris's solution is closest (but not quite) to what I
> > need, which returns a list of N distinct object from the input
> > "items". However, I tried with input of
>
> >    items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)]
>
> > and have a repeated calling of
>
> > Ben
>
> OOPS. I pressed the Send too fast.
>
> The problem w/ Boris's solution is that after repeated calling of
> randomPick(3,items), 'Jane' is not the most "frequent appearing"
> member in all the out list of 3 member lists...- Hide quoted text -
>
> - Show quoted text -