implement random selection in Python

Fri Nov 16 22:30:46 EST 2007

How about this variation on your intial attempt?

# Untested!
def randomPick(n, items):
  def pickOne():
    index = random.randint(0, 99)
    currentP = 0
    for (obj, p) in items:
      currentP += p
      if currentP > index:
        return obj
  selection = set()
  while len(selection) < n:
    selection.add(pickOne())
  return selection

Of course the performance is likely to be far from stellar for, say,
randomPick(2, [("Bob", 98), ("Jane", 1), ("Harry", 1)]), but at least
its boundedly bad (so long as you retain the condition that the sum of
the weightings is 100.)

Not sure if it satisfies the conditions in my last post.... either do
some empirical testing, or some mathematics, or maybe a bit of
both.

On Nov 17, 12:02 pm, Jordan <jordanrastr... at gmail.com> wrote:
> Maybe it would help to make your problem statement a litte rigorous so
> we can get a clearer idea of whats required.
>
> One possible formulation:
>
> Given a list L of pairs of values, weightings: [ (v_0, w_0), (v_1,
> w_1), ....], and some N between 1 and length(L)
>
> you would like to randomly select a set of N (distinct) values, V,
> such that for any ints i and j,
>
> Prob (v_i is in V) / Prob (v_j is in V) = w_i / w_j
>
> This matches your expectations for N = 1. Intuitively though, without
> having put much thought into it, I suspect this might not be possible
> in the general case.
>
> You might then want to (substantially) relax thec ondition to
>
> Prob (v_i is in V) >= Prob (v_j is in V) iff w_i >= w_j
>
> but in that case its more an ordering of likelihoods rather than a
> weighting, and doesn't guarantee the right behaviour for N = 1, so i
> don't think thats really what you want.
>
> I can't think of any other obvious way of generalising the behaviour
> of the N = 1 case.
>
> - Jordan
>
> On Nov 17, 10:50 am, Bruza <benr... at gmail.com> wrote:
>
>
>
> > On Nov 16, 4:47 pm, Bruza <benr... at gmail.com> wrote:
>
> > > On Nov 16, 6:58 am, duncan smith <buzz... at urubu.freeserve.co.uk>
> > > wrote:
>
> > > > Bruza wrote:
> > > > > I need to implement a "random selection" algorithm which takes a list
> > > > > of [(obj, prob),...] as input. Each of the (obj, prob) represents how
> > > > > likely an object, "obj", should be selected based on its probability
> > > > > of
> > > > > "prob".To simplify the problem, assuming "prob" are integers, and the
> > > > > sum of all "prob" equals 100. For example,
>
> > > > >    items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> > > > > The algorithm will take a number "N", and a [(obj, prob),...] list as
> > > > > inputs, and randomly pick "N" objects based on the probabilities of
> > > > > the
> > > > > objects in the list.
>
> > > > > For N=1 this is pretty simply; the following code is sufficient to do
> > > > > the job.
>
> > > > > def foo(items):
> > > > >     index = random.randint(0, 99)
> > > > >     currentP = 0
> > > > >     for (obj, p) in items:
> > > > >        currentP += w
> > > > >        if currentP > index:
> > > > >           return obj
>
> > > > > But how about the general case, for N > 1 and N < len(items)? Is there
> > > > > some clever algorithm using Python standard "random" package to do
> > > > > the trick?
>
> > > > I think you need to clarify what you want to do.  The "probs" are
> > > > clearly not probabilities.  Are they counts of items?  Are you then
> > > > sampling without replacement?  When you say N < len(items) do you mean N
> > > > <= sum of the "probs"?
>
> > > > Duncabn
>
> > > I think I need to explain on the probability part: the "prob" is a
> > > relative likelihood that the object will be included in the output
> > > list. So, in my example input of
>
> > >   items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
>
> > > So, for any size of N, 'Tom' (with prob of 45) will be more likely to
> > > be included in the output list of N distinct member than 'Mary' (prob
> > > of 30) and much more likely than that of 'John' (with prob of 10).
>
> > > I know "prob" is not exactly the "probability" in the context of
> > > returning a multiple member list. But what I want is a way to "favor"
> > > some member in a selection process.
>
> > > So far, only Boris's solution is closest (but not quite) to what I
> > > need, which returns a list of N distinct object from the input
> > > "items". However, I tried with input of
>
> > >    items = [('Mary',1), ('John', 1), ('Tom', 1), ('Jane', 97)]
>
> > > and have a repeated calling of
>
> > > Ben
>
> > OOPS. I pressed the Send too fast.
>
> > The problem w/ Boris's solution is that after repeated calling of
> > randomPick(3,items), 'Jane' is not the most "frequent appearing"
> > member in all the out list of 3 member lists...- Hide quoted text -
>
> > - Show quoted text -- Hide quoted text -
>
> - Show quoted text -