implement random selection in Python

J. Clifford Dyer jcd at sdf.lonestar.org
Sat Nov 17 10:23:58 EST 2007


On Fri, 2007-11-16 at 16:47 -0800, Bruza wrote:
I think I need to explain on the probability part: the "prob" is a
> relative likelihood that the object will be included in the output
> list. So, in my example input of
> 
>   items = [('Mary',30), ('John', 10), ('Tom', 45), ('Jane', 15)]
> 
> So, for any size of N, 'Tom' (with prob of 45) will be more likely to
> be included in the output list of N distinct member than 'Mary' (prob
> of 30) and much more likely than that of 'John' (with prob of 10).
> 
> I know "prob" is not exactly the "probability" in the context of
> returning a multiple member list. But what I want is a way to "favor"
> some member in a selection process.
> 
My understanding of what you are looking for is that on each individual selection, the probability of picking a given name is that name's prob value, divided by the sum of all prob values.  That is, in the following case:

items = [('Mary',96),('Shimon',1),('Aisha',1),('Toshiro',2)]
N = 3

On the first pass Mary has a 96% chance of getting selected, Shimon a 1% Chance, Aisha a 1% chance and Toshiro a 2% chance.

If Mary gets selected, then on the second round Shimon and Aisha should each have a 25% chance of getting selected, and Toshiro a 50% chance.

If Shimon gets selected, then on round three, Aisha will have a ~33% chance, and Toshiro a little less than 67%.

If that is correct, then the following might meet your needs:

from random import choice

def select(weighted_list, n=1):
    selected = set()
    for iteration in xrange(n):
        print iteration
        selection_list = []
        for item in weighted_list:
            if item[0] not in selected:
                selection_list.extend([item[0]] * item[1])
        #print selection_list
        this_choice = choice(selection_list)
        #print this_choice
        selected.add(this_choice)
        return selected

if __name__ == '__main__':
    items = [('Mary',96),('Shimon',1),('Aisha',1),('Toshiro',2)]
    N = 3
    print select(items, 3)
    

It creates a new selection list at every pass, but it should be a marked improvement over solutions that have to flail against already chosen values, especially in unbalanced cases like the one I've shown above.  I have not run tests to establish this for certain.  Exercise left to the reader.  

If you care who was chosen first or last (like, say, if this is for selecting members of a kick ball team), you may want to replace the selected set with a list.  Or you may want to keep both the set and a list.  Depends on your needs.

Cheers,
Cliff




More information about the Python-list mailing list