Weighted "random" selection from list of lists

Peter Otten __peter__ at web.de
Sat Oct 8 15:04:32 EDT 2005


Jesse Noller wrote:

> I'm probably missing something here, but I have a problem where I am
> populating a list of lists like this:
> 
> list1 = [ 'a', 'b', 'c' ]
> list2 = [ 'dog', 'cat', 'panda' ]
> list3 = [ 'blue', 'red', 'green' ]
> 
> main_list = [ list1, list2, list3 ]
> 
> Once main_list is populated, I want to build a sequence from items
> within the lists, "randomly" with a defined percentage of the sequence
> coming from the various lists. For example, if I want a 6 item
> sequence, I might want:
> 
> 60% from list 1 (main_list[0])
> 30% from list 2 (main_list[1])
> 10% from list 3 (main_list[2])
> 
> I know how to pull a random sequence (using random()) from the lists,
> but I'm not sure how to pick it with the desired percentages.


If the percentages can be normalized to small integers, just build a
pool where each list is repeated according to its weight, e.g.
list1 occurs 6 times, list2 3 times, and list3 once:

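import random

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10

# Repeat each pool according to its weight.
weighted_pools = []
for p, w in zip(pools, weights):
    weighted_pools.extend([p]*w)

# Pick a pool from the weighted list, then an item from that pool.
sample = [random.choice(random.choice(weighted_pools))
          for _ in range(sample_size)]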
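Because each item is drawn independently, the weighted pool gives the 6/3/1 split only on average; a 10-item run can easily come out 7/2/1. If the "defined percentage" in the question has to hold exactly, one possible variant (a sketch, not part of the original reply) fixes the per-list counts first and then shuffles. The helper name exact_weighted_sample and the largest-remainder rounding used when the shares are not whole numbers are assumptions of this sketch.

```python
import random

def exact_weighted_sample(pools, weights, sample_size, rng=random):
    """Hypothetical helper: take a fixed number of items from each pool.

    Counts follow the weights; when the shares are fractional, the
    leftover slots go to the pools with the largest fractional parts
    (largest-remainder rule), so the counts always sum to sample_size.
    """
    total = float(sum(weights))
    shares = [w * sample_size / total for w in weights]
    counts = [int(s) for s in shares]
    # Hand the leftover slots to the pools with the largest remainders.
    leftover = sample_size - sum(counts)
    by_remainder = sorted(range(len(pools)),
                          key=lambda i: shares[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1

    sample = []
    for pool, count in zip(pools, counts):
        # Items are drawn with replacement, as in the original code.
        sample.extend(rng.choice(pool) for _ in range(count))
    rng.shuffle(sample)  # mix the pools so the order is random
    return sample

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

sample = exact_weighted_sample([list1, list2, list3], [60, 30, 10], 10)
```

With sample_size = 6 the shares are 3.6, 1.8 and 0.6, so the rounding rule, not the weights alone, decides which lists receive the two leftover slots.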


Another option is to accumulate the weights into running totals and use
bisect.bisect() to find the pool whose range contains a random number:

import bisect
import random

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
    # Yield the running total of items.
    for item in items:
        sigma += item
        yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))  # [60, 90, 100]
sigma = cumulated_weights[-1]

sample = []
for _ in range(sample_size):
    # A uniform value in [0, sigma) falls into exactly one pool's range.
    pool = pools[bisect.bisect(cumulated_weights, random.random()*sigma)]
    sample.append(random.choice(pool))

(all code untested)
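On Python 3.6 or later (which postdates this thread), random.choices() accepts cumulative weights directly, and itertools.accumulate() can stand in for the hand-written isum(). A minimal sketch of the bisect approach under that assumption, reusing the lists from the question:

```python
import random
from itertools import accumulate

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']
pools = [list1, list2, list3]
sample_size = 10

# Running totals [60, 90, 100], the same values isum() yields.
cumulated_weights = list(accumulate([60, 30, 10]))

# random.choices() does the weighted pick of pools that the
# bisect() loop does by hand; k is the number of picks.
chosen_pools = random.choices(pools, cum_weights=cumulated_weights, k=sample_size)

# Then pick one item from each chosen pool.
sample = [random.choice(pool) for pool in chosen_pools]
```

Passing weights=[60, 30, 10] instead of cum_weights also works; random.choices() then computes the running totals itself.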

Peter


