Weighted "random" selection from list of lists
Peter Otten
__peter__ at web.de
Sat Oct 8 15:04:32 EDT 2005
Jesse Noller wrote:
> I'm probably missing something here, but I have a problem where I am
> populating a list of lists like this:
>
> list1 = [ 'a', 'b', 'c' ]
> list2 = [ 'dog', 'cat', 'panda' ]
> list3 = [ 'blue', 'red', 'green' ]
>
> main_list = [ list1, list2, list3 ]
>
> Once main_list is populated, I want to build a sequence from items
> within the lists, "randomly" with a defined percentage of the sequence
> coming from the various lists. For example, if I want a 6 item
> sequence, I might want:
>
> 60% from list 1 (main_list[0])
> 30% from list 2 (main_list[1])
> 10% from list 3 (main_list[2])
>
> I know how to pull a random sequence (using random()) from the lists,
> but I'm not sure how to pick it with the desired percentages.
If the percentages can be normalized to small integers, just make a
pool where each list is repeated according to its weight, e.g.
list1 occurs 6 times, list2 3 times, and list3 once:
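import random

pools = [list1, list2, list3]
weights = [6, 3, 1]
sample_size = 10
weighted_pools = []
for p, w in zip(pools, weights):
    # repeat each pool w times so it is chosen w times as often
    weighted_pools.extend([p]*w)
# first pick a pool (weighted), then an item from that pool (uniform)
sample = [random.choice(random.choice(weighted_pools))
          for _ in range(sample_size)]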
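To see that the repeated pool really produces roughly 60/30/10, here is a small sketch that tallies which list each drawn item came from. The helper draw(), the origin lookup, the seed and the 10000 draws are my own assumptions for illustration, not part of the recipe above:

```python
import random
from collections import Counter

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

pools = [list1, list2, list3]
weights = [6, 3, 1]

# Repeat each pool according to its weight: 6 + 3 + 1 = 10 entries.
weighted_pools = []
for p, w in zip(pools, weights):
    weighted_pools.extend([p] * w)

def draw(rng):
    """Pick a pool (weighted), then an item from that pool (uniform)."""
    return rng.choice(rng.choice(weighted_pools))

# Hypothetical check: map every item back to the index of its list
# (works here because no item appears in more than one list).
origin = {item: i for i, pool in enumerate(pools) for item in pool}

rng = random.Random(0)   # arbitrary seed, only to make the run repeatable
n_draws = 10000          # arbitrary; large enough to smooth out the noise
counts = Counter(origin[draw(rng)] for _ in range(n_draws))
shares = [counts[i] / n_draws for i in range(len(pools))]
print(shares)
```

Note that the weights only hold on average. A single 6 item sequence can easily contain nothing from list3 at all; if you need exact counts per list you have to draw a fixed number of items from each list and shuffle the result instead.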
Another option is to use bisect() to choose a pool:
import bisect
import random

pools = [list1, list2, list3]
sample_size = 10

def isum(items, sigma=0.0):
    # yield the running total of items
    for item in items:
        sigma += item
        yield sigma

cumulated_weights = list(isum([60, 30, 10], 0))   # [60, 90, 100]
sigma = cumulated_weights[-1]
sample = []
for _ in range(sample_size):
    # a random point in [0, sigma) falls into exactly one pool's interval
    pool = pools[bisect.bisect(cumulated_weights, random.random()*sigma)]
    sample.append(random.choice(pool))
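The bisect() variant also works when the weights are not integers. A possible way to package it, as a sketch only: the function name weighted_sample and its rng parameter are made up here, and itertools.accumulate() (Python 3.2 or later) stands in for the isum() generator:

```python
import bisect
import itertools
import random

def weighted_sample(pools, weights, sample_size, rng=random):
    """Draw sample_size items; pool i is picked with probability
    weights[i] / sum(weights), then an item is picked uniformly from it."""
    cumulated = list(itertools.accumulate(weights))
    total = cumulated[-1]
    sample = []
    for _ in range(sample_size):
        index = bisect.bisect(cumulated, rng.random() * total)
        # Guard against float rounding pushing the point up to total.
        index = min(index, len(pools) - 1)
        sample.append(rng.choice(pools[index]))
    return sample

list1 = ['a', 'b', 'c']
list2 = ['dog', 'cat', 'panda']
list3 = ['blue', 'red', 'green']

# Fractional weights are fine; they do not even have to sum to 1.
six_items = weighted_sample([list1, list2, list3], [0.6, 0.3, 0.1], 6)
print(six_items)
```

If you are on Python 3.6 or later, random.choices(pools, weights=[60, 30, 10], k=sample_size) does the weighted pool selection for you, leaving only the random.choice() per selected pool.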
(all code untested)
Peter