Smarter way of doing this?

Mon Feb 2 17:50:41 EST 2004

* Max M <maxm at mxm.dk> [2004-02-02 05:15]:
> I have written this function, "choose_by_probability()"
> 
> It takes a list of probabilities in the range 0.00-1.00, and it must 
> then randomly select one of the probabilities and return it's index.
> 
> The idea is to have two lists, one with a value, and another with a 
> probability. The value then gets randomly selected from its probability.
> 
> values = ['item 1','item 2','item 3',]
> probabilities = [0.66, 0.33, 0.16]
> 
> print values[choose_by_probability(probabilities)]
> >>'item 1'
> 
> etc...
> 
> I just wondered if anybody has a better way of doing it, as this seems 
> nastily squared to me?

Well, since you're worried about the "nastily squared" aspect, and since
most (of my) applications for this only need to do the setup once, and
choose randomly multiple times, my solution uses a class to do this.
This takes into account someone else's suggestion of associating the
data and the probabilities directly, rather than worrying about indexes.

Here's my go at it:

----------chooser.py-----------
import random

class Item:
    def __init__(self,value,weight):
        self.value = value
        self.weight = weight
        self.probability = 0.0

class ProbabilityChooser:
    def __init__(self,items=None):
        """optional 'items' parameter is a sequence of 2-tuples.
           The first sub-item is the value, the second is the weight."""
        self.total_weight = 0.0
        self.items = []
        if items is not None:
            for (value,weight) in items:
                self.add_item(value,weight,recompute=False)
            self._compute_cumulative_probabilities()

    def _compute_cumulative_probabilities(self):
        """ use weights to pre-compute probabilities for all items """
        cumul = 0.0
        for item in self.items:
            cumul += item.weight / self.total_weight
            item.probability = cumul

    def add_item(self,value,weight=1,recompute=True):
        self.items.append(Item(value,weight))
        self.total_weight += weight
        if recompute:
            self._compute_cumulative_probabilities()

    def probabilities(self):
        return [(i.value,i.probability) for i in self.items]

    def choose(self):
        rand = random.random()
        for item in self.items:
            if rand <= item.probability:
                return item.value
----------/chooser.py-----------

And here's how I'd use it:

>>> import chooser
>>> values = ['item 1','item 2','item 3',]
>>> probabilities = [0.66, 0.33, 0.16]
>>> c = chooser.ProbabilityChooser(zip(values,probabilities))
>>> c.probabilities()
[('item 1', 0.57391304347826089), ('item 2', 0.86086956521739133),
('item 3', 1.0)]
>>> c.choose()
'item 1'
>>> c.choose()
'item 1'
>>> c.choose()
'item 2'
>>> c.choose()
'item 2'
>>> c.choose()
'item 1'
>>> c.choose()
'item 1'
>>> c.choose()
'item 3'
>>> c.choose()
'item 1'
>>> c.add_item('item 4',1.0)
>>> c.choose()
'item 4'
>>> c.choose()
'item 1'
>>> [c.choose() for i in range(30)]
['item 4', 'item 2', 'item 4', 'item 4', 'item 2', 'item 1', 'item 1',
'item 3', 'item 1', 'item 1', 'item 3', 'item 1', 'item 1', 'item 4',
'item 1', 'item 4', 'item 4', 'item 2', 'item 1', 'item 4', 'item 1',
'item 1', 'item 4', 'item 4', 'item 4', 'item 2', 'item 4', 'item 4',
'item 4', 'item 1']
>>> 

HTH-

John
-- 
John Hazen
<my first name>@<my last name>.net