Probability selection algorithm

Chad Netzer cnetzer at mail.arc.nasa.gov
Fri Feb 7 18:42:09 EST 2003


On Fri, 2003-02-07 at 00:03, Noah wrote:

> I wrote the following demo based on your code.
> This seems to do everything I need. Thanks!
> Let me know if you see any mistakes.

Noah, random_select() does indeed appear to draw an index based on a
probability list.  However, make_random_probabilities_table() creates a
bizzare probability distribution that is not uniform.  Also, because it
uses sort, it is slower than just creating a true cumulative probability
distribution list from a probability density list (which random_select()
should still operate correctly on).

In your code, by calling random.uniform() in the loop with a different
'total' each time, you a severely skewing the results, so that they are
no longer from a uniform distribution, even before sorting.  Also, the
resulting probability table is not guaranteed to summ to exactly 1.0.

Finally, although random_select seems to work, if problist was a true
cumulative density function list, the algorithm could be made shorter
(no need to keep track of 'remaining') and clearer.  For example, it is
not even clear to me by quick perusal that random_select() is guaranteed
to terminate by returning a tuple; what if it goes through the whole
list without the if statement being true.  This may be possible if the
problist doesn't add up exactly to 1.0 (lets say it is 0.9999), and
random() returns a number than is extremely close to 1.0 (say
0.99999999).  I just tried it, actually, and that makes it fail.

So, my question to you is this:

Do you mind using Python Numeric?  It wil be a LOT faster than operating
on python lists, and will also be much shorter to write.  If not, do you
mind using the Python operator module and functional programming
techniques (mainly reduce and map)?

Also, what kind of probability table do you want to use (uniform,
gaussian, beta, gamma, etc.)?

-- 
Bay Area Python Interest Group - http://www.baypiggies.net/

Chad Netzer
(any opinion expressed is my own and not NASA's or my employer's)







More information about the Python-list mailing list