Random Drawing Simulation -- performance issue

Tue Sep 12 22:43:53 EDT 2006

Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of  
> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,  
> replace it, and repeat N times."
> 

Thinking about the problem as drawing sample froms a discrete 
distribution defined by the population might help.

For example, in SciPy you can define your own discrete random variable:

var = scipy.stats.rv_discrete(name='sample', 
values=([0,1,2],[3/10.,5/10.,2/10.]))

Then

var.rvs(size=10000)

would quickly return an array of "draws"

If you didn't want to install SciPy, but were willing to install NumPy, 
then the crux of the algorithm is to generate an entire array of uniform 
random variables:  numpy.random.rand(count) and then map them through 
the inverse cumulative distribution function to generate your samples. 
  The inverse cumulative distribution function is just a series of steps 
whose width depends on the probablity distribution.

Thus, the population defines the draw boundaries.  To make it easy, just 
multiply the uniform random number by the total number of cards and then 
the boundaries are on the integers of the number of each kind of card.

Here is an implementation.  I find this version to be 2x - 5x faster 
depending on how many draws are used.

import numpy as N

def randomDrawing_N(count, population):
     probs = [x[0] for x in population]
     res = [[0, item[1]] for item in population]
     nums = N.random.rand(count)
     cprobs = N.cumsum([0]+probs)
     nums *= cprobs[-1]
     for k, val in enumerate(probs):
         res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
     return res

-Travis Oliphant