Random Drawing Simulation -- performance issue

Wed Sep 13 09:44:40 EDT 2006

On 13 Sep 2006, at 1:01 AM, python-list-request at python.org wrote:

> Date: 12 Sep 2006 20:17:47 -0700
> From: Paul Rubin <http://phr.cx@NOSPAM.invalid>
> Subject: Re: Random Drawing Simulation -- performance issue
> To: python-list at python.org
>
> "Travis E. Oliphant" <oliphant.travis at ieee.org> writes:
>>> I need to simulate scenarios like the following: "You have a deck of
>>> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
>>> replace it, and repeat N times."
>>>
>> Thinking about the problem as drawing samples from a discrete
>> distribution defined by the population might help.
>
> Is there some important reason you want to do this as a simulation?
> And is the real problem more complicated?  If you draw from the
> distribution 100,000 times with replacement and sum the results, per
> the Central Limit Theorem you'll get something very close to a normal
> distribution whose parameters you can determine analytically.  There
> is probably also some statistics formula to find the precise error.
> So you can replace the 100,000 draws with a single draw.

The real problem is not substantially more complicated. (The real  
code is, because it's embedded in a bunch of other stuff, but that's  
not the point.)
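
For concreteness, here is a minimal sketch of the kind of plain simulation under discussion, using only the standard library. The function name simulate_draws and the dict-of-counts deck representation are illustrative assumptions, not the real code.

```python
import random
from collections import Counter


def simulate_draws(deck_counts, n_draws, rng=None):
    """Draw n_draws cards with replacement and tally the colours seen.

    deck_counts maps a colour name to the number of cards of that colour.
    (Hypothetical helper; the names are made up for illustration.)
    """
    if rng is None:
        rng = random.Random()
    # Expand the counts into an explicit deck, one entry per card.
    deck = [colour for colour, count in deck_counts.items()
            for _ in range(count)]
    # Each draw picks one card uniformly; the deck itself never changes,
    # which is what "replace it" amounts to.
    return Counter(rng.choice(deck) for _ in range(n_draws))


if __name__ == "__main__":
    deck = {"orange": 3, "yellow": 5, "blue": 2}
    print(simulate_draws(deck, 100000))
```

This version does one random choice per draw, so its cost grows linearly with N.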

I guess the essential reason that I want to do it as a simulation,  
and not as a statistics formula, is that I'd like the code to be  
readable (and modifiable) by a programmer who doesn't have a  
statistics background. I could dredge up enough of my college stats  
to do as you suggest (although I might not enjoy it), but I don't  
think I want to make that a requirement.
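
To give a feel for what the formula route might involve, here is a rough sketch for the count of a single colour only. It assumes the count is binomial with mean N*p and variance N*p*(1-p), approximates that with a normal distribution, and rounds the result. The function name approx_colour_count is made up, and getting consistent counts for all colours at once would need more care than this.

```python
import math
import random


def approx_colour_count(n_draws, p, rng=None):
    """Approximate how many of n_draws cards show one colour.

    p is the chance of that colour on a single draw. Uses the normal
    approximation to the binomial, so it is only reasonable when
    n_draws is large. (Hypothetical helper, for illustration only.)
    """
    if rng is None:
        rng = random.Random()
    mean = n_draws * p
    std_dev = math.sqrt(n_draws * p * (1.0 - p))
    # One draw from the normal distribution stands in for n_draws draws.
    draw = rng.gauss(mean, std_dev)
    # Round to a whole number of cards and clamp to the possible range.
    return min(n_draws, max(0, round(draw)))


if __name__ == "__main__":
    # 3 orange cards out of 10, so p = 0.3.
    print(approx_colour_count(100000, 0.3))
```

It is short, but a reader has to know where the mean and standard deviation come from, which is the readability concern above.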

On the other hand (quote somewhat snipped):

> Date: Tue, 12 Sep 2006 22:46:04 -0500
> From: Robert Kern <robert.kern at gmail.com>
> Subject: Re: Random Drawing Simulation -- performance issue
> To: python-list at python.org
>
> Along the lines of what you're trying to get at, the problem that the OP is
> describing is one of sampling from a multinomial distribution.
>
> numpy has a function that will do the sampling for you:
>
> In [4]: numpy.random.multinomial?
> Docstring:
>      Multinomial distribution.
>
>      multinomial(n, pvals, size=None) -> random values
>
>      pvals is a sequence of probabilities that should sum to 1 (however, the
>      last element is always assumed to account for the remaining probability
>      as long as sum(pvals[:-1]) <= 1).

Here, I'm torn. I do want the code to be accessible to non-stats  
people, but this just might do the trick. Must ponder.
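
As a sketch of how that might look for the card example: the wrapper name draw_counts and the dict-of-counts input are assumptions for illustration, and the code uses numpy's newer Generator interface (numpy.random.default_rng) rather than the module-level function quoted above.

```python
import numpy as np


def draw_counts(deck_counts, n_draws, seed=None):
    """Return how many times each colour comes up in n_draws draws.

    deck_counts maps a colour name to the number of cards of that colour.
    (Hypothetical wrapper; only the multinomial call is numpy's.)
    """
    colours = list(deck_counts)
    total = sum(deck_counts.values())
    # Probability of each colour on a single draw, e.g. 3/10 for orange.
    pvals = [deck_counts[colour] / total for colour in colours]
    rng = np.random.default_rng(seed)
    # A single call yields the tally for all n_draws draws at once.
    counts = rng.multinomial(n_draws, pvals)
    return dict(zip(colours, (int(k) for k in counts)))


if __name__ == "__main__":
    deck = {"orange": 3, "yellow": 5, "blue": 2}
    print(draw_counts(deck, 100000))
```

The only statistics a reader needs here is that pvals holds each colour's share of the deck; the rest reads like an ordinary tally.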

Thanks, everyone, for your helpful suggestions!

B.

-- 
Brendon Towle, PhD
Cognitive Scientist
+1-412-690-2442x127
Carnegie Learning, Inc.
The Cognitive Tutor Company ®
Helping over 375,000 students in 1000 school districts succeed in math.