Monte Carlo probability calculation in Python

Thu Feb 5 15:56:14 EST 2015

On Thu, 05 Feb 2015 11:25:42 -0800, Paul  Moore wrote:

> On Thursday, 5 February 2015 16:57:07 UTC, Rob Gaddi  wrote:
>> You don't need the whole scipy stack, numpy will let you do everything
>> you want.  The trick to working in numpy is to parallelize your
>> problem;
>> you don't do a thing a thousand times; you do it on a thousand-length
>> array.  For example:
>> 
>> def dice(throws, per, sides=6):
>>     """Return an array throws long of rolls of (per)d(sides)."""
>>     all_dice = np.random.randint(1, sides+1, size=(throws, per))
>>     return all_dice.sum(axis=1)
> 
> Thanks, that's a help. I see the principle, but a couple of questions.
> With bigger problems (deal 52 cards into bridge hands a million times,
> for example) would memory become an issue? Also, how do you handle
> things that don't fit into the built-in numpy operations? (For example,
> Monopoly - roll 2 dice and take the sum, unless you roll a double, in
> which case reroll, but if you roll 3 doubles you fail - return NaN in
> that case).
> 
> As examples of a few more problems I'd use this for:
> 
> 1. Roll 4 dice, and sum the largest 3.
> 2. Roll 9 dice, count how many different numbers you get.
> 3. Deal 4 hands of 13 cards from a full deck, determine the length of
> the longest suit anyone has.
> 4. Monopoly rolls (see above, 2 dice but reroll doubles up to 3 times)
> 
> All of these are reasonably easy to write as Python functions, but it
> becomes progressively harder (for a non-numpy expert) to know how to
> convert them into parallelizable numpy primitives.
> 
> I'll have a play with the approach you suggest though, and see how it
> handles some of the more complex problems my friend was showing off with
> :-) (his worst example was a horrendous expression to score Cribbage
> hands, that if I read his note correctly took about a minute to simulate
> 10,000,000 deals)
> 
> Paul

A lot of your examples can be dealt by the numpy fancy indexing tricks, 
which is really the heart of numpy.  A comparison on an array of numbers 
gives an array of bools, which can then be used to index the original 
array.  The Monopoly example, for instance.

  all_dice = np.random.randint(1, 7, size=(throws, 2))
  doubles = all_dice[all_dice[:,0] == all_dice[:,1]]

doubles is now a writeable view into the original all_dice array, so you 
could reroll them by saying something like.

  doubles[:,:] = np.random.randint(1, 7, size=doubles.shape)

Memory will eventually become a concern, but in that case you could 
always run it in parts and aggregate the parts.  I wouldn't sweat it 
until you get into 100s of millions of tries.  Even then, you could make 
some improvements by forcing the dtypes to be uint8, though there may be 
a performance penalty.

Numpy is fantastically powerful once you can wrap your head around it; I 
use it for a ton of things I used to use Matlab for.  The tutorial at  
http://wiki.scipy.org/Tentative_NumPy_Tutorial will go into lots of depth 
for you.

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.