Monte Carlo Method and pi

Tim Hochberg tim.hochberg at ieee.org
Mon Jul 12 16:19:52 EDT 2004


beliavsky at aol.com wrote:
> Dan Christensen <jdc at uwo.ca> wrote in message news:<87llhqbbh5.fsf at uwo.ca>...
> 
>>I got to thinking about this problem.  The most direct implementation is
>>something like:
>>
>>from random import random
>>from math import sqrt
>>
>>def v1(n = 500000):
>>    rand = random
>>    sqr  = sqrt
>>    sm = 0
>>    for i in xrange(n):
>>        sm += sqr(1-rand()**2)
>>    return 4*sm/n
>>
>>This takes about 0.68s on a 2.66GHz P4 (using Python 2.4a) and
>>about 0.29s with psyco (using Python 2.3.4).

I can more than double the speed of this under psyco by replacing **2 
with x*x. I have inside information that pow is extra slow on psyco(*); 
plus, as I understand it, x*x is preferred over **2 anyway.

import psyco
from math import sqrt
from random import random

def v3(n = 500000):
    sm = 0
    for i in range(n):
        x = random()
        sm += sqrt(1 - x*x)   # x*x instead of x**2
    return 4*sm/n
psyco.bind(v3)


(*) Psyco's support for floating point operations is considerably slower 
than its support for integer operations. The latter are optimized all 
the way to assembly when possible, but the former are, at best, 
optimized into C function calls that perform the operation, so there is 
always some function call overhead. Fixing this would require someone 
who groks both the guts of Psyco and the intricacies of x86 floating 
point machine code to step forward. In the case of pow, I believe there 
is always a callback into the Python interpreter, so it's extra slow.
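To see the effect concretely, a rough sketch along these lines (assuming 
psyco is installed; Python 2.x only) compares a multiply-based loop 
against a pow-based one:

import time
import psyco
from random import random

def sum_mul(n = 500000):
    sm = 0.0
    for i in xrange(n):
        x = random()
        sm += x*x        # plain multiplication, compiled by psyco
    return sm

def sum_pow(n = 500000):
    sm = 0.0
    for i in xrange(n):
        x = random()
        sm += x**2       # pow path, which reportedly falls back to the interpreter
    return sm

psyco.bind(sum_mul)
psyco.bind(sum_pow)

for f in (sum_mul, sum_pow):
    t0 = time.clock()
    f()
    print f.__name__, time.clock() - t0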

>>
>>One of the great things about python is that it's possible (and fun)
>>to move loops into the C level, e.g. with this implementation:
>>
>>def v2(n = 500000):
>>    a = numarray.random_array.uniform(0, 1, n)
>>    return 4 * numarray.sum(numarray.sqrt(1 - a**2)) / n
>>
>>which clocks in at 0.16s with Python 2.4a.  (psyco doesn't help.)

The same trick (replacing a**2 with a*a) also doubles the speed of this 
version.

import numarray
import numarray.random_array

def v4(n = 500000):
    a = numarray.random_array.uniform(0, 1, n)
    return 4 * numarray.sum(numarray.sqrt(1 - a*a)) / n
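
Both the loop and the vectorized versions are estimating the 
quarter-circle area, the integral of sqrt(1 - x*x) from 0 to 1, which 
equals pi/4, so 4 times the sample mean should approach pi with a 
statistical error on the order of 1/sqrt(n). A quick sanity check 
(using v4 above; the other versions behave the same way) might look like:

from math import pi

# The error should shrink roughly like 1/sqrt(n) as n grows.
for n in (5000, 500000, 5000000):
    est = v4(n)
    print n, est, abs(est - pi)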

> 
> For comparison, on a 2.8GHz P4 the Fortran code to compute pi using
> simulation runs in about 0.09s (measured using timethis.exe on Windows
> XP), REGARDLESS of whether the calculation is done with a loop or an
> array. If 5e6 instead of 5e5 random numbers are used, the time taken
> is about 0.4s, and the scaling with #iterations is about linear beyond
> that. Here are the codes, which were compiled with the -optimize:4
> option using Compaq Visual Fortran 6.6c.

For yet more comparison, here are times on a 2.4 GHz P4, timed with 
timeit (10 loops):

v1 0.36123797857
v2 0.289118589094
v3 0.158556464579
v4 0.148470238536
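
(The exact invocation isn't shown above; a harness along these lines, 
with the hypothetical module name mc_pi standing in for wherever v1-v4 
are defined, would produce such per-function totals for 10 loops each.)

import timeit

# mc_pi is a placeholder module containing v1, v2, v3 and v4 as defined above.
for name in ("v1", "v2", "v3", "v4"):
    timer = timeit.Timer("%s()" % name, "from mc_pi import %s" % name)
    print name, timer.timeit(number=10)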

If one takes into account the speed difference of the two CPUs, this 
puts both the numarray and psyco solutions within about 50% of the 
Fortran solution, which I'm impressed by. Of course, the Fortran code 
uses x**2 instead of x*x, so it, too, might be sped up by a similar tweak.

-tim



