[SciPy-user] random number sampling performance in newcore

Tue Nov 8 19:11:01 EST 2005

Chris Fonnesbeck wrote:
> I was surprised to find that the random number sampler in newcore is
> significantly slower than RandomArray in Numeric (at least for
> binomial sampling). A quick comparison showed the
> scipy.basic.random.binomial sampler to be over 8 times slower than
> RandomArray.binomial. I was surprised, since the newcore stuff is
> Pyrex-based (isnt it?).

It's a Pyrex wrapper to pure C code. See distributions.c for the actual
implementations of the distribution algorithms.

> Am I the only one observing such differences,
> or am I using the wrong method? If not, is the performance expected to
> improve significantly.

Can you show us the code you're using? Particularly, the parameters p
and n. I use two different algorithms for the binomial distribution, a
waiting time algorithm when p*n <= 30 (or (1-p)*n <= 30 if p > 0.5) and
the BTPE algorithm (the same algorithm that's in RANLIB, but
reimplemented) when p*n > 30 (or (1-p)*n ...).

In [10]: tra = Timer('x=RA.binomial(n,p,size)', 'import RandomArray as
RA; n=200; p=0.25; size=10000')

In [11]: tra.repeat(3,100)
Out[11]: [1.5208730697631836, 1.5047860145568848, 1.5198800563812256]

In [12]: tra = Timer('x=RA.binomial(n,p,size)', 'import RandomArray as
RA; n=20; p=0.25; size=10000')

In [13]: tra.repeat(3,100)
Out[13]: [0.89270305633544922, 0.90227413177490234, 1.183967113494873]

In [14]: tmt = Timer('x=random.binomial(n,p,size)', 'from scipy import
random; n=200; p=0.25; size=10000')

In [15]: tmt.repeat(3,100)
Out[15]: [1.922713041305542, 2.0582780838012695, 1.906635046005249]

In [16]: tmt = Timer('x=random.binomial(n,p,size)', 'from scipy import
random; n=20; p=0.25; size=10000')

In [17]: tmt.repeat(3,100)
Out[17]: [3.1456100940704346, 2.9999620914459229, 2.9954609870910645]

It looks like my implementation of BTPE is 30% slower than RANLIB and
the waiting time algorithm is 3 times slower than RANLIB's BTPE on my
machine.

Compare to normal():

In [21]: tra = Timer('x=RA.normal(0.0, 1.0, size)', 'import RandomArray
as RA; size=10000')
In [22]: tra.repeat(3,100)
Out[22]: [0.70471906661987305, 0.76926708221435547, 0.7034919261932373]

In [23]: tmt = Timer('x=random.normal(size=size)', 'from scipy import
random; size=10000')
In [24]: tmt.repeat(3,100)
Out[24]: [0.56431293487548828, 0.52272319793701172, 0.79108381271362305]

So the evidence suggests that the slowness is constrained to individual
distributions, one of them being binomial. I'll futz with it.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter