[SciPy-user] Making faster statistical distributions

Thu Jan 29 15:53:10 EST 2004

On Jan 29, 2004, at 2:17 PM, Travis Oliphant wrote:

> Christopher Fonnesbeck wrote:
>
>> I am already using pieces of SciPy in my Markov chain Monte Carlo 
>> package (PyMC), mostly for plotting functionality. I would also like 
>> to exploit the distributions implemented in scipy.stats, but they are 
>> far too slow for use in statistical simulation applications like 
>> MCMC, where millions of random draws may be taken. Therefore, I am 
>> thinking of implementing many of these distributions (at least the 
>> common ones) as C or Fortran extensions. I am unsure whether to use 
>> Fortran through f2py for this task, or C through weave.inline (for 
>> example). I have used both in the past for various tasks, and was 
>> generally happy with both. Any suggestions?
>
>
> Could you specify which ones are too slow?  This is a rather broad
> statement, as many are implemented in C and are very fast.  Some
> distributions, however, do default to using a numerical solver to
> invert the CDF and apply the result to uniform random variates.  You
> can improve the speed of these distributions by overriding the _ppf
> method or the _rvs method of the object with a faster, more
> specialized method.  I would use weave or Fortran with f2py to do
> this.
>
> Best,
>
> -Travis O.
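As a rough illustration of the _ppf/_rvs overrides described above, here is a minimal sketch written against the current scipy.stats.rv_continuous subclassing interface, which may differ from the 2004 API. The class FastExpon and the instance fast_expon are hypothetical names, and the exponential distribution is used only because its inverse CDF has a closed form. Without the _ppf override, rv_continuous falls back to numerically inverting _cdf, and its default _rvs applies _ppf to uniform draws.

```python
import numpy as np
from scipy import stats


class FastExpon(stats.rv_continuous):
    """Hypothetical standard exponential with closed-form private hooks."""

    def _pdf(self, x):
        # Density exp(-x) on the support [0, inf).
        return np.exp(-x)

    def _cdf(self, x):
        # 1 - exp(-x), computed with expm1 for accuracy near zero.
        return -np.expm1(-x)

    def _ppf(self, q):
        # Closed-form inverse CDF: replaces the generic root-finding fallback.
        return -np.log1p(-q)

    def _rvs(self, size=None, random_state=None):
        # Draw directly from NumPy's exponential generator instead of
        # pushing uniform variates through _ppf.
        return random_state.standard_exponential(size)


# Support starts at 0; the public pdf/cdf/ppf/rvs methods wrap the hooks above.
fast_expon = FastExpon(a=0.0, name="fast_expon")

if __name__ == "__main__":
    print(fast_expon.ppf(0.5), fast_expon.rvs(size=5, random_state=0))
```

The public methods (pdf, cdf, ppf, rvs) still perform argument checking and loc/scale handling, so overriding the hooks speeds up the numerical core but not that per-call bookkeeping.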

Well, off the top of my head, the binomial and normal distributions for
sure. Using the scipy distributions slows my MCMC code down
significantly (the profiling module identified them as the
bottleneck). Switching to Fortran via f2py sped things up a lot. I'm
not necessarily talking about the generation of random deviates, but
rather about the PDFs, which are used for calculating likelihoods.
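For the likelihood side, one option short of writing Fortran is to evaluate the log-densities for a whole data vector in a single vectorized NumPy expression, so each likelihood evaluation is one pass over the data. This is a minimal sketch, not PyMC's actual code: the function names normal_loglike and binomial_loglike are hypothetical, and the binomial version assumes 0 < p < 1. It uses scipy.special.gammaln for the log binomial coefficient and checks the results against scipy.stats.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def normal_loglike(x, mu, sigma):
    """Hypothetical helper: sum of normal log-densities of x."""
    x = np.asarray(x, dtype=float)
    z = (x - mu) / sigma
    # log N(x | mu, sigma) = -z^2/2 - log(sigma) - log(2*pi)/2, summed over x.
    return float(np.sum(-0.5 * z * z - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)))


def binomial_loglike(k, n, p):
    """Hypothetical helper: sum of binomial log-PMFs; assumes 0 < p < 1."""
    k = np.asarray(k, dtype=float)
    n = np.asarray(n, dtype=float)
    # log C(n, k) via log-gamma avoids overflow for large n.
    log_choose = gammaln(n + 1.0) - gammaln(k + 1.0) - gammaln(n - k + 1.0)
    return float(np.sum(log_choose + k * np.log(p) + (n - k) * np.log1p(-p)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(2.0, 1.5, size=1000)
    counts = rng.binomial(20, 0.3, size=1000)
    # Sanity check: each pair should agree to floating-point precision.
    print(normal_loglike(data, 2.0, 1.5), stats.norm.logpdf(data, 2.0, 1.5).sum())
    print(binomial_loglike(counts, 20, 0.3), stats.binom.logpmf(counts, 20, 0.3).sum())
```

Working on the log scale keeps products of many small densities from underflowing, which matters when a single MCMC step multiplies likelihood terms over many observations.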

C.

--
Christopher J. Fonnesbeck ( c h r i s @ f o n n e s b e c k . o r g )
Georgia Cooperative Fish & Wildlife Research Unit, University of Georgia