[SciPy-user] Making faster statistical distributions

John Hunter jdhunter at ace.bsd.uchicago.edu
Thu Jan 29 10:54:44 EST 2004


>>>>> "Christopher" == Christopher Fonnesbeck <chris at fonnesbeck.org> writes:

    Christopher> I am already using pieces of SciPy in my Markov chain
    Christopher> Monte Carlo package (PyMC), mostly for plotting
    Christopher> functionality. I would also like to exploit the
    Christopher> distributions implemented in scipy.stats, but they
    Christopher> are far too slow for use in statistical simulation
    Christopher> applications like MCMC, where millions of random
    Christopher> draws may be taken. Therefore, I am thinking of
    Christopher> implementing many of these distributions (at least
    Christopher> the common ones) as C or Fortran extensions. I am
    Christopher> unsure whether to use Fortran through f2py for this
    Christopher> task, or C through weave.inline (for example). I have
    Christopher> used both in the past for various tasks, and was
    Christopher> generally happy with both. Any suggestions?

I've wrote some code to fill Numeric arrays with levy distributions
from gsl.  GSL has many, many distributions -
http://www.gnu.org/software/gsl/manual/gsl-ref_19.html#SEC284

Here's a sample test script

    from levy import randlevy, seed
    gamma = 0.003501
    alpha = 1.341702

    seed(1235)
    x =  randlevy(100000, alpha, gamma)
    print min(x), max(x)

Seeding is optional - the rng is seeded by the clock if you don't
explicitly call seed.  Below is the module code which you can use as a
template if you're interested.  I think it would be nice to extend
this code to include all the gsl distributions, but haven't taken the
time.  pygsl wraps these distributions, but suffers from the repeated
call overhead that is hurting you.  It would be a natural extension of
pygsl to include initializing numeric arrays from the distributions.

#include "Python.h"
#include "Numeric/arrayobject.h"
#include <stdio.h>
#include <time.h>
#include <gsl/gsl_rng.h>
#include <gsl/gsl_randist.h>
#include <gsl/gsl_errno.h>



static PyObject *ErrorObject;

// get x[i] from a 1d array of type xtype
#define get1d(x,i,xtype) \
*(xtype *)(x->data+i*x->strides[0])

int _levymodRNGSeeded = 0;
gsl_rng * _levymodRNG;

static PyObject *
seed(PyObject *self, PyObject *args)
{
  const gsl_rng_type * T;
  int N;
  T = gsl_rng_default;
  _levymodRNG = gsl_rng_alloc(T);
  gsl_rng_env_setup();

  if (args==Py_None) {
    //printf("Default seed\n");
    time_t t1;
    (void) time(&t1);
    srand48((long) t1);
    N = (int)t1;
  }
  else if (!PyArg_ParseTuple(args, "i", &N)) 
    return NULL;

  //printf("Setting seed %d\n", N);
  gsl_rng_set (_levymodRNG, N);
  _levymodRNGSeeded = 1;

  Py_INCREF(Py_None);
  return Py_None;

}



static PyObject *
randlevy(PyObject *self, PyObject *args)
{

  if (_levymodRNGSeeded == 0)  {
    seed(self, Py_None);    
  }
    
  int dimensions[1];
  int i;
  PyArrayObject *x;
  int N;
  float alpha;
  float gamma;

  if (!PyArg_ParseTuple(args, "iff", &N, &alpha, &gamma)) 
    return NULL;
    
  dimensions[0] = N;
  
  x = (PyArrayObject *)PyArray_FromDims(1,dimensions,PyArray_FLOAT);
  
  for (i=0; i<N; ++i) {
    get1d(x,i,float) = gsl_ran_levy(_levymodRNG, gamma, alpha);
  }
    
  return Py_BuildValue("N", x);
  
}



static struct PyMethodDef _levy_methods[] = {
  {"randlevy", (PyCFunction)randlevy, METH_VARARGS, 
   "Return a numeric array of levy numbers."},
  {"seed", (PyCFunction)seed, METH_VARARGS, 
   "Seed the random number generator."},
  {NULL,		NULL}		/* sentinel */
};



DL_EXPORT(void) init_levy(void)
{
  PyObject *m, *d;
  
  /* Create the module and add the functions */
  m = Py_InitModule4("_levy", _levy_methods,
		     "Fill Numeric arrays with random numbers ",
		     (PyObject*)NULL,PYTHON_API_VERSION);
  
  /* Import the array object */
  import_array();
  
  /* Add some symbolic constants to the module */
  d = PyModule_GetDict(m);
  ErrorObject = PyString_FromString("_levy.error");
  PyDict_SetItemString(d, "error", ErrorObject);
  
  /* Check for errors */
  if (PyErr_Occurred())
    Py_FatalError("can't initialize module _levy");
}



More information about the SciPy-User mailing list