[Numpy-discussion] Random number generators

Mon Sep 4 04:31:02 EDT 2006

Charles R Harris wrote:

> What sort of api should this be? It occurs to me that there are already 
> 4 sources of random bytes:
> 
> Initialization:
> 
> /dev/random (pseudo random, I think)
> /dev/urandom
> crypto system on windows
> 
> Pseudo random generators:
> 
> mtrand
> 
> I suppose we could add some cryptologically secure source as well. That 
> indicates to me that one set of random number generators would just be 
> streams of random bytes, possibly in 4 byte chunks. If I were doing this 
> for linux these would all look like file systems, FUSE comes to mind. 
> Another set of functions would transform these into the different 
> distributions. So, how much should stay in numpy? What sort of API are 
> folks asking for?

It would be good to also support quasi-random generators like Sobol and Halton 
sequences. They tend to only generate floating point numbers in [0, 1] (or (0, 
1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that 
only output floating point numbers, but I doubt we want to implement any of 
them. You can look at the 1.6 version of RandomKit for an implementation of 
Sobol sequences and ISAAC, a cryptographic PRNG.

   http://www.jeannot.org/~js/code/index.en.html

I'm thinking that the core struct that we're going to pass around will look 
something like this:

struct random_state {
   void* state;
   unsigned int (*gen_uint32)(struct random_state*);
   double (*gen_double)(struct random_state*);
}

state -- A pointer to the generator-specific struct.

gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly 
NULL if the generator can not implement such a generator.

gen_double -- A function pointer that yields a double-precision number in [0, 1] 
(possibly omitting one or both of the endpoints depending on the implementation).

Possibly this struct needs to be expanded to include function pointers for 
destroying the state struct and for a copying the state to a new object. The 
seeding function(s) probably don't need to travel in the struct. We should 
determine if C++ will work for us, here. Unfortunately, that will force all 
extension modules that want to use this API to be C++ modules.

One other feature of the current implementation of state struct is that a few of 
the distributions cache information between calls. Notably, the Box-Müller 
Gaussian distribution algorithm generates two normal deviates at a time; one is 
returned, the other is cached until the next call. The binomial distribution 
computes some intermediate values that are reused if the next call has the same 
parameters. We should probably store such things separately from the core state 
struct this time.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco