[Numpy-discussion] Random number generators

A. M. Archibald peridot.faceted at gmail.com
Mon Sep 4 10:34:36 EDT 2006


On 04/09/06, Robert Kern <robert.kern at gmail.com> wrote:
> Charles R Harris wrote:
>
> > What sort of api should this be? It occurs to me that there are already
> > 4 sources of random bytes:
> >
> > Initialization:
> >
> > /dev/random (pseudo random, I think)

/dev/random is (supposed to be) true randomness derived from various
sources. It's also fed through a bunch of hashing and CRC functions
chosen for convenience, but is probably pretty unbiased.

> > /dev/urandom
> > crypto system on windows
> >
> > Pseudo random generators:
> >
> > mtrand
> >
> > I suppose we could add some cryptologically secure source as well. That
> > indicates to me that one set of random number generators would just be
> > streams of random bytes, possibly in 4 byte chunks. If I were doing this
> > for linux these would all look like file systems, FUSE comes to mind.
> > Another set of functions would transform these into the different
> > distributions. So, how much should stay in numpy? What sort of API are
> > folks asking for?

"Cryptographically secure random number generator" means something
different to everybody, and implementing one that everyone is happy
with is going to be a colossal pain. Look at Yarrow and its design
documents for some idea of the complexities involved.

However, a random number generator based on a stream cipher is
probably going to be pretty good, if slow, so implementingone if those
can't hurt. But I think leaving the possibility open that one could
use custom sources of random bytes is a very good idea. There's no
particular need that custom ones should be fast, as they're for use
when you really want *good* random numbers.

> It would be good to also support quasi-random generators like Sobol and Halton
> sequences. They tend to only generate floating point numbers in [0, 1] (or (0,
> 1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that
> only output floating point numbers, but I doubt we want to implement any of
> them. You can look at the 1.6 version of RandomKit for an implementation of
> Sobol sequences and ISAAC, a cryptographic PRNG.
>
>    http://www.jeannot.org/~js/code/index.en.html

Sobol' sequences may require special care in constructing non-uniform
distributions in order to preserve their uniformity properties. If we
can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
in counter mode than some custom cryptosystem that hasn't been
analyzed as thoroughly.

> I'm thinking that the core struct that we're going to pass around will look
> something like this:
>
> struct random_state {
>    void* state;
>    unsigned int (*gen_uint32)(struct random_state*);
>    double (*gen_double)(struct random_state*);
> }
>
> state -- A pointer to the generator-specific struct.
>
> gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly
> NULL if the generator can not implement such a generator.
>
> gen_double -- A function pointer that yields a double-precision number in [0, 1]
> (possibly omitting one or both of the endpoints depending on the implementation).

Are 32-bit numbers really the right least common denominator? There
are plenty of 64-bit platforms out there...

Given this API, implementing a subclassable class that exports it
should satisfy most people's needs for interchangeable generators.

A. M. Archibald




More information about the NumPy-Discussion mailing list