[Numpy-discussion] random number generator, entropy and pickling

Gael Varoquaux gael.varoquaux at normalesup.org
Mon Apr 25 12:57:00 EDT 2011


Hi there,

We are currently having a discussion on the scikits.learn mailing list
about which patterns to adopt for random number generation. One thing
that is absolutely clear is that making the stream of random numbers
reproducible is critical. We have several objects that can serve as random
variate generators. So far, we instantiate these objects with an optional
seed or PRNG argument, as in:

    def __init__(self, prng=None):
        if prng is None:
            prng = np.random
        self.prng = prng

The problem with this pattern is that np.random is a module, and modules
don't pickle, so the objects do not pickle by default. A bit of pickling
magic would solve this, but we'd rather avoid it.
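To make the distinction concrete, here is a small sketch using only NumPy and the standard pickle module (the variable names are illustrative). It shows that the np.random module itself cannot be pickled, whereas a RandomState instance can, and that the unpickled copy continues exactly the same stream:

```python
import pickle

import numpy as np

# A module object such as np.random cannot be pickled.
try:
    pickle.dumps(np.random)
    module_pickles = True
except TypeError:
    module_pickles = False

# A RandomState instance can be pickled; the copy carries the full
# generator state, so it continues the exact same stream.
prng = np.random.RandomState(42)
prng.random_sample(3)  # advance the stream a little
clone = pickle.loads(pickle.dumps(prng))
same_stream = np.array_equal(prng.random_sample(5), clone.random_sample(5))

print(module_pickles, same_stream)  # False True
```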

We thought that we could simply have a PRNG per object, as in:

    def __init__(self, prng=None):
        if prng is None:
            prng = np.random.RandomState()
        self.prng = prng

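I don't like this option, because it means that, for a given piece of
code, setting the seed of numpy's global PRNG isn't enough to make it
reproducible: a RandomState() created without a seed is initialized from
the operating system's entropy source (or the clock), independently of
the global PRNG.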
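A minimal sketch of the problem, using a hypothetical Estimator class that follows the per-object pattern: re-seeding the global PRNG before each run does not make the per-object stream repeat, because each default RandomState() is seeded independently of the global generator.

```python
import numpy as np


class Estimator(object):
    """Hypothetical object using the per-object PRNG pattern."""

    def __init__(self, prng=None):
        if prng is None:
            # Seeded from OS entropy, independently of the global PRNG.
            prng = np.random.RandomState()
        self.prng = prng


def run():
    np.random.seed(0)  # seeding the global PRNG...
    return Estimator().prng.random_sample(5)


first, second = run(), run()
# ...does not make the results repeat: the two runs differ
# (equality would require a vanishingly unlikely collision).
print(np.array_equal(first, second))
```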

I couldn't find a way to get a handle on a picklable instance of the
global PRNG.

The only option I can see would be to use the global numpy PRNG to seed
an instance-specific RandomState, as in:

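    def __init__(self, prng=None):
        if prng is None:
            # RandomState expects an integer seed, so draw an integer
            # (not a float) from the global PRNG.
            prng = np.random.RandomState(
                np.random.randint(np.iinfo(np.int32).max))
        self.prng = prng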

That way, seeding the global PRNG really does control the full random
number generation. I am wondering whether this would have adverse
consequences on the entropy of the stream of random numbers. Does anybody
have suggestions or advice?
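A sketch of the proposed pattern, again with a hypothetical Estimator class. The seed is drawn with np.random.randint because RandomState expects an integer seed, while np.random.random() returns a float in [0, 1). With that, re-seeding the global PRNG reproduces the per-object stream, and the per-object RandomState can be pickled:

```python
import pickle

import numpy as np


class Estimator(object):
    """Hypothetical object using the 'seed from the global PRNG' pattern."""

    def __init__(self, prng=None):
        if prng is None:
            # Draw an integer seed from the global PRNG.
            seed = np.random.randint(np.iinfo(np.int32).max)
            prng = np.random.RandomState(seed)
        self.prng = prng


def run():
    np.random.seed(0)
    est = Estimator()
    # The per-object generator is a RandomState instance, so it pickles.
    prng = pickle.loads(pickle.dumps(est.prng))
    return prng.random_sample(5)


# Re-seeding the global PRNG reproduces the whole stream.
reproducible = np.array_equal(run(), run())
print(reproducible)  # True
```

Note that each such construction consumes one draw from the global stream, so the order in which objects are created also affects the results.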

Cheers,

Gael


