[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Thu Sep 10 05:35:55 CEST 2015


[Nathaniel Smith <njs at vorpus.org>]
> Yeah, the independent-seed-for-each-thread approach works for any RNG, but
> just like people feel better if they have a 100% certified guarantee that
> the RNG output in a single thread will pass through every combination of
> possible values (if you wait some cosmological time), they also feel better
> if there is some 100% certified guarantee that the RNG values in two threads
> will also be uncorrelated with each other.
>
> With something like MT, if two threads did end up with nearby seeds, then
> that would be bad: each thread individually would see values that looked
> like high quality randomness, but if you compared across the two threads,
> they would be identical modulo some lag. So all the nice theoretical
> analysis of the single threaded stream falls apart.
>
> However, for two independently seeded threads to end up anywhere near each
> other in the MT state space requires that you have picked two numbers
> between 0 and 2**19937 and gotten values that were "close". Assuming your
> seeding procedure is functional at all, then this is not a thing that will
> ever actually happen in this universe.

I think it's worse than that.  MT is based on a linear recurrence.
Take two streams "far apart" in MT space, and their sum also satisfies
the recurrence.  So a possible worry about a third stream isn't _just_
about correlation or overlap with the first two streams, but,
depending on the app, also about correlation/overlap with the sum of
the first two streams.  Move to N streams, and there are O(N**2)
direct sums to worry about, and then sums of sums, and ...

Still won't cause a problem in _my_ statistical life expectancy, but I
only have 4 cores ;-)


> So AFAICT the rise of explicitly multi-threaded RNG designs is one of
> those fake problems that exists only so people can write papers about
> solving it. (Maybe this is uncharitable.)

Uncharitable, but fair :-)


> So there exist RNG designs that handle multi-threading explicitly, and it
> shows up on feature comparison checklists. I don't think it should really
> affect Python's decisions at all though.

There are some clean and easy approaches to this based on
crypto-inspired schemes, but giving up crypto strength for speed.  If
you haven't read it, this paper is delightful:

    http://www.thesalmons.org/john/random123/papers/random123sc11.pdf


More information about the Python-ideas mailing list