SHA-based subclass for random module

Mon Mar 22 23:14:47 EST 2004

David MacQuigg <dmq at gain.com> writes:
> I'm still curious as to the answer to my very simple and basic
> question:  Are there any examples that show a *non-subjective*
> difference in a physically meaningful result between one PRNG and
> another?  By "physically meaningful" I mean to include things like
> means, standard deviations, correlations between two distributions,
> but rule out some measure that would only mean something to a
> mathematician.  By "non-subjective" I mean a difference that can be
> observed in a repeatable fashion, where the statistical significance
> is not always at the edge when you repeat the experiment with more
> trials.

Tim gave a good answer for stuff like physics simulations.  Another
answer might be in terms of something like a computer game: you're
playing poker online with the cards dealt by a server programmed in
Python using some PRNG.  Both players trust that the server isn't
cheating, and both have a complete copy of the server source code and
all the data that was on it at the start of the game except the PRNG's
initial seed, and are permitted to use that info in any way they can.
Since you're playing for a US$ 250,000 jackpot (sites like
partypoker.com routinely run online tournaments with prizes that
size), you must assume that your opponent has a team of mathemeticians
analyzing the PRNG algorithm and all its previous observed output, in
order to get some probabilistic edge in guessing what your cards are.
If they fail to get that probabilistic edge, your chances are even,
but if they succeed, they take you to the cleaners.  I would call that
a non-sujective difference ;-).  

Another simulation example: you're trying to program a computer to
play poker or backgammon, using neural nets, genetic algorithms, or
whatever the AI buzzword of the week is.  You train the AI by having
it play a zillion simulated games against itself and eventually it's
playing very well.  Has it really tuned itself for the game, or has it
somehow modelled the PRNG and figured out how to guess the next card?
Simulating systems with this kind of emergent behavior can make
heavier demands on PRNG's than simulating random gas molecules.

The point is, a pure statistical analysis of the swirling particles in
the universe may predict the existence of stars, galaxies, planets,
and maybe even living organisms, but normally won't treat emergent
characteristics that lead to things like computers and trombones.  If
you're trying to build a laser cannon to protect your house from
meteors, it's one thing to calculate the number of stray asteroids
whose trajectories might randomly intersect your house and build your
laser accordingly, but a really good defense system has to allow that
someone may be aiming stuff at you deliberately.  Similarly, a really
good PRNG has to resist deliberate attempts to predict its output or
even to notice that the output isn't perfectly random.

The modern definition of a good PRNG basically says there should be no
polynomial-time algorithm for distinguishing the PRNG from a true
random source with probability better than chance.  There's a body of
theoretical CS work that's been done on this subject.  See the
pseudorandomness chapter of Goldreich's "Foundations of Cryptography" at

   ftp://theory.lcs.mit.edu/pub/people/oded/BookFrag/part3N.ps

if you want to see the details.

Of course nobody has ever been able to demonstrate the existence of a
good PRNG, since that runs smack into the P vs NP problem.  However,
there are solid results showing how to build PRNG's from secure
primitives (e.g. SHA) if we're willing to assume the security claims
for those primitives.  That seems to be about what the current state
of the art is.