SHA-based subclass for random module

Mon Mar 22 15:18:31 EST 2004

On Mon, 22 Mar 2004 13:41:46 -0500, "Tim Peters" <tim.one at comcast.net>
wrote:

>[David MacQuigg]
>> ...
>> I'm still curious as to the answer to my very simple and basic
>> question:
>
>It is basic, but there's nothing simple about it.
>
>> Are there any examples that show a *non-subjective* difference in a
>> physically meaningful result between one PRNG and another?
>
>Yes, but different RNGs have different weaknesses, and whether the
>weaknesses of a specific RNG *matter* to you depend on detailed analysis of
>the specific app you have in mind.
>
>The primary weakness of linear congruential generators (still the most
>common kind) is dramatically illustrated all over the web; e.g., look at the
>3D plot at:
>
>    http://www.unf.edu/ccec/cis/CIShtml/RANDUinfo.html
>
>If you need to generate "random" points in 3D space, does it matter that the
>particular LCG plotted there systematically generates points that fall on
>one of 15 planes, no matter how many points you generate?  There's no answer
>to that short of analyzing the specifics of what you're trying to accomplish
>by generating those points.  For example, if your app relies on the triples
>generated being approximately equally distributed throughout a 3D cube, that
>particular LCG is an obvious disaster for that particular application.  But
>if your app only cares that the projection of the points onto the x axis is
>approximately equally distributed, probably no problem, despite that it's
>obvious "by eyeball" that there's nothing even remotely "random" about their
>distribution in 3D space.
>
>You're not going to get an easy answer.  Well, here's one:  the Mersenne
>Twister in Python 2.3 is probably the best and most widely tested fast PRNG
>in existence now, and has provably good equidistribution properties in high
>dimensions (a plot like the one above would have no visible patterns if the
>Twister had been used instead).

This is the first good answer I have heard, and it has changed my
thinking on this question.  I understand now that the problem is not
just something that might be found in the tenth decimal place.  The
clustering of random points into very thin planes could radically
affect a simulation depending on the distances between those points.

The problem I am working on (statistical simulation of electronic
circuits) involves a large number of 1D distributions, but with
carefully controlled correlations between these distributions.  We are
probably just lucky that the PRNG we are using has not caused a
noticable problem, but seeing just one clear example like the above
has made me decide to be much more cautious in selecting a PRNG.

Thanks for the enlightenment.

-- Dave