SHA-based subclass for Random module

Paul Rubin http
Mon Mar 15 21:26:10 EST 2004


Josiah Carlson <jcarlson at nospam.uci.edu> writes:
> FYI: Mersenne Twister is a pretty spiffy pseudo random number
> generator...it is also /really/ fast.

Yes, but for some purposes, the output is insufficiently random.

> On the one hand, the SHA hash is not known to be generally invertable
> for any of the uniquely-mapped 160 bit input strings, or really for
> any non-trivial initial states or input strings.  It could very well
> be that it is not invertable for any string or state, but I've not
> done suitable research to know for sure.  Quite nice for an invertable
> perspective and suggesting that short cycles in state are rare, if
> even possible.

Finding a short cycle or a way to invert SHA1 would be a major
research result in cryptography.

> On the other hand, you could just as easily use the MD5 hash.  The
> only thing you get from SHA is perhaps more bits of state.  I don't
> know which one is faster though.

MD5 is somewhat faster than SHA, but for this module, the Python
wrapper takes most of the CPU cycles.

> Really, they take input character-by-character, perterb some current
> state with some non-trivially flowing functions with feedback, and
> produce a seemingly-random output that /should/ be uniform over the
> output space.  Whether or not they produce output that /is/ uniform
> over the output space is probably the result of some involved paper
> that I've not read.

Finding any practical way to distinguish the output of SHA with a
secret internal state from a true RNG would be a significant research
result.

> One thing to remember is that the Mersenne Twister is known to cycle
> in state only after 2**19937-1 calls.  Since you haven't done any
> measurable analysis of your method, or how often it would cycle in
> state during the repeated re-setting of the state to the hash of some
> relatively short input-string, that is only pseudo-random near the end
> (the previous hash), I expect that yours may not fare much (if any)
> better.

It's likely that the SHA generator will cycle after about 2**80 calls
due to birthday collisions, and unlikely that there are significantly
shorter cycles.  That's enough for practical use.  A stupendously
long cycle length is irrelevant if the output is predictable.

> One could exploit the features of both, and perhaps end up with a
> random number generator that cycles less often (if sha cycles in S
> iterations, and the mersenne twister cycles in M iterations, a new
> algorithm that cycles in SM iterations is possible, if S and M are
> coprime), but without actually analyzing their interaction, the
> behavior of such a random number generator is really only open for
> speculation.

If you're concerned about sha1's cycle length, your best bet is to use
one of the larger sha1 variants, sha256 or sha512.  I used sha1 because
it's built into the Python library.



More information about the Python-list mailing list