SHA-based subclass for random module

Fri Mar 19 05:01:54 EST 2004

python at rcn.com (Raymond Hettinger) writes:
> I offer this as an alternative:
> 
> from random import Random
> from struct import unpack
> import md5

Why md5 instead of sha?

> class MD5Random(Random):
>     def newrandom(self,    tofloat = 1.0 / 2 ** 53):
>         plaintxt = str(Random.random(self))

What's Random.random(self) supposed to do?

>         ciphertxt = md5.new(plaintxt).digest()

I think you mean update.

>         randint = unpack('<Q', ciphertxt[:8])[0] >> 11  # grab 53 bits
>         return randint * tofloat
> 
> Advantages over the original:
> 
> * Much faster

Yes, other version wasn't optimized.  Both call the hash function
approx once per output though.

> * Much shorter code

But doesn't support all the operations specified in the Random API.
What's up with that?

> * More readable code

Perhaps so.  It took me a little while to figure out in the previous
version that the security came from revealing only part of the output
of each hash (107 secret bits for SHA).  In the md5 version there's
only 75 secret bits, which is cutting things a little close.  It may
be harder to do a concrete security proof for this construction too
(I'm not sure, I've never done one).

> * Threadsafe

Always a plus.

> * Takes advantage of MT's proven equidistribution, of its passing
> known tests for randomness, and of its known period (in contrast,
> SHA-1 was designed as a digest that makes it computationally
> intractable to find a different message giving the same signature --
> that does *not* imply the non-existence of short cycles for some
> keys).

I'm missing something here--if you're just trying to avoid possible
cycles, why do you want to use MT instead of a sequential counter?
The counter is guaranteed to never cycle (unless you overflow the 512
bit compression function input, which won't happen in the life of the
universe).

Note that while there's probably some short cycles in the double-SHA
construction, the chance of finding one is infinitesimally small (it
means finding a 320 bit birthday collision).  Hitting one by accident
is comparable to guessing an AES key by accident.  Finding one by
brute force or by cryptanalysis is again comparable to breaking AES.

> * It uses both MD5 and the Mersenne Twister as they were designed (in
> contrast, my google searches show *no* research on using SHA-1 in OFB
> and reapplying SHA-1 again to conceal the output).

Applying a collision resistant hash function to itself is a standard
construction for making a PRF and is the basis of HMAC.  I believe
the relevant paper is

    Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorandom Functions
    Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
    Foundation of Computer Science, 1996, webbed (PostScript) at:

     http://www.research.ibm.com/security/bck1.ps

but I haven't slogged all the way through it yet.  It has a concrete
security proof and I've always found those things incredibly fussy and
pedantic compared to normal math papers.  I haven't yet figured out
the reason why they're like that, or if they're really need to be.
I'd like to attempt to do one for this generator though, once we
settle on an algorithm.

See also the other references to RFC 2104, notably Krawczyk et al's
original HMAC paper from Crypto 96.

My guess is the reason you haven't found any publications specific to
SHA1 with this construction is that nobody has found any results
against SHA1.  As far as anyone can tell, the output is
undistinguishable from random.