SHA-based subclass for random module

Sat Mar 20 01:07:51 EST 2004

> Why md5 instead of sha?

You could go either way.  Make the decision based on how many bits you
need (128 should be plenty) for generating a single 53-bit float and
then take into account speed considerations (MD5 vs SHA-1) and the
time to generate the bits going into the message digest (it takes
longer to generate 160 bits than 128).

> What's Random.random(self) supposed to do?

It's called subclassing ;-)

In this case, it inherits all the other random functionality (being
sure to include my corrected method name from newrandom() to
random()).

> >         ciphertxt = md5.new(plaintxt).digest()
> 
> I think you mean update.

Nope, I meant digest(), but it could be done with update as well.

> But doesn't support all the operations specified in the Random API.
> What's up with that?

Sure it does.  That is what the subclassing is for.  Again be sure to
incorporate the two submitted revisions resulting in:

from random import Random
from struct import unpack
import md5

class MD5Random(Random):
    def random(self,    tofloat = 1.0 / 2 ** 53):
        plaintxt = str(Random.getrandbits(self, 128))
        ciphertxt = md5.new(plaintxt).digest()
        randint = unpack('<Q', ciphertxt[:8])[0] >> 11  # grab 53 bits
        return randint * tofloat

> > * Takes advantage of MT's proven equidistribution, of its passing
> > known tests for randomness, and of its known period (in contrast,
> > SHA-1 was designed as a digest that makes it computationally
> > intractable to find a different message giving the same signature --
> > that does *not* imply the non-existence of short cycles for some
> > keys).
> 
> I'm missing something here--if you're just trying to avoid possible
> cycles, why do you want to use MT instead of a sequential counter?

Either my version or your original requires something that generates a
sequence of randoms from a seed (mine does it with MT and yours with
s0=sha1(s0)).  The advantages of MT are proven properties, and it can
more efficiently generate an arbitrary number of bits (thanks to
genrandbits()) without the messy pure python code for extracting a
byte at a time.  It was also nice to not have to override jumpahead(),
getstate(), setstate(), etc.

For the base generator, MT or s0-sha1(s0), no cryptography is
necessary; you're getting that from the second application of sha1 or
md5.

If you use sha1 for the base generator, you gain cryptographic
strength to the left which is useless for your application (lottery
numbers are selected in a way that is strong to the left and right,
but no one cares about inferring last weeks numbers from this weeks,
you make money only if you can predict the following week).  It is
doubly useless, because the second application of SHA1 gives you what
you need.

> > * It uses both MD5 and the Mersenne Twister as they were designed (in
> > contrast, my google searches show *no* research on using SHA-1 in OFB
> > and reapplying SHA-1 again to conceal the output).
> 
> Applying a collision resistant hash function to itself is a standard
> construction for making a PRF and is the basis of HMAC.

Re-reading your sentence carefully, does it say that SHA-1 is
collision resistant.  The design goal for SHA-1, of necessity,
produces collisions for messages over 160 bits in length and AFAICT
there are no citations showing collision resistance at 160 bits.

> I believe
> the relevant paper is
> 
>     Bellare, M., Canetti, R., and Krawczyk, H., "Pseudorandom Functions
>     Revisited: The Cascade Construction". Proc. of the 37th IEEE Symp. on
>     Foundation of Computer Science, 1996, webbed (PostScript) at:
> 
>      http://www.research.ibm.com/security/bck1.ps

Good paper, but read it carefully.  Basically what it starts out
saying is that not all trapdoor functions make good PRFs, and then it
sets about how to construct functions that are both trapdoor and
collision resistant (with positive implications for the minimum
period).

> My guess is the reason you haven't found any publications specific to
> SHA1 with this construction is that nobody has found any results
> against SHA1.  As far as anyone can tell, the output is
> undistinguishable from random.

I wish you would abandond this line of reasoning.  In security
applications, proofs rule and the absence of published flaws means
nothing (consider the case of Engima where the designers relied on
"nobody has found any results
against" their cipher).  

In this particular case, the finding a short cycle would not really be
a major flaw; the function still meets its design goal of producing a
message digest for a given message such that it is difficult to
produce another message with the same digest.  A few known 160-bit
collisions would not be an issue except for those exact 160 bit
messages.  IOW, even if some collisions are known, don't expect to
find a paper on it.

All of that is moot, MT is fast, subclasses nicely, is threadsafe, has
proven distribution properties, and has a known (large) period.  Why
not digest it and produce your shielded random numbers?  Or is is just
a matter of being glued to your originally published code (which would
be correct if collision resistance is known)?

Raymond Hettinger