[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Sat Sep 12 05:23:42 CEST 2015


[Marc-Andre]
...
>> While implementing this, I found that there's a bit more trickery
>> involved due to the fact that the MT RNG in Python writes the
>> 624 words internal state in batches - once every 624 times
>> the .getrandbits() function is called.
>>
>> So you may need up to 624*2 - 1 output values to determine a
>> correct array of internal state values.

[Tim]
> Don't be too sure about that.  From an information-theoretic view,
> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
> more bits than the internal state actually has. You don't need to
> reproduce Python's current internal MT state exactly, you only need to
> create _a_ MT state that will produce the same values forever after.
> Specifically, the index of the "current" vector element is an artifact
> of the implementation, and doesn't need to be reproduced.  You're free
> to set that index to anything you like in _your_ MT state - the real
> goal is to get the same results.

Concrete proof of concept.  First code to reconstruct state from 624
consecutive 32-bit outputs:

    def invert(transform, output, n=100):
        guess = output
        for i in range(n):
            newguess = transform(guess)
            if newguess == output:
                return guess
            guess = newguess
        raise ValueError("%r not invertible in %s tries" %
                         (output, n))

    t1 = lambda y: y ^ (y >> 11)
    t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
    t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
    t4 = lambda y: y ^ (y >> 18)

    def invert_mt(y):
        y = invert(t4, y)
        y = invert(t3, y)
        y = invert(t2, y)
        y = invert(t1, y)
        return y

    def guess_state(vec):
        assert len(vec) == 624
        return (3,
                tuple(map(invert_mt, vec)) + (624,),
                None)

Now we can try it:

    import random
    for i in range(129):
        random.random()

That loop was just to move MT into "the middle" of its internal
vector.  Now grab values:

    vec = [random.getrandbits(32) for i in range(624)]

Note that the `guess_state()` function above _always_ sets the index
to 624.  When it becomes obvious _why_ it does so, all mysteries will
vanish ;-)

Now create a distinct generator and force its state to the deduced state:

    newrand = random.Random()
    newrand.setstate(guess_state(vec))

And some quick sanity checks:

    for i in range(1000000):
        assert random.random() == newrand.random()
    for i in range(1000000):
        assert random.getrandbits(32) == newrand.getrandbits(32)

The internal states are _not_ byte-for-byte identical.  But they don't
need to be.  The artificial `index` bookkeeping variable allows
hundreds of distinct spellings of _semantically_ identical states.


More information about the Python-ideas mailing list