[Python-ideas] Should our default random number generator be secure?

Sat Sep 12 13:31:48 CEST 2015

On 12.09.2015 05:23, Tim Peters wrote:
> [Marc-Andre]
> ...
>>> While implementing this, I found that there's a bit more trickery
>>> involved due to the fact that the MT RNG in Python writes the
>>> 624 words internal state in batches - once every 624 times
>>> the .getrandbits() function is called.
>>>
>>> So you may need up to 624*2 - 1 output values to determine a
>>> correct array of internal state values.
> 
> [Tim]
>> Don't be too sure about that.  From an information-theoretic view,
>> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
>> more bits than the internal state actually has. You don't need to
>> reproduce Python's current internal MT state exactly, you only need to
>> create _a_ MT state that will produce the same values forever after.
>> Specifically, the index of the "current" vector element is an artifact
>> of the implementation, and doesn't need to be reproduced.  You're free
>> to set that index to anything you like in _your_ MT state - the real
>> goal is to get the same results.
> 
> Concrete proof of concept.  First code to reconstruct state from 624
> consecutive 32-bit outputs:
> 
>     def invert(transform, output, n=100):
>         guess = output
>         for i in range(n):
>             newguess = transform(guess)
>             if newguess == output:
>                 return guess
>             guess = newguess
>         raise ValueError("%r not invertible in %s tries" %
>                          (output, n))
> 
>     t1 = lambda y: y ^ (y >> 11)
>     t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
>     t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
>     t4 = lambda y: y ^ (y >> 18)
> 
>     def invert_mt(y):
>         y = invert(t4, y)
>         y = invert(t3, y)
>         y = invert(t2, y)
>         y = invert(t1, y)
>         return y
> 
>     def guess_state(vec):
>         assert len(vec) == 624
>         return (3,
>                 tuple(map(invert_mt, vec)) + (624,),
>                 None)
> 
> Now we can try it:
> 
>     import random
>     for i in range(129):
>         random.random()
> 
> That loop was just to move MT into "the middle" of its internal
> vector.  Now grab values:
> 
>     vec = [random.getrandbits(32) for i in range(624)]
> 
> Note that the `guess_state()` function above _always_ sets the index
> to 624.  When it becomes obvious _why_ it does so, all mysteries will
> vanish ;-)
> 
> Now create a distinct generator and force its state to the deduced state:
> 
>     newrand = random.Random()
>     newrand.setstate(guess_state(vec))
> 
> And some quick sanity checks:
> 
>     for i in range(1000000):
>         assert random.random() == newrand.random()
>     for i in range(1000000):
>         assert random.getrandbits(32) == newrand.getrandbits(32)
> 
> The internal states are _not_ byte-for-byte identical.  But they don't
> need to be.  The artificial `index` bookkeeping variable allows
> hundreds of distinct spellings of _semantically_ identical states.

It's a rolling index, yes, but when creating the vector of output
values, the complete internal state array will have undergone
a recalc at one of the iterations.

The guess_state(vec) function will thus return an internal
state vector that is half state of the previous recalc run,
half new recalc run, it is not obvious to me why you would
still be able to get away with not synchronizing to the next
recalc in order to have a complete state from the current recalc.

Let's see...

The values in the state array are each based on

a) previous state[i]
b) state[(i + 1) % 624]
c) state[(i + 397) % 624]

Since the calculation is forward looking, your trick will only
work if you can make sure that i + 397 doesn't wrap around
into the previous state before you trigger the recalc in
newrand.

Which is easy, of course, since you can control the current
index of newrand and force it to do a recalc with the next
call to .getrandbits() by setting it to 624.

Clever indeed :-)

Here's a better way to do the inversion without guess work:

# 32-bits all set
ALL_BITS_SET = 0xffffffffL

def undo_bitshift_right_xor(value, shift, mask=ALL_BITS_SET):

    # Set shift high order bits; there's probably a better way to
    # do this, but this does the trick for now
    decoding_mask = (ALL_BITS_SET << (32 - shift)) & ALL_BITS_SET
    decoded_part = 0
    result = 0
    while decoding_mask > 0:
        decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
        result |= decoded_part
        decoded_part >>= shift
        decoding_mask >>= shift
    return result

def undo_bitshift_left_xor(value, shift, mask=ALL_BITS_SET):

    # Set shift low order bits
    decoding_mask = ALL_BITS_SET >> (32 - shift)
    decoded_part = 0
    result = 0
    while decoding_mask > 0:
        decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
        result |= decoded_part
        decoded_part = (decoded_part << shift) & ALL_BITS_SET
        decoding_mask = (decoding_mask << shift) & ALL_BITS_SET
    return result

def recover_single_state_value(value):

    value = undo_bitshift_right_xor(value, 18)
    value = undo_bitshift_left_xor(value, 15, 0xefc60000L)
    value = undo_bitshift_left_xor(value, 7, 0x9d2c5680L)
    value = undo_bitshift_right_xor(value, 11)
    return value

def guess_state(data):

    if len(data) < 624:
        raise TypeError('not enough data to recover state')

    # Only work with the 624 last entries
    data = data[-624:]
    state = [recover_single_state_value(x)
             for x in data]
    return (3,
            tuple(state) + (624,),
            None)

This is inspired by the work of James Roper, but uses a slightly
faster approach for the undo functions. Not that it matters much.
It was fun, that's what matters :-)

Oh, in Python 3, you need to remove the 'L' after the constants.
Too bad that it doesn't recognize those old annotations anymore.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 12 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2015-09-18: PyCon UK 2015 ...                               6 days to go
2015-10-21: Python Meeting Duesseldorf ...                 39 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/