[Python-ideas] Should our default random number generator be secure?

Sat Sep 12 13:35:48 CEST 2015

On 12.09.2015 13:31, M.-A. Lemburg wrote:
> On 12.09.2015 05:23, Tim Peters wrote:
>> [Marc-Andre]
>> ...
>>>> While implementing this, I found that there's a bit more trickery
>>>> involved due to the fact that the MT RNG in Python writes the
>>>> 624 words internal state in batches - once every 624 times
>>>> the .getrandbits() function is called.
>>>>
>>>> So you may need up to 624*2 - 1 output values to determine a
>>>> correct array of internal state values.
>>
>> [Tim]
>>> Don't be too sure about that.  From an information-theoretic view,
>>> "it's obvious" that 624 32-bit outputs is enough - indeed, that's 31
>>> more bits than the internal state actually has. You don't need to
>>> reproduce Python's current internal MT state exactly, you only need to
>>> create _a_ MT state that will produce the same values forever after.
>>> Specifically, the index of the "current" vector element is an artifact
>>> of the implementation, and doesn't need to be reproduced.  You're free
>>> to set that index to anything you like in _your_ MT state - the real
>>> goal is to get the same results.
>>
>> Concrete proof of concept.  First code to reconstruct state from 624
>> consecutive 32-bit outputs:
>>
>>     def invert(transform, output, n=100):
>>         guess = output
>>         for i in range(n):
>>             newguess = transform(guess)
>>             if newguess == output:
>>                 return guess
>>             guess = newguess
>>         raise ValueError("%r not invertible in %s tries" %
>>                          (output, n))
>>
>>     t1 = lambda y: y ^ (y >> 11)
>>     t2 = lambda y: y ^ ((y << 7) & 0x9d2c5680)
>>     t3 = lambda y: y ^ ((y << 15) & 0xefc60000)
>>     t4 = lambda y: y ^ (y >> 18)
>>
>>     def invert_mt(y):
>>         y = invert(t4, y)
>>         y = invert(t3, y)
>>         y = invert(t2, y)
>>         y = invert(t1, y)
>>         return y
>>
>>     def guess_state(vec):
>>         assert len(vec) == 624
>>         return (3,
>>                 tuple(map(invert_mt, vec)) + (624,),
>>                 None)
>>
>> Now we can try it:
>>
>>     import random
>>     for i in range(129):
>>         random.random()
>>
>> That loop was just to move MT into "the middle" of its internal
>> vector.  Now grab values:
>>
>>     vec = [random.getrandbits(32) for i in range(624)]
>>
>> Note that the `guess_state()` function above _always_ sets the index
>> to 624.  When it becomes obvious _why_ it does so, all mysteries will
>> vanish ;-)
>>
>> Now create a distinct generator and force its state to the deduced state:
>>
>>     newrand = random.Random()
>>     newrand.setstate(guess_state(vec))
>>
>> And some quick sanity checks:
>>
>>     for i in range(1000000):
>>         assert random.random() == newrand.random()
>>     for i in range(1000000):
>>         assert random.getrandbits(32) == newrand.getrandbits(32)
>>
>> The internal states are _not_ byte-for-byte identical.  But they don't
>> need to be.  The artificial `index` bookkeeping variable allows
>> hundreds of distinct spellings of _semantically_ identical states.
> 
> It's a rolling index, yes, but when creating the vector of output
> values, the complete internal state array will have undergone
> a recalc at one of the iterations.
> 
> The guess_state(vec) function will thus return an internal
> state vector that is half state of the previous recalc run,
> half new recalc run, it is not obvious to me why you would
> still be able to get away with not synchronizing to the next
> recalc in order to have a complete state from the current recalc.
> 
> Let's see...
> 
> The values in the state array are each based on
> 
> a) previous state[i]
> b) state[(i + 1) % 624]
> c) state[(i + 397) % 624]
> 
> Since the calculation is forward looking, your trick will only
> work if you can make sure that i + 397 doesn't wrap around
> into the previous state before you trigger the recalc in
> newrand.
> 
> Which is easy, of course, since you can control the current
> index of newrand and force it to do a recalc with the next
> call to .getrandbits() by setting it to 624.
> 
> Clever indeed :-)
> 
> Here's a better way to do the inversion without guess work:
> 
> # 32-bits all set
> ALL_BITS_SET = 0xffffffffL
> 
> def undo_bitshift_right_xor(value, shift, mask=ALL_BITS_SET):
> 
>     # Set shift high order bits; there's probably a better way to
>     # do this, but this does the trick for now
>     decoding_mask = (ALL_BITS_SET << (32 - shift)) & ALL_BITS_SET
>     decoded_part = 0
>     result = 0
>     while decoding_mask > 0:
>         decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
>         result |= decoded_part
>         decoded_part >>= shift
>         decoding_mask >>= shift
>     return result
> 
> def undo_bitshift_left_xor(value, shift, mask=ALL_BITS_SET):
> 
>     # Set shift low order bits
>     decoding_mask = ALL_BITS_SET >> (32 - shift)
>     decoded_part = 0
>     result = 0
>     while decoding_mask > 0:
>         decoded_part = (value ^ (decoded_part & mask)) & decoding_mask
>         result |= decoded_part
>         decoded_part = (decoded_part << shift) & ALL_BITS_SET
>         decoding_mask = (decoding_mask << shift) & ALL_BITS_SET
>     return result
> 
> def recover_single_state_value(value):
> 
>     value = undo_bitshift_right_xor(value, 18)
>     value = undo_bitshift_left_xor(value, 15, 0xefc60000L)
>     value = undo_bitshift_left_xor(value, 7, 0x9d2c5680L)
>     value = undo_bitshift_right_xor(value, 11)
>     return value
> 
> def guess_state(data):

Hmm, the name doesn't fit anymore, better call it:

  def recover_state(data):

>     if len(data) < 624:
>         raise TypeError('not enough data to recover state')
> 
>     # Only work with the 624 last entries
>     data = data[-624:]
>     state = [recover_single_state_value(x)
>              for x in data]
>     return (3,
>             tuple(state) + (624,),
>             None)
> 
> This is inspired by the work of James Roper, but uses a slightly
> faster approach for the undo functions. Not that it matters much.
> It was fun, that's what matters :-)
> 
> Oh, in Python 3, you need to remove the 'L' after the constants.
> Too bad that it doesn't recognize those old annotations anymore. 

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Sep 12 2015)
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> mxODBC Plone/Zope Database Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2015-09-18: PyCon UK 2015 ...                               6 days to go
2015-10-21: Python Meeting Duesseldorf ...                 39 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/