[Python-ideas] Should our default random number generator be secure?

Tim Peters tim.peters at gmail.com
Fri Sep 11 18:36:55 CEST 2015


[M.-A. Lemburg <mal at egenix.com>]
> ...
> Now, leaving aside this bright future, what's reasonable today ?
>
> If you look at tools like untwister:
>
>     https://github.com/bishopfox/untwister
>
> you can get a feeling for how long it takes to deduce the
> seed from an output sequence. Bare in mind, that in order
> to be reasonably sure that the seed is correct, the available
> output sequence has to be long enough.
>
> That's a known plain text attack, so you need access to lots
> of session keys to begin with.
>
> The tools is still running on an example set of 1000 32-bit
> numbers and it says it'll be done in 1.5 hours, i.e. before
> the sun goes down in my timezone. I'll leave it running to
> see whether it can find my secret key.

I'm only going to talk about current Python 3, because _any_ backward
incompatible change is off limits for a bugfix release.

So:

1. untwister appears _mostly_ to be probing for poor seeding schemes.
Python 3's default "by magic" seeding is unimpeachable ;-)  It's
computationally infeasible to attack it.

2. If they knew they were targeting MT, and had 624 consecutive 32-bit
outputs, they could compute MT's full internal state essentially
instantly.

#2 is hard to get, though.  These "pick a passward" examples are only
using a relative handful of bits from each 32-bit MT output.  Attacks
with such spotty info are "exponentially harder".


> Untwister is only slightly smarter than bruteforce. Given
> that MT has a seed size of 32 bits, it's not surprising that
> a tool can find the seed within a day.

No no no.  MT's state is 19937 bits, and current .seed()
implementations use every bit you pass to .seed().

By default, current Python seeds the state with 2500 bytes (20000
bits) from the system .urandom() (if available).  That's why it's
computationally infeasible for "poor seeding" searches to attack the
default seeding:  they have a space of 2**19937-1 (the all-0 state
can't occur) to search through, each of which is equally likely
(assuming the system .urandom() is doing _its_ job).

Of course the user can screw that up by using their _own_ seed.  But,
by default, current Pythons already do the best possible seeding job.


> Perhaps it's time to switch to a better version of MT, e.g.
> a 64-bit version (with 64-bit internal state):
>
>     http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt64.html
>
> or an even faster SIMD variant with better properties and
> 128 bit internal state:
>
>     http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html
>
> Esp. the latter will help make brute force attacks practically
> impossible.
>
> Tim ?

We already have a 19937-bit internal state, and current seeding
schemes don';t hide that.

I would like to move to a different generator entirely someday, but
not before some specific better-than-MT alternative gains significant
traction outside the Python world ("better a follower than a leader"
in this area).


> BTW: Looking at the sources of the _random module, I found that
> the seed function uses the hash of non-integers such as e.g.
> strings passed to it as seeds. Given the hash randomization
> for strings this will create non-deterministic results, so it's
> probably wise to only use 32-bit integers as seed values for
> portability, if you need to rely on seeding the global Python
> RNG.

None of that applies to Python 3.  `seed()` string inputs go through
this path now:

            if isinstance(a, (str, bytes, bytearray)):
                if isinstance(a, str):
                    a = a.encode()
                a += _sha512(a).digest()
                a = int.from_bytes(a, 'big')
        super().seed(a)

IOW, a crypto hash is _appended_ to the string, but no info from the
original string is lost (but, if you ask me, this particular step is
useless - it adds no "new entropy").  The whole mess is converted to a
giant integer, again with no loss of input information.  And every bit
of the giant integer affects what `super().seed(a) does`.


More information about the Python-ideas mailing list