[Python-ideas] Python's Source of Randomness and the random.py module Redux

Thu Sep 10 02:19:20 CEST 2015

Deprecating the module-level functions has one problem for backward compatibility: if you're using random across multiple modules, changing them all from this:

    import random

... to this:

    from random import DeterministicRandom
    random = DeterministicRandom()

... gives each module its own Mersenne Twister (MT) instance, with its own state. You can work around that, e.g., by providing your own myrandom.py that does that and then using "from myrandom import random" everywhere, or by stashing a random_inst inside the random module or builtins or something and only creating it if it doesn't exist, etc. But people will rightly complain about all of these.

One possible solution is to make DeterministicRandom a module instead of a class, and move all the module-level functions there, so people can just change their import to "from random import DeterministicRandom as random". (Or, alternatively, give it classmethods that create a singleton just like the module global.)
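Here is a rough sketch of the classmethod alternative. It assumes a hypothetical `DeterministicRandom` subclass of the existing `random.Random` with a hypothetical `shared()` classmethod; neither exists in the random module. Every module that calls `DeterministicRandom.shared()` gets the same Mersenne Twister instance, so seeding it in one place affects all callers, as the module-level functions do today.

```python
import random


class DeterministicRandom(random.Random):
    """Hypothetical MT-based generator with one lazily created shared instance."""

    _shared = None  # plays the role of the random module's hidden global instance

    @classmethod
    def shared(cls):
        # The first call creates the instance. Later calls return the same
        # object, so every module that calls shared() draws from one MT state.
        # (Not thread-safe as written: two threads racing on the first call
        # could each create an instance.)
        if cls._shared is None:
            cls._shared = cls()
        return cls._shared


# What each module would write instead of "import random":
rng = DeterministicRandom.shared()

# Seeding through any module's handle affects all of them, as
# random.seed() does today.
DeterministicRandom.shared().seed(12345)
print(rng.random())
```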

For people who decide they want to switch to SystemRandom, I don't think it's as much of a problem, as they probably won't care that they have a separate instance in each module. (And I don't think there's any security problem with using multiple instances, but I haven't thought it through...) So, the change is probably only needed in DeterministicRandom.

There are hopefully better solutions than that. But I think some solution is needed. People who have existing code (or textbooks, etc.) that do things the "wrong" way and get a DeprecationWarning should be able to easily figure out how to make their code correct.

> On Sep 9, 2015, at 17:01, Donald Stufft <donald at stufft.io> wrote:
> 
> Ok, I reached out to Theo de Raadt to talk to him about what he was suggesting
> without Guido having to play messenger and forward fragments of the email
> conversation. I'm starting a new thread because this email is rather long, and
> I'm hoping to divorce it a bit from the back and forth about a proposal that
> wasn't exactly what Theo was suggesting that is being discussed in the other
> thread.
> 
> Essentially, there are three basic types of uses of random (the concept, not
> the module). Those are:
> 
> 1. People/use cases that absolutely need deterministic output given a seed and
>    for whom security properties don't matter.
> 2. People/use cases that absolutely need a cryptographically random output and
>    for whom having a deterministic output is a downside.
> 3. People/use cases that fall somewhere in between, where it may or may not be
>    security sensitive or it may not be known if it's security sensitive.
> 
> The people in group #1 are currently, in the Python standard library, best
> served using the MT random source, as it provides exactly the kind of determinism
> they need. The people in group #2 are currently, in the Python standard
> library, best served using os.urandom (either directly or via
> random.SystemRandom).
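This is a minimal illustration of the two standard-library options for groups #1 and #2. A seeded `random.Random` instance replays the same sequence. `random.SystemRandom` draws from `os.urandom`, and its `seed()` method has no effect. The seed value and the helper name `simulate` are arbitrary choices for the example.

```python
import os
import random


# Group #1: deterministic output given a seed (Mersenne Twister).
def simulate(seed, n=5):
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]


run_a = simulate(2015)
run_b = simulate(2015)  # same seed, so the same sequence as run_a

# Group #2: cryptographically random output, not reproducible.
token = os.urandom(16)           # raw bytes from the OS CSPRNG
sys_rng = random.SystemRandom()  # the random.Random API on top of os.urandom
roll = sys_rng.randint(1, 6)
```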
> 
> However, the third case is the one that Theo's suggestion is attempting to
> solve. In the current landscape, the security-minded folks will tell these
> people to use os.urandom/random.SystemRandom, while the performance-minded or
> otherwise less security-minded folks will likely tell them to just use
> random.py. That leaves these people with a random source that is not
> cryptographically safe.
> 
> The question, then, is: does it matter whether the people in #3 are using a
> cryptographically safe source of randomness? The answer is obviously that we
> don't know, and it's possible that the user doesn't know either. In these
> cases it's typically best to default to the more secure option and expect
> people to opt in to insecurity.
> 
> In the case of randomness, a lot of languages (Python included) don't do that
> and instead opt to pick the more performant option first. They often argue
> (as seen in the other thread) that people who need a cryptographically secure
> source of random will know how to look for it. If they don't know how to look
> for it, then it's likely they'll have some other security problem. I think
> (and I believe Theo thinks) this sort of thinking is short-sighted.
>
> Take a web application as an example. It's going to need session identifiers
> to put into a cookie, and you'll want these to be random. It's not obvious on
> the tin to a non-expert that you can't just use the module-level functions in
> the random module to do this. Other examples are generating API keys or
> passwords.
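For the session-identifier case, here is a sketch of a token generator that avoids the module-level functions and draws bytes straight from `os.urandom`. The 32-byte length and the function name `new_session_id` are illustrative choices, not requirements.

```python
import binascii
import os


def new_session_id(nbytes=32):
    """Return a hex-encoded session identifier built from os.urandom."""
    # os.urandom reads from the kernel CSPRNG. By contrast, the Mersenne
    # Twister behind random.random() can have its internal state
    # reconstructed from enough observed outputs, making later values
    # predictable.
    return binascii.hexlify(os.urandom(nbytes)).decode("ascii")


session_id = new_session_id()  # 64 hex characters for the default 32 bytes
```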
> 
> On Google, the first result for "python random password" is a
> StackOverflow answer which suggests:
> 
>     ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(N))
> 
> However, it was later edited to also include:
> 
>     ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(N))
> 
> So it wasn't obvious to the person who answered that question that the random
> module's module-scoped functions were not appropriate for this use. It appears
> that the original answer lasted for roughly 4 years before it was corrected,
> so who knows how many people used that in those 4 years.
> 
> The second result has someone asking if there is a better way to generate a
> random password in Python than:
> 
>     import os, random, string
> 
>     length = 13
>     chars = string.ascii_letters + string.digits + '!@#$%^&*()'
>     random.seed = (os.urandom(1024))
> 
>     print ''.join(random.choice(chars) for i in range(length))
> 
> This person obviously knew that os.urandom existed and that he should use it,
> but failed to correctly identify that the random module's module scoped
> functions were not what he wanted to use here.
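Besides relying on the module-level functions, the snippet has a second problem. `random.seed = (os.urandom(1024))` rebinds the name `random.seed` to a bytes object instead of calling the function, so the generator is never seeded from `os.urandom` at all. Even a correct `random.seed(...)` call would still leave a Mersenne Twister, which is not a cryptographically secure generator. The sketch below uses `random.SystemRandom` instead, as in the edited StackOverflow answer, and keeps the snippet's length and character set in Python 3 syntax.

```python
import random
import string

length = 13
chars = string.ascii_letters + string.digits + '!@#$%^&*()'

# The original "random.seed = (os.urandom(1024))" assigns bytes to the name
# random.seed and never calls seed(). SystemRandom sidesteps seeding
# entirely: every value it produces is derived from os.urandom.
rng = random.SystemRandom()
password = ''.join(rng.choice(chars) for _ in range(length))
print(password)
```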
> 
> The third result has this code:
> 
>     import string
>     import random
> 
>     def randompassword():
>         chars=string.ascii_uppercase + string.ascii_lowercase + string.digits
>         size=8 
>         return ''.join(random.choice(chars) for x in range(size,12))
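This snippet also produces the wrong length. With `size=8`, `range(size, 12)` iterates over 8, 9, 10, 11, so it always yields 4-character passwords. Below is a corrected sketch that keeps the snippet's character set, uses `random.SystemRandom`, and takes the length as a parameter. The default of 12 is an arbitrary choice.

```python
import random
import string


def randompassword(size=12):
    chars = string.ascii_uppercase + string.ascii_lowercase + string.digits
    rng = random.SystemRandom()  # backed by os.urandom, not the Mersenne Twister
    # range(size) runs exactly `size` times; the original range(8, 12)
    # ran only four times.
    return ''.join(rng.choice(chars) for _ in range(size))


print(randompassword())
```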
> 
> I'm not going to keep pasting snippets, but going through the results, it is
> clear that in the bulk of cases this search turns up code snippets that
> suggest there is likely a lot of code out there unknowingly using the random
> module in a very insecure way. I think this is a failing of the random.py
> module to provide an API that guides users toward safety. The documentation
> tried to paper over that failing with a warning. However, as has been said
> before, you can't solve a UX problem with documentation.
> 
> Then we come to why we might not want to provide a safe random by default for
> the folks in the #3 group. As we've seen in the other thread, this basically
> boils down to the fact that for a lot of users they don't care about the
> security properties and they just want a fast random-esque value. This
> particular case is made stronger by the fact that there is a lot of code out
> there using Python's random module in a completely safe way that would regress
> in a meaningful way if the random module slowed down.
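To get a feel for the speed gap on a given machine, here is a quick `timeit` comparison of the MT-backed module-level function against `random.SystemRandom`, which reads from `os.urandom` on every call. The results vary by platform and Python version, so no figures are given here, and the call count is arbitrary.

```python
import random
import timeit

N = 100_000
system_rng = random.SystemRandom()

# timeit accepts a zero-argument callable as the statement to time.
mt_seconds = timeit.timeit(random.random, number=N)
sys_seconds = timeit.timeit(system_rng.random, number=N)

print(f"random.random():         {mt_seconds:.3f} s for {N} calls")
print(f"SystemRandom().random(): {sys_seconds:.3f} s for {N} calls")
print(f"ratio: {sys_seconds / mt_seconds:.1f}x")
```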
> 
> The fact that speed is the primary reason not to give people in #3 a
> cryptographically secure source of random by default is where we come back to
> the meat of Theo's suggestion. His claim is that invoking os.urandom through
> any of the interfaces imposes a performance penalty because every request has
> to round-trip through the kernel's crypto subsystem. His suggestion is
> essentially that we provide an interface to a modern, good, userland
> cryptographically secure source of random that runs within the same process
> as Python itself. One such example is the arc4random function, which comes
> from libc on many platforms. (Despite its name, arc4random isn't tied to one
> specific algorithm; on OpenBSD it doesn't actually provide ARC4, it provides
> ChaCha.) According to Theo, modern userland CSPRNGs can create random bytes
> faster than memcpy, which eliminates speed as an argument for why a CSPRNG
> shouldn't be the "default" source of randomness.
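Here is a rough sketch of what calling a userland CSPRNG from Python could look like today, assuming a POSIX system. It uses `ctypes` to look up `arc4random_buf(void *buf, size_t nbytes)` in the already-loaded C library. If the symbol is missing, it falls back to `os.urandom`. The symbol is present on the BSDs and macOS, and in glibc only from version 2.36 onward. The function name `userland_random_bytes` is hypothetical, and this is an illustration of the idea, not a vetted interface.

```python
import ctypes
import os


def _load_arc4random_buf():
    """Return libc's arc4random_buf if this process can see it, else None."""
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into the process (POSIX)
    except (OSError, TypeError):
        return None
    func = getattr(libc, "arc4random_buf", None)  # None if the symbol is absent
    if func is not None:
        func.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
        func.restype = None
    return func


_arc4random_buf = _load_arc4random_buf()


def userland_random_bytes(n):
    """Return n random bytes from arc4random_buf if available, else os.urandom."""
    if _arc4random_buf is None:
        return os.urandom(n)
    buf = ctypes.create_string_buffer(n)  # n zero bytes, filled in place
    _arc4random_buf(buf, n)
    return buf.raw


print(userland_random_bytes(16).hex())
```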
> 
> Thus the proposal is essentially:
> 
> * Provide an API to access a modern userland CSPRNG.
> * Provide an implementation of random.SomeKindOfRandom that utilizes this.
> * Move the MT based implementation of the random module to
>   random.DeterministicRandom.
> * Deprecate the module-scoped functions, instructing people to use the new
>   random.SomeKindOfRandom unless they need deterministic random, in which case
>   they should use random.DeterministicRandom.
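Here is a sketch of how the proposed layout might look, using the placeholder names from the proposal. `SomeKindOfRandom` is backed here by `random.SystemRandom` (that is, `os.urandom`) because the standard library has no userland CSPRNG to plug in. The `_deprecated` helper and the choice of which functions to wrap are hypothetical.

```python
import functools
import random
import warnings


class DeterministicRandom(random.Random):
    """Placeholder name from the proposal: the existing MT generator."""


class SomeKindOfRandom(random.SystemRandom):
    """Placeholder name from the proposal; here it simply uses os.urandom."""


_default = DeterministicRandom()  # the hidden instance behind the old API


def _deprecated(method_name):
    # Wrap a method of the hidden instance so that calling the old
    # module-level name still works but emits a DeprecationWarning.
    bound = getattr(_default, method_name)

    @functools.wraps(bound)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"random.{method_name}() is deprecated; use SomeKindOfRandom() "
            "or DeterministicRandom() explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        return bound(*args, **kwargs)

    return wrapper


choice = _deprecated("choice")
randint = _deprecated("randint")
seed = _deprecated("seed")

# New code states its intent explicitly:
token_rng = SomeKindOfRandom()
sim_rng = DeterministicRandom(2015)
```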
> 
> This can of course be tweaked one way or the other, but that's the general idea
> translated into something actionable for Python. I'm not sure exactly how I
> feel about it, but I certainly do think that the current situation is confusing
> to end users and leaves them in an insecure state. At a minimum, we should
> move MT to something like random.DeterministicRandom and deprecate the
> module-scoped functions, because it seems obvious to me that the idea of a
> "default" random function that isn't safe is a footgun for users.
> 
> As an additional consideration, there are security experts who believe that
> userland CSPRNGs should not be used at all. One of those is Thomas Ptacek who
> wrote a blog post [1] on the subject. In this, Thomas makes the case that a
> userland CSPRNG pretty much always depends on the cryptographic security of
> the system random, but may itself be broken. That means you're adding a
> second single point of failure, where a mistake can cause you to get
> non-random data out of the system. I had asked Theo about this, and he stated
> that he disagreed with Thomas about never using a userland CSPRNG and in his
> opinion that blog post was mostly warning people away from using something like
> MT in the userland and away from /dev/random (which is often the cause of
> people reaching for MT because /dev/random blocks which makes programs even
> slower).
> 
> It seems to boil down to a few questions. Do we want to try to protect users by
> default, or at least make it more obvious in the API which option they want to
> use? (I think yes.) If so, do we think that /dev/urandom is "fast enough" for
> most people in group #3? If not, do we agree with Theo that a modern userland
> CSPRNG is safe enough to use, or with Thomas that it's not? And if we think
> that it is, do we use arc4random, and what do we do on systems that don't have
> a modern userland CSPRNG in their libc?
> 
> [1] http://sockpuppet.org/blog/2014/02/25/safely-generate-random-numbers/
> 
> -----------------
> Donald Stufft
> PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
> 
> 
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/