[Python-ideas] Python's Source of Randomness and the random.py module Redux

Mon Sep 14 15:29:11 CEST 2015

On 14 September 2015 at 13:59, Antoine Pitrou <antoine at python.org> wrote:
> Finally, the premise of this discussion is idealistic in the first place.
> If someone doesn't realize their code is security-sensitive, there
> are other mistakes they will make than simply choosing the wrong
> RNG.  If you want to help people generate secure passwords, best would
> be perhaps to write a password-generating (or more generally
> secret-generating, for different kinds of secrets: passwords, session
> ids, etc.) library.

Is your argument that there are lots of ways to get security wrong,
and for that reason we shouldn't try to fix any of them? After all, I
could have made this argument against PEP 466, or against the
deprecation of SHA1 in TLS certificates, or against any security
improvement ever made that simply changed defaults. The fact that
there are secure options available is not a good excuse for leaving
the insecure ones as the defaults.

And let's be clear, this is not a theoretical error that people don't
hit in real life. Investigating your last comment, Antoine, I googled
"python password generator". The results:

- The first one is a StackOverflow question which incorrectly uses
random.choice (though seeded from os.urandom, which is an
improvement). The answer to that says to just use os.urandom
everywhere, but does not provide sample code. Only the third answer
gets so far as to provide sample code, and it's way overkill.
- The second option, entitled "A Better Password Generator",
incorrectly uses random.randrange. This code is *aimed at beginners*,
and is kindly handing them a gun to point at their own foot.
- The third one uses urandom, which is fine.
- The fourth, an XKCD-based password generator, uses SystemRandom *if
available* but then falls back to the MT approach, which is an
unexpected decision, but there we go.
- The fifth, from "pythonforbeginners.com", incorrectly uses random.choice.
- The sixth goes into an intensive discussion of 'password
strength', including the 'bit strength' of the password, despite
the fact that it uses random.randint, which means that the whole
bit-strength analysis is flawed.
- For the seventh we get a security.stackexchange question with the
first answer saying not to use Random, though the questioner does use
it and no sample code is provided.
- The eighth is a library that "generates randomized strings of
characters". It attempts to use SystemRandom but falls back silently
if it's unavailable.
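For contrast, here is a minimal sketch of what a correct generator might look like, assuming Python 3. The names `ALPHABET`, `generate_password` and `entropy_bits` are hypothetical, not taken from any of the pages above. It draws every character from `random.SystemRandom` (backed by `os.urandom`) and raises an error instead of silently falling back to the Mersenne Twister. The bit-strength formula is the usual length * log2(alphabet size), which only holds when each character is unpredictable.

```python
import math
import os
import random
import string

# Hypothetical names for illustration; not taken from any of the pages above.
ALPHABET = string.ascii_letters + string.digits + string.punctuation  # 94 characters


def _secure_rng():
    """Return an OS-backed RNG, or fail loudly rather than fall back to the Mersenne Twister."""
    try:
        os.urandom(1)  # raises NotImplementedError when no OS randomness source exists
    except NotImplementedError:
        raise RuntimeError("no OS randomness source; refusing to use random.random instead")
    return random.SystemRandom()


def generate_password(length=16, alphabet=ALPHABET):
    """Draw each character independently and uniformly via SystemRandom (os.urandom)."""
    rng = _secure_rng()
    return "".join(rng.choice(alphabet) for _ in range(length))


def entropy_bits(length, alphabet_size):
    """length * log2(alphabet_size): valid only if every character is unpredictable."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    pw = generate_password(20)
    print(pw, round(entropy_bits(20, len(ALPHABET)), 1))
```

Swapping `rng.choice` for `random.choice` would produce strings that look just as random but inherit the Mersenne Twister's predictability, which is exactly why a bit-strength figure computed over such output means nothing. On Python 3.6 and later, the standard library's `secrets` module provides `secrets.choice` for the same job.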

At this point I gave up. Of that list of 8 responses, three are
completely wrong, two provide sample code that is wrong with no
correct sample code to be found on the page, two attempt to do the
right thing but will fall into a silent failure mode if they can't,
and only one is unambiguously correct.

Similarly, a quick search of GitHub for Python repositories that
contain random.choice and the string 'password' returns 40,000
results.[0] Even if 95% of them are safe, that leaves 2,000 people who
wrote wrong code and uploaded it to GitHub.

It is disingenuous to say that only people who know enough write
security-critical code. They don't. The reason for this is that most
people don't know they don't know enough. And for those people,
Python's default approach screws them over, and then they write blog
posts which screw over more people.

If the Python standard library would like to keep the insecure default
of random.random that's totally fine, but we shouldn't pretend that
the resulting security failures aren't our fault: they absolutely are.
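To make "insecure default" concrete: `random.random`, `random.choice` and friends are driven by a Mersenne Twister whose output is fully determined by its internal state, and in practice that state can be recovered from 624 consecutive 32-bit outputs. A minimal sketch, assuming Python 3 (the helpers `same_seed_same_output` and `replay_from_state` are hypothetical), showing that identically seeded generators agree and that a captured state replays the same values. `random.SystemRandom` has no such state to capture; its `seed()` is accepted but ignored.

```python
import random


def same_seed_same_output(seed, n=10, alphabet="abcdef"):
    """Two Mersenne Twister generators with the same seed emit the same 'password'."""
    a = random.Random(seed)
    b = random.Random(seed)
    pw_a = "".join(a.choice(alphabet) for _ in range(n))
    pw_b = "".join(b.choice(alphabet) for _ in range(n))
    return pw_a == pw_b


def replay_from_state(rng, n=5):
    """Snapshot an MT generator's state, draw n values, rewind, and draw them again."""
    state = rng.getstate()
    first = [rng.random() for _ in range(n)]
    rng.setstate(state)  # restoring the state replays the exact same sequence
    second = [rng.random() for _ in range(n)]
    return first, second


if __name__ == "__main__":
    print(same_seed_same_output(2015))  # True
    first, second = replay_from_state(random.Random())
    print(first == second)  # True
    # SystemRandom reads from os.urandom; seed() is a no-op here.
    sr = random.SystemRandom()
    sr.seed(2015)
    print(sr.random())
```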

 [0]: https://github.com/search?l=python&q=random.choice+password&ref=searchresults&type=Code&utf8=%E2%9C%93