Is this secure?

Steven D'Aprano steven at REMOVE.THIS.cybersource.com.au
Tue Feb 23 21:40:13 EST 2010


On Tue, 23 Feb 2010 15:36:02 +0100, mk wrote:

> The question is: is this secure? That is, can the string generated this
> way be considered truly random?

Putting aside the philosophical question of what "truly random" means, I 
presume you mean that the letters are uniformly distributed. The answer 
to that is, they don't look uniformly distributed.

This isn't a sophisticated statistical test, it's the equivalent of a 
back-of-the-envelope calculation: I generated 100,000 random strings with 
your code, and counted how often each letter appears:
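
The counting itself needs only a few lines. What follows is a rough 
sketch, not the exact script used: stand_in_generator() is a hypothetical 
placeholder for your function (it is uniform by construction, so it will 
not reproduce the bias), and the alphabet 'a'..'y' and the length of 8 
are assumptions made for illustration.

```python
import random
import string
from collections import Counter

# Assumption: the letters 'a'..'y', matching the keys seen in the counts.
ALPHABET = string.ascii_lowercase[:25]

def stand_in_generator(length=8):
    """Hypothetical placeholder, NOT the original poster's function."""
    return ''.join(random.choice(ALPHABET) for _ in range(length))

def letter_counts(gen, trials=100000):
    """Call gen() `trials` times and tally every letter it returns."""
    counts = Counter()
    for _ in range(trials):
        counts.update(gen())  # Counter.update on a str counts its characters
    return counts

if __name__ == '__main__':
    print(dict(letter_counts(stand_in_generator)))
```

Swap in the real generator as the first argument to letter_counts() to 
repeat the experiment.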

If the letters are uniformly distributed, you would expect all the 
numbers to be quite close, but instead they range from 15063 to 25679:

{'a': 15063, 'c': 20105, 'b': 15100, 'e': 25465, 'd': 25458, 'g': 25597, 
'f': 25589, 'i': 25045, 'h': 25679, 'k': 22945, 'j': 25531, 'm': 16187, 
'l': 16252, 'o': 16076, 'n': 16012, 'q': 16069, 'p': 16119, 's': 16088, 
'r': 16087, 'u': 15951, 't': 16081, 'w': 16236, 'v': 15893, 'y': 15834, 
'x': 15956}

Eyeballing it, the distribution looks vaguely two-humped: one hump around 
15-16K, the second around 22-25K. Sure enough, here's a quick-and-dirty graph:

a  | ***********************************
b  | ***********************************
c  | ***********************************************
d  | ***********************************************************
e  | ***********************************************************
f  | ************************************************************
g  | ************************************************************
h  | ************************************************************
i  | ***********************************************************
j  | ************************************************************
k  | ******************************************************
l  | **************************************
m  | **************************************
n  | *************************************
o  | **************************************
p  | **************************************
q  | **************************************
r  | **************************************
s  | **************************************
t  | **************************************
u  | *************************************
v  | *************************************
w  | **************************************
x  | *************************************
y  | *************************************
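
A graph like the one above can be produced along these lines. This is 
only a sketch: the scale of one star per 430 occurrences is my assumption, 
chosen to give bars of roughly this width, and bar_graph() is an 
illustrative name.

```python
def bar_graph(counts, scale=430):
    """Return one line per letter, with one '*' per `scale` occurrences.

    `scale` is an assumed value; adjust it to fit the terminal width.
    """
    lines = []
    for letter in sorted(counts):
        stars = '*' * (counts[letter] // scale)
        lines.append('%-2s | %s' % (letter, stars))
    return '\n'.join(lines)

if __name__ == '__main__':
    print(bar_graph({'a': 15063, 'b': 15100, 'c': 20105}))
```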


The mean of the counts is 19056.72, and the mean absolute deviation is 
3992.28, roughly a fifth of the mean. While none of this is statistically 
sophisticated, it does indicate to me that your function is nowhere even 
close to uniform. It has a very strong bias.
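
Those two figures can be checked directly from the counts listed earlier. 
A minimal sketch, taking "mean deviation" to be the average absolute 
distance from the mean (the function name is mine):

```python
# Letter counts copied from the table in this message.
counts = {
    'a': 15063, 'b': 15100, 'c': 20105, 'd': 25458, 'e': 25465,
    'f': 25589, 'g': 25597, 'h': 25679, 'i': 25045, 'j': 25531,
    'k': 22945, 'l': 16252, 'm': 16187, 'n': 16012, 'o': 16076,
    'p': 16119, 'q': 16069, 'r': 16087, 's': 16088, 't': 16081,
    'u': 15951, 'v': 15893, 'w': 16236, 'x': 15956, 'y': 15834,
}

def mean_and_mean_deviation(values):
    """Return (mean, mean absolute deviation from the mean)."""
    values = list(values)
    mean = sum(values) / len(values)
    mean_dev = sum(abs(v - mean) for v in values) / len(values)
    return mean, mean_dev

if __name__ == '__main__':
    mean, mean_dev = mean_and_mean_deviation(counts.values())
    print('%.2f %.2f' % (mean, mean_dev))  # prints: 19056.72 3992.28
```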



-- 
Steven


