[Tutor] A file containing a string of 1 billion random digits.

Steven D'Aprano steve at pearwood.info
Sun Jul 18 03:01:32 CEST 2010


Do you care about speed? If this is a script that just needs to run 
once, it seems to me that the simplest, easiest to read solution is:

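import random

def random_digit():
    # Pick one character from the ten decimal digits.
    return "0123456789"[random.randrange(10)]

# Python 2: xrange avoids building a billion-element list in memory.
# The with statement closes the file even if the loop is interrupted.
with open('rand_digits.txt', 'w') as f:
    for i in xrange(10**9):
        f.write(random_digit())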


This is, of course, horribly inefficient -- it generates digits one at a 
time, and worse, it writes them one at a time. I got bored waiting for 
it to finish after 20 minutes (at which point it was about 10% of the 
way through, so a full run would take something over three hours), but 
you could let it run in the background for as long as it takes.

If speed does matter, the first improvement is to generate larger 
streams of random digits at once. An even bigger improvement is to cut 
down on the number of disk writes -- hard drives are orders of magnitude 
slower than RAM, so the more often you write to the disk, the worse off 
you are.


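import random

def random_digits(n):
    "Return n random digits with one call to random."
    # "%0*d" zero-pads to width n, so leading zeros are kept.
    return "%0*d" % (n, random.randrange(10**n))

# 1000 writes of 100,000 ten-digit strings each: 10**9 digits in total.
# The list is named chunk because buffer is a builtin in Python 2.
with open('rand_digits.txt', 'w') as f:
    for i in xrange(1000):
        chunk = [random_digits(10) for j in xrange(100000)]
        f.write(''.join(chunk))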

On my not-even-close-to-high-end PC, this generates one billion digits 
in 22 minutes:

[steve at sylar python]$ time python randdigits.py

real    22m31.205s
user    20m18.546s
sys     0m7.675s
[steve at sylar python]$ ls -l rand_digits.txt
-rw-rw-r-- 1 steve steve 1000000000 2010-07-18 11:00 rand_digits.txt
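
The zero-padding in random_digits is what keeps every chunk exactly n 
characters long: without it, a value such as 42 would contribute two 
digits instead of ten, and digits near the front would be biased away 
from 0. A small sketch of that behaviour, written so that it also runs 
under Python 3:

```python
import random

def random_digits(n):
    """Return n random digits as a string, zero-padded on the left."""
    # The '*' in the format takes the field width from the argument tuple.
    return "%0*d" % (n, random.randrange(10**n))

# A small value keeps its leading zeros when padded to width 10.
padded = "%0*d" % (10, 42)
print(padded)

# Every call returns exactly n characters, all of them digits.
sample = random_digits(10)
print(len(sample), sample.isdigit())
```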


Having generated the digits, it might be useful to look for deviations 
from randomness. There should be approximately equal numbers of each 
digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph 
(10,000,000 each of 00, 01, 02, ..., 98, 99), of each trigraph 
(1,000,000 each of 000, ..., 999) and so forth.
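
One way such counts might be collected is sketched below. It is only an 
illustration: the function name count_ngrams and the chunk_size 
parameter are my own inventions, it is written for Python 3, and it 
counts overlapping sequences (so a digraph can start at any position), 
which is the reading under which the expected counts above hold. The 
file is read in pieces so that the billion digits never have to sit in 
memory at once.

```python
from collections import Counter

def count_ngrams(path, n, chunk_size=1 << 20):
    """Count overlapping n-digit sequences in the file at path."""
    counts = Counter()
    tail = ""  # last n-1 characters of the data seen so far
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Prepend the tail so sequences spanning two chunks are counted.
            data = tail + chunk
            counts.update(data[i:i + n] for i in range(len(data) - n + 1))
            tail = data[-(n - 1):] if n > 1 else ""
    return counts
```

Calling count_ngrams('rand_digits.txt', 1) would give the single-digit 
counts, and passing 2 or 3 the digraph and trigraph counts. Expect it 
to be slow on the full file, since it is pure Python.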

The interesting question is, if you measure a deviation from the 
equality (and you will), is it statistically significant? If so, is it 
because of a problem with the random number generator, or with my 
algorithm for generating the sample digits?
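
A common first check for significance is Pearson's chi-square test 
against a uniform expectation. The sketch below is one possible 
starting point, not a complete analysis: the observed counts are 
made-up numbers for a 1000-digit sample, the function name chi_square 
is hypothetical, and the code assumes Python 3. For single digits there 
are 9 degrees of freedom, and the 5% critical value is about 16.92. 
Overlapping digraphs and trigraphs are not independent of one another, 
so this simple form of the test only applies directly to single digits.

```python
def chi_square(counts):
    """Pearson's chi-square statistic against equal expected counts."""
    expected = sum(counts) / float(len(counts))
    return sum((c - expected) ** 2 / expected for c in counts)

# Made-up counts of the digits 0..9 in a 1000-digit sample.
observed = [95, 108, 102, 99, 97, 104, 101, 93, 106, 95]
stat = chi_square(observed)

# 9 degrees of freedom: reject uniformity at the 5% level above ~16.92.
CRITICAL_5_PERCENT = 16.92
print(stat, stat > CRITICAL_5_PERCENT)
```

For these invented counts the statistic comes out at about 2.3, well 
under the critical value, so they would give no reason to doubt the 
generator.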



-- 
Steven D'Aprano

