Generating a large random string

Peter Otten __peter__ at web.de
Fri Feb 20 06:00:30 EST 2004


Paul Rubin wrote:

> Oops, per other post, it gives strings of bytes and needs filtering.
> The following runs in about 1.2 seconds on my machine, but has a
> small (infinitesimal) chance of failure:
> 
> import string,array,time
> t=time.time()
> ttab = string.letters*4 + '\0'*48
> a = array.array('B', open("/dev/urandom").read(1500000).translate(ttab))
> a = array.array('B', filter(abs,a)).tostring()[:1000000]
> print time.time()-t
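Paul's "infinitesimal" can be made concrete. Under his table a byte survives the filter with probability 208/256 = 0.8125, assuming string.letters holds the 52 ASCII letters as in the C locale. The number of survivors out of 1,500,000 bytes is therefore binomial. A normal approximation, sketched below in Python 3 with illustrative variable names, puts the mean at 1,218,750 and the standard deviation at about 478. Coming up short of 1,000,000 letters would take a deficit of more than 450 standard deviations.

```python
import math

n_bytes = 1500000      # bytes read from /dev/urandom in the quoted code
p_keep = 208 / 256     # 52 letters * 4 = 208 of 256 byte values survive (C locale assumed)
needed = 1000000       # letters the quoted code slices off at the end

mean = n_bytes * p_keep                             # expected number of surviving bytes
std = math.sqrt(n_bytes * p_keep * (1 - p_keep))    # binomial standard deviation
z = (mean - needed) / std                           # standard deviations between mean and failure

print(mean, round(std, 1), round(z))
```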

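from __future__ import division
import random, string

# Translation table: the first 256-rest byte values map onto string.letters
# (each letter ld times, i.e. 4 times for the 52 letters of the C locale).
# The rest filler entries are never used: dtab lists exactly those trailing
# byte values, and translate() deletes them before mapping.
identity = string.maketrans("", "")
ld = 256//len(string.letters)
rest = 256 % len(string.letters)
ttab = string.letters*ld + '\0'*rest
dtab = identity[256-rest:]   # not identity[-rest:], which is all 256 bytes if rest == 0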

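# a fully functional variant of your approach: translate() maps and deletes
# in one pass, and the loop tops up the result if too few bytes survive
# (extra=1.25 slightly exceeds the expected need of 256/208, about 1.23)
def randstrUnix(length, extra=1.25):
    a = open("/dev/urandom", "rb").read(int(length*extra)).translate(ttab, dtab)
    while len(a) < length:
        a += randstrUnix(length-len(a), 1.3)
    return a[:length]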

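# all two-letter combinations, so one random index yields two characters
twoletters = [c+d for c in string.letters for d in string.letters]

# the fastest pure-python version I was able to produce
def randstrPure(length):
    r = random.random          # local alias avoids attribute lookups in the loop
    n = len(twoletters)
    l2 = length//2
    lst = [None] * l2          # preallocate instead of appending
    for i in xrange(l2):
        lst[i] = twoletters[int(r() * n)]
    if length & 1:             # odd length: add one single letter
        lst.append(random.choice(string.letters))
    return "".join(lst)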
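string.letters, string.maketrans and xrange no longer exist in Python 3, so the functions above will not run there as written. The two techniques might be adapted roughly as follows. This is a sketch for Python 3.6+, not the code timed in the measurements that follow. It assumes os.urandom as a portable stand-in for /dev/urandom and string.ascii_letters as a locale-independent stand-in for string.letters. The names randstr_urandom and randstr_pure are hypothetical.

```python
import os
import random
import string

LETTERS = string.ascii_letters.encode("ascii")   # 52 letters, independent of locale
REPEAT, REST = divmod(256, len(LETTERS))         # 4 and 48
# Byte values 0..207 map onto the letters, each letter exactly REPEAT times.
# The REST filler entries are never used because those bytes are deleted.
TABLE = LETTERS * REPEAT + b"\0" * REST
DELETE = bytes(range(256 - REST, 256))

def randstr_urandom(length, extra=1.25):
    """Letters from OS random bytes: delete bytes 208..255, map the rest."""
    out = b""
    while len(out) < length:
        need = length - len(out)
        chunk = os.urandom(int(need * extra) + 1)   # +1 guarantees progress
        out += chunk.translate(TABLE, delete=DELETE)
    return out[:length].decode("ascii")

# Every two-letter combination, so one random index yields two characters.
TWOLETTERS = [c + d for c in string.ascii_letters for d in string.ascii_letters]

def randstr_pure(length):
    """Pure-Python version using the two-letter lookup trick."""
    r = random.random
    n = len(TWOLETTERS)
    pairs = [TWOLETTERS[int(r() * n)] for _ in range(length // 2)]
    if length & 1:   # odd length: one extra single letter
        pairs.append(random.choice(string.ascii_letters))
    return "".join(pairs)
```

Both functions return str. randstr_urandom(1000000) usually needs a single os.urandom call, because 1.25 * 208/256 is a little more than 1.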

The timings:

$ timeit.py -s"import randchoice as r" "r.randstrUnix(1000000)"
10 loops, best of 3: 2.29e+05 usec per loop
$ timeit.py -s"import randchoice as r" "r.randstrPure(1000000)"
10 loops, best of 3: 6.51e+05 usec per loop

A factor of about 3 (0.23 s versus 0.65 s per million characters) would hardly justify the OS dependency in most cases.
Note that using twoletters[int(r() * n)] as seen in Sean Ross' version
instead of random.choice(twoletters) doubled the speed.
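The gap between random.choice(twoletters) and explicit indexing can be re-checked with a small timeit harness. This is a Python 3 sketch, and the function names with_choice and with_index are hypothetical. The ratio on a current interpreter may differ from the one reported here, because Python 3's random.choice picks its index with the internal _randbelow() rather than int(random() * n).

```python
import random
import string
import timeit

TWOLETTERS = [c + d for c in string.ascii_letters for d in string.ascii_letters]

def with_choice(pairs):
    """Build 2*pairs letters using random.choice."""
    choice = random.choice
    return "".join([choice(TWOLETTERS) for _ in range(pairs)])

def with_index(pairs):
    """Build 2*pairs letters using int(random() * n) as the index."""
    r = random.random
    n = len(TWOLETTERS)
    return "".join([TWOLETTERS[int(r() * n)] for _ in range(pairs)])

if __name__ == "__main__":
    for fn in (with_choice, with_index):
        # best of 3 single runs, 500000 pairs = 1,000,000 characters
        best = min(timeit.repeat(lambda: fn(500000), number=1, repeat=3))
        print(f"{fn.__name__}: {best:.3f} s for 1,000,000 characters")
```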

Peter