performance of script to write very long lines of random chars

Chris Angelico rosuav at gmail.com
Wed Apr 10 21:45:31 EDT 2013


On Thu, Apr 11, 2013 at 11:21 AM, gry <georgeryoung at gmail.com> wrote:
> avail_chrs =
> '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&
> \'()*+,-./:;<=>?@[\\]^_`{}'

Is this exact set of characters a requirement? For instance, would it
be acceptable to instead use this set of characters?

avail_chrs = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

Your alphabet has 92 characters; this one has only 64. The advantage is
that a 64-character set is really easy to work with. In fact, this
specific set is the standard Base 64 alphabet, and Python already has a
module for working with it. All you need is a stream of random bytes,
which os.urandom() can provide.
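To make the Base 64 idea concrete, here is a minimal sketch (the name B64_ALPHABET is mine, purely for illustration). Every 3 random bytes become 4 characters drawn from the 64-character set above, so 30 bytes should come out as 40 characters with no padding:

```python
import base64
import os

# The 64-character set quoted above (standard Base 64, ignoring '=' padding).
# B64_ALPHABET is an illustrative name, not something from the base64 module.
B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

raw = os.urandom(30)                              # 30 random bytes
encoded = base64.b64encode(raw).decode('ascii')   # 30 * 4 / 3 = 40 characters

print(len(encoded))                        # length of the encoded text
print(set(encoded) <= set(B64_ALPHABET))   # every character is in the set
```

Since 30 is a multiple of 3, no '=' padding appears in this particular case.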

So here's a much simpler version of your program, using the
cut-down character set I suggested above:

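import os
import base64

nchars = 32000000
rows = 10
# Note: If nchars is one higher than a multiple of 4 (e.g. 5, 9, 101),
# the lines will be one character short (4, 8, 100).
nbytes = nchars * 3 // 4  # Base 64 encodes every 3 bytes as 4 characters
for _ in range(rows):
    # strip() removes any '=' padding; decode() turns the bytes into text
    # so that Python 3 does not print a b'...' wrapper around each line.
    print(base64.b64encode(os.urandom(nbytes)).strip(b'=').decode('ascii'))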


If you can guarantee that your nchars will always be a multiple of 4,
you can drop the .strip() call.
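If you want to see the length behaviour for yourself, here is a rough sketch. Both helper names (random_line and random_line_exact) are hypothetical, made up for this example. The first mirrors the calculation in the program above; the second is one possible way to get exactly nchars characters every time, by rounding the byte count up and slicing off the excess:

```python
import base64
import os

def random_line(nchars):
    """Same arithmetic as the program above: may be one character short."""
    nbytes = nchars * 3 // 4
    return base64.b64encode(os.urandom(nbytes)).strip(b'=').decode('ascii')

def random_line_exact(nchars):
    """Illustrative variant: round the byte count up, then cut to length."""
    nbytes = (nchars * 3 + 3) // 4            # ceiling of nchars * 3 / 4
    encoded = base64.b64encode(os.urandom(nbytes)).decode('ascii')
    return encoded[:nchars]                   # slicing also drops any '=' padding

for n in (4, 5, 6, 7, 8, 9, 100, 101):
    print(n, len(random_line(n)), len(random_line_exact(n)))
```

The slice in the second version costs one extra copy of the line, which may or may not matter at 32000000 characters per line.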

This is going to be *immensely* faster than calling random.choice()
for every character, but it depends on a working os.urandom (it'll
raise NotImplementedError if there's no suitable source). I know it's
available on OS/2, Windows, and Linux, but don't have others handy to
test. If by "a bunch of different computers" you mean exclusively
Linux computers, this should be fine.
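A rough way to check the speed claim on your own machines is a small timeit comparison. This is only a sketch: the function names are invented for the example, the random.choice() version is my guess at the per-character approach rather than your actual script, and the line length is far smaller than 32000000 so that it finishes quickly. I am not quoting any timings, since they will vary by machine and Python version.

```python
import base64
import os
import random
import timeit

AVAIL_CHRS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

def line_by_choice(nchars):
    # One random.choice() call per output character.
    return ''.join(random.choice(AVAIL_CHRS) for _ in range(nchars))

def line_by_urandom(nchars):
    # One os.urandom() call and one b64encode() call per line.
    nbytes = nchars * 3 // 4
    return base64.b64encode(os.urandom(nbytes)).strip(b'=').decode('ascii')

if __name__ == '__main__':
    n = 100000  # illustrative size, kept small so the comparison runs quickly
    for func in (line_by_choice, line_by_urandom):
        seconds = timeit.timeit(lambda: func(n), number=3)
        print(func.__name__, round(seconds, 3))
```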

ChrisA


