performance of script to write very long lines of random chars

Wed Apr 10 21:45:31 EDT 2013

On Thu, Apr 11, 2013 at 11:21 AM, gry <georgeryoung at gmail.com> wrote:
> avail_chrs =
> '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&
> \'()*+,-./:;<=>?@[\\]^_`{}'

Is this exact set of characters a requirement? For instance, would it
be acceptable to instead use this set of characters?

avail_chrs = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

Your alphabet has 92 characters; this one has only 64. The advantage
is that a 64-character set is really easy to work with. In fact, this
specific set is the standard called Base 64, and Python already has a
module (base64) for working with it. All you need is a stream of
random bytes, which can be provided by os.urandom().

So here's a much simpler version of your program, following the
cut-down character set I offer:

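import os
import base64

nchars = 32000000  # characters wanted per line
rows = 10
# Base 64 turns every 3 random bytes into 4 output characters.
# Note: If nchars is one higher than a multiple of 4 (eg 5, 9, 101),
# the lines will be one character short (4, 8, 100).
nbytes = nchars * 3 // 4
for _ in range(rows):
    # Strip the '=' padding; decode so Python 3 prints text, not b'...'
    print(base64.b64encode(os.urandom(nbytes)).strip(b'=').decode('ascii'))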

If you can guarantee that your nchars will always be a multiple of 4,
you can drop the .strip() call.
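
To check the length arithmetic from the comment in the script, here's
a small sketch. The helper name random_b64_line is my own invention
for illustration; it just wraps the same os.urandom() and b64encode()
steps and returns text instead of printing it.

```python
import base64
import os


def random_b64_line(nchars):
    """Return a str of random Base 64 characters (hypothetical helper)."""
    nbytes = nchars * 3 // 4  # 3 random bytes become 4 output characters
    encoded = base64.b64encode(os.urandom(nbytes))
    # Drop the '=' padding, then convert bytes to text.
    return encoded.strip(b'=').decode('ascii')


# Print requested length next to the length actually produced.
for requested in (4, 5, 6, 7, 8, 9, 100, 101):
    print(requested, len(random_b64_line(requested)))
```

Only the requests that are one more than a multiple of 4 (5, 9, 101)
should come back one character short; the rest match exactly.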

This is going to be *immensely* faster than calling random.choice()
for every character, but it depends on a working os.urandom (it'll
raise NotImplementedError if there's no suitable source). I know it's
available on OS/2, Windows, and Linux, but don't have others handy to
test. If by "a bunch of different computers" you mean exclusively
Linux computers, this should be fine.
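
If you want to see the speed difference for yourself, a rough timing
sketch along these lines should do. The function names and the much
smaller NCHARS are my own choices so that it finishes quickly; the
numbers will vary by machine, so I'm not quoting any.

```python
import base64
import os
import random
import timeit

ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
NCHARS = 100000  # a multiple of 4, far smaller than 32000000


def line_by_choice():
    # One random.choice() call per output character.
    return ''.join(random.choice(ALPHABET) for _ in range(NCHARS))


def line_by_urandom():
    # One os.urandom() call per line; Base 64 expands 3 bytes to 4 chars.
    return base64.b64encode(os.urandom(NCHARS * 3 // 4)).decode('ascii')


# Total seconds for three lines built each way.
print('random.choice:', timeit.timeit(line_by_choice, number=3))
print('os.urandom   :', timeit.timeit(line_by_urandom, number=3))
```

Both functions build a line of the same length from the same 64
characters, so the comparison is like for like.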

ChrisA