Three dumb questions (ordered by dumbness descending)

Alex Martelli aleax at aleax.it
Tue Sep 24 08:39:32 EDT 2002


Thorsten Kampe wrote:

> Okay, here they are:
> 
> 1. Why is 'zip(zip(x)) != x' (should be the same because it's the
> transposed)

I think this has been answered at length (you need the * special form
for the arguments -- and x to be of the right kind -- and then the
equality will hold).


> 2. Peter Norvig mentions in "Python for Lisp Programmers" some
> "don'ts": "[x] + y" and "x[1:]". Are there more things to avoid
> (especially in a loop)?

Strictly a performance issue: these operations are O(N) in len(y)
and len(x) respectively, and Peter is writing for an audience used
to similar operations (cons and cdr respectively) being O(1).  There
is no special reason to avoid such constructs if they best express
what you really want to do.

The only Python performance pitfall that's REALLY easy to fall into:

    hugestring = ''
    for piece in alotoftinypieces:
        hugestring += piece

Now THIS is guaranteed to kill your program's performance.  Use:

    hugestring = ''.join(alotoftinypieces)

and enjoy roughly O(N) performance rather than roughly O(N squared).

More generally, I think that putting strings together with + or +=
IS something "to avoid (especially in a loop)" -- if you don't
already have a sequence of all the tiny pieces, build one up (e.g.
by appending the pieces to an originally empty list) and finally
''.join it.  Strings are immutable, so "hugestring += piece"
forces Python to free the space it originally held for hugestring
and allocate a new one.  Lists are mutable, and in an amortized
sense there's "always" a free slot at the end, so templist.append
is O(1) [amortized].


> 3. random.shuffle: the documentation says: "Note that for even rather
> small len(x), the total number of permutations of x is larger than the
> period of most random number generators; this implies that most
> permutations of a long sequence can never be generated." I translate
> this to "don't shuffle lists with more than 15 elements if you want a
> randomly ordered list".
> 
> Alex Martelli says the same: "For example, Python users who are
> calling random.shuffle on a sequence of substantial length have a need
> for an underlying random generator with a *HUGE* period, as a
> necessary although not sufficient condition towards ensuring that all
> permutations of the shuffled sequence get generated with equal
> likelihood.
> 
> Wichman-Hill has a period of less than 7,000 billions (7e12),
> insufficient to account even for the permutations of a sequence
> whose length is just 16 (16! > 2e13)."
> 
> Otherwise in the "Python Cookbook" (Selecting Random Elements from a
> List Without Repetition) the shuffle version is called "the winner".
> 
> So: can I use random.shuffle or do I have to write my own - slower -
> version to get a random list?

For most practical purposes, you can use random.shuffle.  If you
must guarantee that *all* permutations are equally likely, then
your first issue is ensuring enough randomness -- a generator with
a period of over 10**68 or so to permute 52 objects, for example --
and that's not at all a sufficient condition, either.  If you really
labor under such strict conditions, maybe it's better to use a
"truly random" (hardware) generator rather than a pseudo-random one.

Once you do have a random or pseudo-random generator that fully
satisfies you, Ian Bicking's snippet reimplementing shuffle as a
Decorate-Sort-Undecorate idiom is truly excellent -- a little jewel,
IMHO, despite the little typo in it.  It does, alas, end up a
little slower than Tim Peters' original -- but losing a speed
competition to Tim is no shame, and Ian's version only loses very
very little indeed.  To wit (having corrected Ian's typo):


import time, sys, operator
from random import random

def shuffle_org(x):
    for i in xrange(len(x)-1, 0, -1):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]

def shuffle_ian(x):
    newList = [(random(), i) for i in x]
    newList.sort()
    x[:] = [i[1] for i in newList]


biggie = range(200)

def timit(func, N=1000):
    repeat = [None] * N
    start = time.clock()
    for x in repeat: func(biggie)
    stend = time.clock()
    return "%2.2f %s" % (stend-start, func.__name__)

print sys.version
for i in range(2):
    for func in shuffle_org, shuffle_ian:
        print timit(func)


[alex at lancelot linud]$ python -O timo.py
2.2.1 (#2, Jul 15 2002, 17:32:26)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)]
1.59 shuffle_org
1.87 shuffle_ian
1.60 shuffle_org
1.86 shuffle_ian
[alex at lancelot linud]$



Alex




More information about the Python-list mailing list