Unsorting(randomizing) a sequence

Tue Aug 17 15:39:48 EDT 1999

Ole-Kristian Villaboe <olevi at my-deja.com> wrote:
 > 
 >     Ole> I've been writing an algorithm myself, but it's incredibly slow and
 >     Ole> I would like something quicker (of course :) )
 > 

To which Stephan Houben <stephan at pcrm.win.tue.nl> replied:

> 
> def randomize(l):
>     length = len(l)
>     for i in range(length):
>         j = rand.randrange(i, length)
>         l[i], l[j] = l[j], l[i]
> 
> Note that this randomizes the list in-place.

and Skip Montanaro shared:
 > Got this off the list a couple years ago.  Does the trick for me:
 > 
 >     def shuffle(lst):
 > 	"randomize a list"
 > 	for i in range(len(lst)-1,0,-1):
 > 	    choice = int(whrandom.random() * i)
 > 	    lst[choice], lst[i] = lst[i], lst[choice]
 > 
 >     >>> l = range(20)
 >     >>> shuffle(l)
 >     >>> l
 >     [4, 17, 12, 6, 7, 15, 1, 11, 16, 10, 5, 3, 9, 19, 0, 8, 18, 13, 14, 2]
 > 

and Jeremy Hylton chimed in with:

> I've rather fond of the following approach:
>
> import random
> 
> def randomize(a, b):
>    return random.choice([-1, 0, 1])
>
> l = range(2, 20)
> l.sort(randomize)

If you're playing poker with Skip, Jeremy, Stefan, you better let
Stefan shuffle the cards!!  The other two methods posted above are
seriously flawed.

By the way, Skip, I know you didn't write the "shuffle" method, just
passed it along, but in the following I'll call it "Skip's Method"
since I don't know who the original author is.  No criticism implied.

If I may be excused an excursion into group theory, the set of
permutations of a set of N elements forms a group, called the
symmetric group, Sym(N).  The symmetric group has N! elements, and
Ole's question can be interpreted as "How do I pick randomly
distributed elements of Sym(N)".

There is a function defined on Sym(N) called "sign" which is the sign of 
a permutation - any permutation can be undone by a series of "flips"
(e.g. l[i],l[j]=l[j],l[i] where i!=j) and for any permutation, the
number of "flips" required is either odd or even (the number itself is 
not determined by the permutation, because you could always add 2 more 
flips that just undo each other, but the *sign* of the number is).

For a permutation p, define sign(p) as (-1)**number of flips to undo
the permutation.  (This is equal to the determinant of the matrix
representing p, a matrix which only contains 1's and 0's).

Now, the group Sym(N) has an even number of elements and exactly half
have sign 1 (even permutations) and half have sign -1 (odd permutations).

Thus, 

Integral over (p in Sym(N)) of sign(p) == 0

Furthermore, if P acts on the vector [0,1,...,N-1] then we have

Integral over (p in Sym(N)) of p[0] = (N-1)/2 
	(the mean of 0,1,...N-1, since we have taken all permutations)

If we have a truly random way of choosing elements of Sym(N) we can do a
Monte Carlo integral, which should approximate the acutal integral,
which is just a finite sum in this case.

We can use this to test the posted solutions:

#!/usr/bin/env python

import math

#Stefan's method
import random; rand=random
def randomize(l):
    length = len(l)
    for i in range(length):
        j = rand.randrange(i, length)
        l[i], l[j] = l[j], l[i]

#Skip's method
import whrandom
def shuffle(lst):
    for i in range(len(lst)-1,0,-1):
        choice = int(whrandom.random() * i)
        lst[choice], lst[i] = lst[i], lst[choice]

#Jeremy's method        
def jeremy(a, b):
    return random.choice([-1, 0, 1])

def hylton(seq):
    seq.sort(jeremy)

#calculate the sign of the permuted sequence seq
# seq is assumed to be a permutation of the list 0..N-1
def sign(seq):
    sgn=1
    seq=seq[:] #copy the sequence
    for i in range(len(seq)):
        if seq[i]!=i:
            j=seq.index(i)
            sgn = -1*sgn
            seq[i],seq[j]=seq[j],seq[i]
    return sgn

# Take the sequence 0..seqlen-1 and take
# ntries permutations chosen according to method,
# keep track of the means and standard deviations
# of the permuted elements  and also the mean 
# of the sign of the permutation
def means(seqlen, ntries, method):
    mean_sign = 0
    means = [0]*seqlen
    mean_square = [0]*seqlen
    for x in range(ntries):
        seq=range(seqlen) #sorted sequence
        method(seq)
        mean_sign = mean_sign + sign(seq)
        for i in range(seqlen):
            j=seq[i]
            means[i]=means[i]+j
            mean_square[i]=mean_square[i]+j*j
    ntries=float(ntries)
    for i in range(seqlen):
        means[i]=means[i]/ntries
        mean_square[i]=math.sqrt(mean_square[i]/ntries - means[i]*means[i])
    mean_sign=mean_sign/ntries
    return means, mean_square, mean_sign

for method in [randomize, shuffle, hylton]:
    print "Testing", method
    for seqlen in (2,3,4,5,6,7,8):
        mean, rms, mean_sign = means(seqlen, 50000, method)
	print "len=",seqlen
        print " mean values=", mean
        print " rms=", rms
        print " mean sign=",mean_sign

And here's the output:

Testing <function randomize at 811c208>
len= 2
 mean values= [0.50238, 0.49762]
 rms= [0.499994335568, 0.499994335568]
 mean sign= -0.00476
len= 3
 mean values= [1.00074, 0.99718, 1.00208]
 rms= [0.816908472474, 0.81641413976, 0.816159098216]
 mean sign= 0.0064
len= 4
 mean values= [1.4945, 1.49814, 1.49742, 1.50994]
 rms= [1.11800257155, 1.11937328019, 1.11683183318, 1.11786456979]
 mean sign= 8e-05
len= 5
 mean values= [1.98744, 2.00134, 2.0009, 2.00898, 2.00134]
 rms= [1.41024900156, 1.41502586704, 1.4200208414, 1.40725952106, 1.41838577418]
 mean sign= -0.00088
len= 6
 mean values= [2.49602, 2.51312, 2.49238, 2.50368, 2.48962, 2.50518]
 rms= [1.71489479549, 1.70544652968, 1.7037787226, 1.7034043729, 1.7096819165, 1.70960029469]
 mean sign= 0.0006
len= 7
 mean values= [2.9922, 2.99634, 2.99654, 3.00452, 3.01642, 2.99032, 3.00366]
 rms= [1.99693243752, 1.9949753393, 1.99988200362, 2.00073475743, 2.00253598809, 2.00430194771, 2.0005015882]
 mean sign= 0.00636
len= 8
 mean values= [3.4883, 3.50384, 3.5003, 3.50828, 3.49524, 3.50056, 3.49318, 3.5103]
 rms= [2.29326472741, 2.29340472974, 2.28332212138, 2.29005053254, 2.29190255953, 2.2880121692, 2.29673539782, 2.29349818182]
 mean sign= -0.00124

Testing <function shuffle at 811c228>
len= 2
 mean values= [1.0, 0.0]
 rms= [0.0, 0.0]
 mean sign= -1.0
len= 3
 mean values= [1.49934, 1.00132, 0.49934]
 rms= [0.4999995644, 0.9999991288, 0.4999995644]
 mean sign= 1.0
len= 4
 mean values= [1.99896, 1.65842, 1.34248, 1.00014]
 rms= [0.817825726179, 1.24588245979, 1.2511384614, 0.81538946547]
 mean sign= -1.0
len= 5
 mean values= [2.50058, 2.24862, 1.99542, 1.75502, 1.50036]
 rms= [1.11721066214, 1.48050264964, 1.58308528627, 1.47751304549, 1.11744345289]
 mean sign= 1.0
len= 6
 mean values= [3.00468, 2.80428, 2.61678, 2.37518, 2.19412, 2.00496]
 rms= [1.41707377987, 1.71985280812, 1.85314932793, 1.8534562222, 1.71629759238, 1.41553360907]
 mean sign= -1.0
len= 7
 mean values= [3.50838, 3.3154, 3.17066, 3.01972, 2.81974, 2.6719, 2.4942]
 rms= [1.70175491056, 1.97085840181, 2.11436400944, 2.15678722214, 2.11953917926, 1.97356793397, 1.71074438769]
 mean sign= 1.0
len= 8
 mean values= [4.00358, 3.85536, 3.711, 3.55926, 3.42784, 3.29538, 3.13716, 3.01042]
 rms= [1.99722987751, 2.23899961376, 2.36665988262, 2.43745117949, 2.44034279035, 2.37833779258, 2.23953279378, 1.99552284467]
 mean sign= -1.0

Testing <function hylton at 811c268>
len= 2
 mean values= [0.1123, 0.8877]
 rms= [0.315735189676, 0.315735189676]
 mean sign= 0.7754
len= 3
 mean values= [0.22454, 0.99842, 1.77704]
 rms= [0.546773982922, 0.498013557647, 0.497884362478]
 mean sign= 0.62736
len= 4
 mean values= [0.42664, 1.2663, 1.8238, 2.48326]
 rms= [0.901874886223, 0.890788588836, 0.753467690084, 0.739621370973]
 mean sign= 0.40748
len= 5
 mean values= [0.53772, 1.39702, 2.1733, 2.62146, 3.2705]
 rms= [1.08333614432, 1.08555751557, 1.0958955744, 0.991931181282, 0.964764090335]
 mean sign= 0.28956
len= 6
 mean values= [0.66926, 1.58918, 2.65972, 2.66882, 3.29658, 4.11644]
 rms= [1.31366321879, 1.37545880622, 1.51967414981, 1.18339334441, 1.18078800112, 1.13828016165]
 mean sign= 0.15808
len= 7
 mean values= [0.85156, 1.8575, 2.74986, 3.1399, 3.303, 4.0936, 5.00458]
 rms= [1.61084001887, 1.74200853901, 1.70962275968, 1.61907627677, 1.38761341879, 1.38874009087, 1.28955768525]
 mean sign= 0.1106
len= 8
 mean values= [1.064, 2.17124, 2.95968, 3.61052, 3.5423, 4.21624, 4.80906, 5.62696]
 rms= [1.93729295668, 2.13704395425, 2.00799758406, 2.03430217264, 1.72304692623, 1.7540240199, 1.61719569515, 1.44652727537]
 mean sign= 0.07044

The "randomize" function has good behavior - the sign of the
permutations generated averages out to 0, and the means and standard
deviations are close to the values we'd expect if the permutations
mixed up the indexes thoroughly and without bias.  I think this code
is good.

The "shuffle" function generates only odd or even permutations,
depending on the sign of n.  This is because at each step of the
iteration, two elements are *always* exchanged.  The means are
monotonically increasing (the permutation tends to produce increasing
sequences) oand the RMS variation is not equidistributed.  It peaks
(apparently) near the middle.

The "hylton" method favors even permutations over odd ones, tends to
produce increasing sequences, and distributes the variance unequally,
apparently peaking near the beginning (1/e of the way through?).  I
haven't quite fully analyzed the reasons for this yet...

Use-Stefan's-method-ly yr's,

		       Charles