Pulling all n-sized combinations from a list

mensanator at aol.com mensanator at aol.com
Thu Feb 9 18:25:18 EST 2006


Jack Diederich wrote:
> On Thu, Feb 09, 2006 at 10:23:12AM -0800, mensanator at aol.com wrote:
> > Jack Diederich wrote:
> > > On Wed, Feb 08, 2006 at 12:50:19PM -0800, Swroteb wrote:
> > > > Hi there,
> > > >
> > > > I've got a reasonably sized list of objects that I'd like to pull out
> > > > all combinations of five elements from.  Right now I have a way to do
> > > > this that's quite slow, but manageable.  I know there must be a better
> > > > way to do this, but I'm not sure what it is.  Here's what I've got so
> > > > far:
> > > >
> > >
> > > import probstat # http://probstat.sourceforge.net
> > > for (a, b, c) in probstat.Combination(range(9), 3): # 0..9, pick 3
> > >   print a, b, c
> > >
> > > It is a C extension that does permutations & combinations and is
> > > about 10x faster than doing it in pure python [I'm the author].
> > > It is also the 3rd result for "python combination" and 5th for
> > > "python permutaiton" but every month someone posts to c.l.py asking
> > > for something similar.  Go figure.
> >
> > Did you ever figure that some people use Windows?
>
> I don't have a windows dev box so I can't vouch for this binary but
> a user sent me a windows .pyd yesterday.
>
> http://jackdied.com/static/probstat.pyd
>
> -jackdied


Hey, thanks!

I have a pure Python permutation generator that generates
a Cartesian product of a string with itself with filters
to emulate

permutation w/  replacemment
combination w/  replacemment
permutation w/o replacemment
combination w/o replacemment

so I put together a little test program to see how probstat
compares. Correspondence to my emulations is (as far as I can tell)

permutation w/  replacemment: equivalent to probstat.Cartesian
combination w/  replacemment: no equivalent
permutation w/o replacemment: equivalent to probstat.Permutation
combination w/o replacemment: equivalent to probstat.Combination

Here's the test program:

-----------------------------------------------------

import probstat
import time

def cxxx(m):
    return '(' + ' and '.join(['(i%s!=i%s)' % (m,i) for i in range(m)])
+ ')'

def ooloop5(a, n, perm=True, repl=True):
        if (not repl) and (n>len(a)): return
        p = []
	loop = '\n'.join([(' ' * i) + 'for i%s in a:' % i for i in range(n)])
+ '\n'
	indt = ' ' * n
	sub1 = indt + 'if perm and repl:\n'
	sub2 = indt + 'if (not perm) and repl:\n'
	ccc2 = ' and '.join(['(i%s>=i%s)' % (i,i-1) for i in range(1,n)])
	con2 = indt + ' if ' + ccc2 + ':\n'
	sub3 = indt + 'if perm and (not repl):\n'
	cccx = ' and '.join([cxxx(m) for m in range(1,n)])
	con3 = indt + ' if ' + cccx + ':\n'
	sub4 = indt + 'if (not perm) and (not repl):\n'
	ccc4 = ' and '.join(['(i%s>i%s)' % (i,i-1) for i in range(1,n)])
	con4 = indt + ' if ' + ccc4 + ':\n'
	bod1 = indt + ' s = ' + '+'.join(['i%s' % i for i in range(n)]) + '\n'
	bod2 = indt + ' p.append(s)\n'
	bod3 = indt + '  s = ' + '+'.join(['i%s' % i for i in range(n)]) +
'\n'
	bod4 = indt + '  p.append(s)\n'
	e = loop + sub1 + bod1 + bod2 + sub2 + con2 + bod3 + bod4 + sub3 +
con3 + bod3 + bod4 + sub4 + con4 + bod3 + bod4
	exec e
	return p

def p_cart(a,n):
        A = list(a)
        c = probstat.Cartesian([A for i in range(n)])
        p = []
        for i in c:
            p.append(''.join(i))
        return p

def p_perm(a,n):
        A = list(a)
        t0 = time.time()
        if len(A)<n:
            c = probstat.Permutation(A,n)
        else:
            c = probstat.Permutation(A)
        print time.time()-t0
        p = []
        for i in c:
            p.append(''.join(i))
        return p

def p_comb(a,n):
        A = list(a)
        c = probstat.Combination(A,n)
        p = []
        for i in c:
            p.append(''.join(i))
        return p

print 'permutation w/  replacemment'
p = ooloop5("abc", 3, True, True)
print p
print

print 'combination w/  replacemment'
p = ooloop5("abc", 3, False, True)
print p
print

print 'permutation w/o replacemment'
p = ooloop5("abc", 3, True, False)
print p
print

print 'combination w/o replacemment'
p = ooloop5("abc", 3, False, False)
print p
print


print 'probstat.Cartesian'
p = p_cart('abc',3)
print p
print

print 'probstat.Permutation'
p = p_perm('abc',3)
print p
print

print 'probstat.Combination'
p = p_comb('abc',3)
print p
print

print 'permutation w/  replacemment "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = ooloop5("abcdefghijklmnopqrstuvwxyz", 4, True, True)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print 'combination w/  replacemment "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = ooloop5("abcdefghijklmnopqrstuvwxyz", 4, False, True)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print 'permutation w/o replacemment "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = ooloop5("abcdefghijklmnopqrstuvwxyz", 4, True, False)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print 'combination w/o replacemment "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = ooloop5("abcdefghijklmnopqrstuvwxyz", 4, False, False)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print

print 'probstat.Cartesian "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = p_cart('abcdefghijklmnopqrstuvwxyz',4)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print 'probstat.Permutation "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = p_perm('abcdefghijklmnopqrstuvwxyz',4)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

print 'probstat.Combination "abcdefghijklmnopqrstuvwxyz":4'
t0 = time.time()
p = p_comb('abcdefghijklmnopqrstuvwxyz',4)
t1 = time.time()
print len(p),'4-letter words',t1-t0,'seconds'

-----------------------------------------------------

The short tests worked out fine

permutation w/  replacemment
['aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca', 'acb', 'acc', 'baa',
'bab', 'bac', 'bba', 'bbb', 'bbc', 'bca', 'bcb', 'bcc', 'caa', 'cab',
'cac', 'cba', 'cbb', 'cbc', 'cca', 'ccb', 'ccc']

combination w/  replacemment
['aaa', 'aab', 'aac', 'abb', 'abc', 'acc', 'bbb', 'bbc', 'bcc', 'ccc']

permutation w/o replacemment
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

combination w/o replacemment
['abc']

probstat.Cartesian
['aaa', 'baa', 'caa', 'aba', 'bba', 'cba', 'aca', 'bca', 'cca', 'aab',
'bab', 'cab', 'abb', 'bbb', 'cbb', 'acb', 'bcb', 'ccb', 'aac', 'bac',
'cac', 'abc', 'bbc', 'cbc', 'acc', 'bcc', 'ccc']

probstat.Permutation
['abc', 'acb', 'bac', 'bca', 'cab', 'cba']

probstat.Combination
['abc']

-----------------------------------------------------

Unfortunately, the long test died (out of virtual memory) executing
the probstat.Permution test.

But here's the strange thing. When executed from the Idle prompt,
it works fine.

>>> import probstat
>>> A = list('abcdefghijklmnopqrstuvwxyz')
>>> c = probstat.Permutation(A,4)
>>> print len(c)
358800
>>> for i in c[:10]:print i

['a', 'b', 'c', 'd']
['a', 'b', 'd', 'c']
['a', 'c', 'b', 'd']
['a', 'c', 'd', 'b']
['a', 'd', 'b', 'c']
['a', 'd', 'c', 'b']
['b', 'a', 'c', 'd']
['b', 'a', 'd', 'c']
['b', 'c', 'a', 'd']
['b', 'c', 'd', 'a']
>>> p = []
>>> for i in c: p.append(''.join(i))
>>>
>>> for i in p[:10]:print i

abcd
abdc
acbd
acdb
adbc
adcb
bacd
badc
bcad
bcda

-----------------------------------------------------
And if I comment out the Permutation test, everything else
works ok.

permutation w/  replacemment "abcdefghijklmnopqrstuvwxyz":4
456976 4-letter words 1.23500013351 seconds
combination w/  replacemment "abcdefghijklmnopqrstuvwxyz":4
23751 4-letter words 0.68799996376 seconds
permutation w/o replacemment "abcdefghijklmnopqrstuvwxyz":4
358800 4-letter words 1.67199993134 seconds
combination w/o replacemment "abcdefghijklmnopqrstuvwxyz":4
14950 4-letter words 0.59299993515 seconds

probstat.Cartesian "abcdefghijklmnopqrstuvwxyz":4
456976 4-letter words 1.42200016975 seconds
probstat.Combination "abcdefghijklmnopqrstuvwxyz":4
14950 4-letter words 0.0780000686646 seconds

-----------------------------------------------------

But it fails if I call it from inside a function.

>>> import probstat
>>> import time
>>> def p_perm(a,n):
        A = list(a)
        if len(A)<n:
            c = probstat.Permutation(A,n)
        else:
            c = probstat.Permutation(A)
        p = []
        for i in c:
            p.append(''.join(i))
        return p

>>> p = p_perm('abcdefghijklmnopqrstuvwxyz',4)

Crash! Out of virtual memory.

-----------------------------------------------------

I noticed that the probstat call takes virtually no time, most of it
taken by the p.append loop.

>>> def p_perm(a,n):
        A = list(a)
        if len(A)<n:
            c = probstat.Permutation(A,n)
        else:
            c = probstat.Permutation(A)
        print 'probstat finished'
        p = []
        for i in c:
            p.append(''.join(i))
        return p

>>> p = p_perm('abcdefghijklmnopqrstuvwxyz',4)
probstat finished

Aha! Must be dying during the p.append loop.

-----------------------------------------------------

So we'll skip the append for now.

def p_perm(a,n):
        A = list(a)
        t0 = time.time()
        if len(A)<n:
            q = probstat.Permutation(A,n)
        else:
            q = probstat.Permutation(A)
        print time.time()-t0
        #p = []
        #for i in c:
        #    p.append(''.join(i))
        #return p
        print len(q)
        for i in q[:10]: print i
        return q



probstat.Permutation "abcdefghijklmnopqrstuvwxyz":4
0.0
-1853882368
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'z', 'y']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'y', 'x', 'z']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'y', 'z', 'x']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'z', 'x', 'y']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'z', 'y', 'x']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'w', 'y', 'z']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'w', 'z', 'y']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'y', 'w', 'z']
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'x', 'y', 'z', 'w']
-1853882368 4-letter words 0.25 seconds

HUH??? No wonder the append loop has convulsions!

and why are the lists length 26? s/b len 4.


And yet...

>>> q = probstat.Permutation(list('abcdefghijklmnopqrstuvwxyz'),4)
>>> print len(q)
358800
>>> for i in q[:10]:print i

['a', 'b', 'c', 'd']
['a', 'b', 'd', 'c']
['a', 'c', 'b', 'd']
['a', 'c', 'd', 'b']
['a', 'd', 'b', 'c']
['a', 'd', 'c', 'b']
['b', 'a', 'c', 'd']
['b', 'a', 'd', 'c']
['b', 'c', 'a', 'd']
['b', 'c', 'd', 'a']

-----------------------------------------------------

That's one of the strangest problems I've ever seen.

I don't know if this has anything to do with the pyd file,
but I figured you would like to know.




More information about the Python-list mailing list