My own accounting python euler problem

Tue Nov 10 08:59:56 EST 2009

On Sun, 08 Nov 2009 12:31:26 -0500, geremy condra wrote:

> What you're describing is the powerset operation. Here's the example
> from the python docs:
[...]
> What I find interesting is that running it through timeit, it is much
> slower than the code suggested by Dan Bishop.

Your test doesn't show what you think it shows. You shouldn't just 
blindly apply timeit without testing to see that the functions return 
what you think they return. Your test is, I'm afraid, totally bogus and 
the conclusion you draw is completely wrong.

[...]
> #timeit.timeit("subsets1(x)", setup) doesn't appear to terminate
> timeit.timeit("subsets2(x)", setup)
> timeit.timeit("subsets3(x)", setup)

For the sake of speed, I've used a smaller x. Here are the results of 
calling the three functions:

>>> x = range(3)
>>> subsets1(x)
[[2], [2, 0], [2, 1], [2, 1, 0], [], [0], [1], [1, 0]]
>>> subsets2(x)
<generator object subsets2 at 0xb7c6311c>
>>> subsets3(x)
<itertools.chain object at 0xb7c608ac>

The reason subsets1() "doesn't appear to terminate" is that you are 
trying to list all the subsets of x = range(100). That's a LOT of 
subsets: 2**100 to be precise, or approximately 1.2e30.

subsets2() and subsets3() return a generator function and an iterator 
object respectively. There may be some overhead difference between those, 
but that's trivial compared to the cost of generating the subsets 
themselves.

A better way of doing the timing is as follows:

from itertools import chain, combinations
from timeit import Timer

# use a number small enough to calculate in a reasonable time
x = list(range(10))

def subsets1(L):
       S = []
       if (len(L) == 1):
               return [L, []]
       else:
               for s in subsets1(L[1:]):
                       S.append(s)
                       S.append(s + [ L[0]])
       return S

def subsets2(L):
   if L:
       for s in subsets2(L[1:]):
           yield s
           yield s + [L[0]]
   else:
       yield []

def subsets3(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in 
      range(len(s)+1))

setup = "from __main__ import subsets1, subsets2, subsets3, x"

# Check the three functions return the same results:
x1 = sorted(subsets1(x))
x2 = sorted(subsets2(x))
x3 = sorted(list(t) for t in subsets1(x))
assert x1 == x2 == x3

# Set up some timers.
t1 = Timer("subsets1(x)", setup)
t2 = Timer("list(subsets2(x))", setup)
t3 = Timer("list(subsets3(x))", setup)

# And run them!
for t in (t1, t2, t3):
    print min(t.repeat(number=1000, repeat=5))

The results I get are:

1.19647693634
0.901714801788
0.175387859344

Which as you can see, shows that the recipe in the docs is nearly ten 
times faster than Dan's version. That's not surprising -- the docs 
version uses highly optimized C code from itertools, while Dan's version 
uses slow-ish Python code and recursion.

To show this is no fluke, I increased the size of x and ran the tests 
again:

>>> x = list(range(15))  # 32000+ subsets
>>> x1 = sorted(subsets1(x))
>>> x2 = sorted(subsets2(x))
>>> x3 = sorted(list(t) for t in subsets1(x))
>>> assert x1 == x2 == x3
>>> 
>>> t1 = Timer("subsets1(x)", setup)
>>> t2 = Timer("list(subsets2(x))", setup)
>>> t3 = Timer("list(subsets3(x))", setup)
>>> for t in (t1, t2, t3):
...     print min(t.repeat(number=1000, repeat=5))
...
45.283468008
33.9274909496
7.40781188011

-- 
Steven