Creating unique combinations from lists

Tim Chase python.list at tim.thechases.com
Thu Jan 17 10:29:53 EST 2008


>> You can use a recursive generator:
>>
>>    def iterall(*iterables):
>>      if iterables:
>>        for head in iterables[0]:
>>          for remainder in iterall(*iterables[1:]):
>>            yield [head] + remainder
>>      else:
>>        yield []
>>
>>    for thing in iterall(
>>        ['big', 'medium', 'small'],
>>        ['old', 'new'],
>>        ['blue', 'green'],
>>        ):
>>      print thing
>  
> Recursion definitely makes for an elegant solution.  However you do take
> a bit of a performance hit.  If performance matters (and comprehensions
> are supposed to be optimized/fast) and you want a "works for N nested
> loops solution," then you could build a N deep comprehension on the fly
> and eval() it:

Yick...a nice demo of the power of eval, but definitely filed 
under the "Hack" heading :)

I think Martin's solution/recommendation[1] is better 
(readability-wise, and "less apt to shoot yourself in the foot 
with some bogus generated code"-wise) if you don't mind building 
the whole result set in memory which your eval() solution does as 
well.  I'm curious to see the timing results from adding that 
recipe to your test-suite.

The advantage to the recursive-generator solution is that it 
should really only keep your initial input and the current result 
in memory as the generated item, so you can reasonably iterate 
over millions of rows without having gigs of RAM.  I don't 
believe the recursion will go deeper than the number of lists 
you're iterating over, so the stack shouldn't explode.

But as you show, this method comes at a considerable performance 
hit.  I don't know how much is due to recursion overhead, how 
much is due to generator overhead, and how much is due to the 
list-building overhead.  Perhaps a wiser pythoneer than I will 
see something obvious in my code that could be tweaked to reduce 
the performance hit.

Ah well.  Many solutions to the problem, each with advantages and 
disadvantages:

Hard Code the loops (whether using "for" or list-comp):
  Pro:  easy to code for small sets of data
  Pro:  easy to understand
  Pro:  fast
  Pro:  minimal memory usage
  Pro:  readily creatable by the python newbie
  Con:  requires changing code if dimensionality changes
  Con:  doesn't handle an arbitrary number of lists

Use Martin's cookbook solution:
  Pro:  popularly documented in the cookbook
  Pro:  fairly easy to understand
  Pro:  handles arbitrary numbers of lists
  Con:  builds all results in-memory
  Speed: pro or con?
  Notes:  generates in minor-order resolution[2]

Recursive-generator solution:
  Pro:  minimal memory usage
  Pro:  fairly easy to understand
  Con:  notably slower
  Notes:  generates in major-order resolution[2]

Pick whichever meets your needs.

-tkc


[1]
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496807

[2] a term arbitrarily defined/made-up by me as a way to 
distinguish whether the results are grouped from 
left-to-right/first-to-last (major order) or from 
right-to-left/last-to-first (minor order).  I happen to like 
major order, but that may stem from an undiagnosed neurosis ;)





More information about the Python-list mailing list