Creating unique combinations from lists
Tim Chase
python.list at tim.thechases.com
Thu Jan 17 10:29:53 EST 2008
>> You can use a recursive generator:
>>
>> def iterall(*iterables):
>> if iterables:
>> for head in iterables[0]:
>> for remainder in iterall(*iterables[1:]):
>> yield [head] + remainder
>> else:
>> yield []
>>
>> for thing in iterall(
>> ['big', 'medium', 'small'],
>> ['old', 'new'],
>> ['blue', 'green'],
>> ):
>> print thing
>
> Recursion definitely makes for an elegant solution. However you do take
> a bit of a performance hit. If performance matters (and comprehensions
> are supposed to be optimized/fast) and you want a "works for N nested
> loops solution," then you could build a N deep comprehension on the fly
> and eval() it:
Yick...a nice demo of the power of eval, but definitely filed
under the "Hack" heading :)
I think Martin's solution/recommendation[1] is better
(readability-wise, and "less apt to shoot yourself in the foot
with some bogus generated code"-wise) if you don't mind building
the whole result set in memory which your eval() solution does as
well. I'm curious to see the timing results from adding that
recipe to your test-suite.
The advantage to the recursive-generator solution is that it
should really only keep your initial input and the current result
in memory as the generated item, so you can reasonably iterate
over millions of rows without having gigs of RAM. I don't
believe the recursion will go deeper than the number of lists
you're iterating over, so the stack shouldn't explode.
But as you show, this method comes at a considerable performance
hit. I don't know how much is due to recursion overhead, how
much is due to generator overhead, and how much is due to the
list-building overhead. Perhaps a wiser pythoneer than I will
see something obvious in my code that could be tweaked to reduce
the performance hit.
Ah well. Many solutions to the problem, each with advantages and
disadvantages:
Hard Code the loops (whether using "for" or list-comp):
Pro: easy to code for small sets of data
Pro: easy to understand
Pro: fast
Pro: minimal memory usage
Pro: readily creatable by the python newbie
Con: requires changing code if dimensionality changes
Con: doesn't handle an arbitrary number of lists
Use Martin's cookbook solution:
Pro: popularly documented in the cookbook
Pro: fairly easy to understand
Pro: handles arbitrary numbers of lists
Con: builds all results in-memory
Speed: pro or con?
Notes: generates in minor-order resolution[2]
Recursive-generator solution:
Pro: minimal memory usage
Pro: fairly easy to understand
Con: notably slower
Notes: generates in major-order resolution[2]
Pick whichever meets your needs.
-tkc
[1]
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/496807
[2] a term arbitrarily defined/made-up by me as a way to
distinguish whether the results are grouped from
left-to-right/first-to-last (major order) or from
right-to-left/last-to-first (minor order). I happen to like
major order, but that may stem from an undiagnosed neurosis ;)
More information about the Python-list
mailing list