An efficient, pythonic way to calculate result sets

happyhondje at gmail.com happyhondje at gmail.com
Fri Oct 26 09:55:04 EDT 2007


On Oct 26, 2:33 am, Raymond Hettinger <pyt... at rcn.com> wrote:
> On Oct 25, 8:31 am, happyhon... at gmail.com wrote:
>
>
>
> > I've got a little issue, both programming and performance-wise. I have
> > a set, containing objects that refer to other sets. For example, in a
> > simple notation: (<a, b, c>, <d, e>) (or in a more object-like
> > display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
> > objects in the outer set, and the amount of items in the choices sets
> > could also reach a high number. The objects in the choices sets might
> > overlap.
>
> > Now I want to have all possible combinations, like this: (a, d), (b,
> > d), (c, d), (a, e), (b, e), (c, e).
>
> > However, I've got a few catches that make an already (icky) recursive
> > function worse to use.
>
> > First of all, I want to be able to only choose things so that the
> > outer 'result sets' have the same length. For example, if you'd have
> > (<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
> > basic behaviour of sets reducing it to (a) and thus having an improper
> > length. I could add yet another loop after calculating everything to
> > throw out any result sets with the improper length, but that would be
> > highly inefficient.
>
> > Second, I'd hope to be able to say that objX should be assumed to have
> > made the choice z. In the first example I mentioned, if I said that
> > 'obj1 == a', the only result sets that would come out would be (a, d)
> > and (a, e).
>
> def cross_nodups(*args):
>     'Cross product after eliminating repeat elements, keeping constant
> size'
>     ans = [[]]
>     for arg in args:
>         ans = [x+[y] for x in ans for y in arg if y not in x]
>     return ans
>
> def choose_first(obj1, *args):
>     'Assume a choice of a first object'
>     return cross_nodups(obj1, *args[1:])
>
> >>> print cross_nodups('ab', 'acd', 'fg')
>
> [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g'],
> ['b', 'a', 'f'], ['b', 'a', 'g'], ['b', 'c', 'f'], ['b', 'c', 'g'],
> ['b', 'd', 'f'], ['b', 'd', 'g']]
>
> >>> print choose_first('a', s1,s2,s3)
>
> [['a', 'c', 'f'], ['a', 'c', 'g'], ['a', 'd', 'f'], ['a', 'd', 'g']]
>
> > (and hopefully not rely on
> > recursion
>
> Easy exercise of transforming recursion to iteration left to the
> reader.
>
> Raymond

Wonderful, thank you Raymond!

I only had one problem with it still (damn I'm picky) and despite
prodding around in that generator expression which absolutely defies
my logic, I've been able to get it working.

In my situation, the choices are always sets rather than strings.
While it shouldn't make a difference (both are iterable), my sets
contain strings.. which end up being cut in pieces and matched
seperately with everything. I've solved this as follows:

def cross_nodups(*args):
    'Cross product after eliminating repeat elements, keeping constant
size'
    ans = [[]]
    for arg in args:
        ans = [x+[y] for x in ans for y in arg if y not in x]
    return ans

def choose_first(obj1, *args):
    'Assume a choice of a first object'
    return cross_nodups(frozenset((obj1,)), *args[1:])

Gives results:

>>> choose_first("H", set(["H", "R"]), set(["H", "R", "K", "S"]), set(["H", "R", "K", "S"]))
set([frozenset(['H', 'S', 'R']), frozenset(['H', 'K', 'R']),
frozenset(['H', 'K', 'S'])])
>>> choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"]))
set([frozenset(['KA', 'HA', 'RA']), frozenset(['SA', 'HA', 'RA']),
frozenset(['KA', 'HA', 'SA'])])

rather than

>>> choose_first("HA", set(["HA", "RA"]), set(["HA", "RA", "KA", "SA"]), set(["HA", "RA", "KA", "SA"]))
set([frozenset(['A', 'SA', 'HA']), frozenset(['H', 'KA', 'SA']),
frozenset(['H', 'KA', 'RA']), frozenset(['A', 'KA', 'HA']),
frozenset(['H', 'HA', 'RA']), frozenset(['H', 'SA', 'HA']),
frozenset(['A', 'SA', 'RA']), frozenset(['A', 'KA', 'SA']),
frozenset(['A', 'HA', 'RA']), frozenset(['H', 'KA', 'HA']),
frozenset(['H', 'SA', 'RA']), frozenset(['A', 'KA', 'RA'])])

Now, this works great! Although I am still torn apart on a possible
optimization (that I'm not sure is an optimization yet):

- should I use the original cross_nodups() function you thought up and
convert all of its contents to a proper set of sets (a secondary loop
afterwards). This seems more efficient since I would not be doing a
lot of frozenset() and list() conversions while looping.
- however, the list() and frozenset() construct should reduce looping
quite a bit and essentially prevent a lot of iterations that would be
created due to the doubles in the original functions output.

Of course that depends on the contents, so for simplicities sake, say
that the algorythm is to run with 5-10 choices to make, each choice
being out of averaged 6 items that may or may not conflict with other
items.

Again, thank you very much Raymond, your snippet makes my monstrosity
quite ready for /dev/null. :)




More information about the Python-list mailing list