An efficient, pythonic way to calculate result sets

happyhondje at gmail.com happyhondje at gmail.com
Thu Oct 25 18:33:40 EDT 2007


On Oct 25, 8:44 pm, "mensana... at aol.com" <mensana... at aol.com> wrote:
> On Oct 25, 10:31 am, happyhon... at gmail.com wrote:
>
>
>
> > Hello everyone,
>
> > I've got a little issue, both programming and performance-wise. I have
> > a set, containing objects that refer to other sets. For example, in a
> > simple notation: (<a, b, c>, <d, e>) (or in a more object-like
> > display: set(obj1.choices=set(a, b, c) ). There may be obj1..objN
> > objects in the outer set, and the amount of items in the choices sets
> > could also reach a high number. The objects in the choices sets might
> > overlap.
>
> > Now I want to have all possible combinations, like this: (a, d), (b,
> > d), (c, d), (a, e), (b, e), (c, e).
>
> > However, I've got a few catches that make an already (icky) recursive
> > function worse to use.
>
> > First of all, I want to be able to only choose things so that the
> > outer 'result sets' have the same length. For example, if you'd have
> > (<a, b>, <a, c>), you might pick (a, a) with a simple algorythm, the
> > basic behaviour of sets reducing it to (a) and thus having an improper
> > length. I could add yet another loop after calculating everything to
> > throw out any result sets with the improper length, but that would be
> > highly inefficient.
>
> Does this do what you want?
>
> result_set = set([])
> seta = set(['a','b','c','d','e'])
> setb = set(['a','c','f','g','h'])
> for i in seta:
>   temp1 = setb.difference(set([i]))
>   for j in temp1:
>     result_set.add(tuple(set([i,j])))
> for i in result_set:
>   print i
>
> I figure there should be 4+5+3+5+5 results.
> No ('a'), no ('c'). Has ('a','c') but not ('c','a').
>
> ##  ('c', 'g')
> ##  ('a', 'd')
> ##  ('h', 'e')
> ##  ('a', 'b')
> ##  ('c', 'f')
> ##  ('e', 'g')
> ##  ('c', 'b')
> ##  ('d', 'f')
> ##  ('a', 'g')
> ##  ('a', 'h')
> ##  ('c', 'e')
> ##  ('e', 'f')
> ##  ('d', 'g')
> ##  ('h', 'b')
> ##  ('a', 'f')
> ##  ('b', 'f')
> ##  ('c', 'd')
> ##  ('h', 'c')
> ##  ('a', 'c')
> ##  ('b', 'g')
> ##  ('a', 'e')
> ##  ('h', 'd')
>
>
>
> > Second, I'd hope to be able to say that objX should be assumed to have
> > made the choice z. In the first example I mentioned, if I said that
> > 'obj1 == a', the only result sets that would come out would be (a, d)
> > and (a, e).
>
> Like this?
>
> result_set = set([])
> seta = set(['a','b','c','d','e'])
> setb = set(['a','c','f','g','h'])
> target = 'a'
> for i in seta:
>   temp1 = setb.difference(set([i]))
>   for j in temp1:
>     temp2 = set([i,j])
>     if target in temp2:
>       result_set.add(tuple(temp2))
> for i in result_set:
>   print i
>
> ##  ('a', 'f')
> ##  ('a', 'g')
> ##  ('a', 'd')
> ##  ('a', 'e')
> ##  ('a', 'h')
> ##  ('a', 'b')
> ##  ('a', 'c')
>
>
>
> > I've been toying with this problem for a while, but I've found out it
> > quickly gets slow, so I hope some people here could find a way to
> > write code like this that is efficient (and hopefully not rely on
> > recursion and 'fix up' loops like I've got mockups with right now).
>
> > Thank you for any suggestions you can offer.

Almost, except for the one big issue that is missing: scalability. I
don't have a seta and setb, but rather set1..setN, with N varying
while the script is running. That's pretty much where the pain comes
in. I have some very (butt-ugly) code that works right now, but it's
more a hack than pretty (or efficient) code.

Thank you for your effort though, since your usage of set.difference()
gave me an idea on another issue I was wrestling with.




More information about the Python-list mailing list