A way to re-organize a list

beginner zyzhu2000 at gmail.com
Wed Aug 1 18:47:07 EDT 2007


On Aug 1, 1:31 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> On Wed, 01 Aug 2007 01:33:49 +0000, beginner wrote:
> > On Jul 19, 10:05 am, beginner <zyzhu2... at gmail.com> wrote:
> >> Hi Everyone,
>
> >> I have a simple list reconstruction problem, but I don't really know
> >> how to do it.
>
> >> I have a list that looks like this:
>
> >> l=[ ("A", "a", 1), ("A", "a", 2), ("A", "a", 3), ("A", "b", 1), ("A",
> >> "b", 2), ("B", "a", 1), ("B", "b", 1)]
>
> >> What I want to do is to reorganize it in groups, first by the middle
> >> element of the tuple, and then by the first element. I'd like the
> >> output look like this:
>
> >> out=[
> >>    [    #group by first element "A"
> >>           [("A", "a", 1), ("A", "a", 2), ("A", "a", 3)], #group by
> >> second element "a"
> >>           [ ("A", "b", 1), ("A", "b", 2)], #group by second element
> >> "b"
> >>    ],
> >>    [   #group by first element "B"
> >>           [("B", "a", 1)],
> >>           [("B", "b", 1)]
> >>    ]
> >> ]
>
> >> All the solutions I came up with are difficult to read and even harder
> >> to go back and change, and I just feel are too complicated for such a
> >> simple problem. I am wondering if anyone here has some insight.
>
> >> If there is a 'functional' way to do this, it would be even greater.
>
> >> Thanks,
> >> Geoffrey
>
> > I guess I still don't quite get functional programming. Here is my
> > imperitive way to do it in O(n).
>
> I didn't try to follow it but are you sure about O(n)?  With the inner
> loop going from 0 to `level` it looks suspiciously quadratic to me.  You
> really should try to grasp the `itertools.groupby()` solution.
>
> And the result of that script looks strange:
>
> [[[[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']]], [[[2, 2, 'A', 'C']]]]
>
> Aren't the two top level elements wrapped in one list too much?
> Furthermore the test data doesn't contain an example where the first item
> is the same but the second item is different.
>
> > def group_items(source_list, f_list, target=[]):
>
> Default arguments are evaluated *once* when the ``def`` is executed.  So
> all calls to this function share the same list object bound to `target`.
> Call this function twice without an explicit `target` and you'll see the
> problem.
>
> Ciao,
>         Marc 'BlackJack' Rintsch- Hide quoted text -
>
> - Show quoted text -


Wow. Thanks for the life-saving response. It uncovers a serious bug
about the default argument! Thank you Marc!

It is actually O(n) because it scans the list of records once and
exactly once. The inner loop is trying to make multi-level grouping.
In f_list, you define each level of grouping. For example,
f_list=[f,g] in the above code means 1) you want to put records into
sub-lists so that within each sub-list, the first element of the
records are the same. then 2) within each group, you want to further
group them by the second element of the records. That is why the
results look like the following:

[
   [ #first level grouping
      [[1, 2, 3, 4], [1, 2, 4, 5], [1, 2, 'A', 'B']] #second level
grouping
   ],
   [ #first level grouping
      [[2, 2, 'A', 'C']] #second level grouping
   ]
]

In the example, the inner loop is executed twice. So it loops a total
of 2*n times and it is O(n). If you want to make m level grouping, the
execution time would be O(m*n).

I looked at itertools. It is really cool. But if I just want to make
multiple nested groups, reusing the above function is going to take
less code than using itertools.groupby every time. Also, in
itertools.groupby, I think you have to scan the list of records
multiple times to make nested groups. Although it is still O(m*n),
each operation of groupby is going to be slower.

The one thing I still haven't figured out, is to make nested
generators of multiple nested grouping, so that I can use a multi-loop
to traverse them. A loop like below:

for i in big_group:
  for j in i:
    for k in j:
      .....

i, j, k are generators.

I think short of changing my function to be recursive and the outer
function 'yield' to the inner function, which calls yield to make an
inner generator, it will be difficult to do.

Thanks again for pointing out the default value problem.

Geoffrey




More information about the Python-list mailing list