itertools: problem with nested groupby, list()

Jon Clements joncle at googlemail.com
Tue May 4 07:52:34 EDT 2010


On 4 May, 12:36, Nico Schlömer <nico.schloe... at gmail.com> wrote:
> > Does this example help at all?
>
> Thanks, that clarified things a lot!
>
> To make it easier, let's just look at 'a' and 'b':
>
> > my_list.sort( key=itemgetter('a','b','c') )
> > for a, a_iter in groupby(my_list, itemgetter('a')):
> >    print 'New A', a
> >    for b, b_iter in groupby(a_iter, itemgetter('b')):
> >        print '\t', 'New B', b
> >        for b_data in b_iter:
> >            print '\t'*3, a, b, b_data
> >        print '\t', 'End B', b
> >    print 'End A', a
>
> That works well, and I can wrap the outer loop in another loop without
> problems. What's *not* working, though, is having more than one pass
> on the inner loop, as in
>
> =============================== *snip* ===============================
> my_list.sort( key=itemgetter('a','b','c') )
> for a, a_iter in groupby(my_list, itemgetter('a')):
>    print 'New A', a
>    for pass_ in ['first pass', 'second pass']:
>        for b, b_iter in groupby(a_iter, itemgetter('b')):
>            print '\t', 'New B', b
>            for b_data in b_iter:
>                print '\t'*3, a, b, b_data
>            print '\t', 'End B', b
>        print 'End A', a
> =============================== *snap* ===============================
>
> I tried working around this by
>
> =============================== *snip* ===============================
> my_list.sort( key=itemgetter('a','b','c') )
> for a, a_iter in groupby(my_list, itemgetter('a')):
>    print 'New A', a
>    inner_list =  list( groupby(a_iter, itemgetter('b')) )
>    for pass_ in ['first pass', 'second pass']:
>        for b, b_iter in inner_list:
>            print '\t', 'New B', b
>            for b_data in b_iter:
>                print '\t'*3, a, b, b_data
>            print '\t', 'End B', b
>        print 'End A', a
> =============================== *snap* ===============================
>
> which doesn't work either, and I don't understand why. -- I'll look at
> Uli's comments.
>
> Cheers,
> Nico
>
> On Tue, May 4, 2010 at 1:08 PM, Jon Clements <jon... at googlemail.com> wrote:
> > On 4 May, 11:10, Nico Schlömer <nico.schloe... at gmail.com> wrote:
> >> Hi,
>
> >> I ran into a bit of an unexpected issue here with itertools, and I
> >> need to say that I discovered itertools only recently, so maybe my way
> >> of approaching the problem is "not what I want to do".
>
> >> Anyway, the problem is the following:
> >> I have a list of dictionaries, something like
>
> >> [ { "a": 1, "b": 1, "c": 3 },
> >>   { "a": 1, "b": 1, "c": 4 },
> >>   ...
> >> ]
>
> >> and I'd like to iterate through all items with, e.g., "a":1. What I do
> >> is sort and then groupby,
>
> >> my_list.sort( key=operator.itemgetter('a') )
> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>
> >> and then just very simply iterate over my_list_grouped,
>
> >> for my_item in my_list_grouped:
> >>     # do something with my_item[0], my_item[1]
>
> >> Now, inside this loop I'd like to again iterate over all items with
> >> the same 'b'-value -- no problem, just do the above inside the loop:
>
> >> for my_item in my_list_grouped:
> >>         # group by keyword "b"
> >>         my_list2 = list( my_item[1] )
> >>         my_list2.sort( key=operator.itemgetter('b') )
> >>         my_list_grouped = itertools.groupby( my_list2,
> >> operator.itemgetter('b') )
> >>         for e in my_list_grouped:
> >>             # do something with e[0], e[1]
>
> >> That seems to work all right.
>
> >> Now, the problem occurs when this all is wrapped into an outer loop, such as
>
> >> for k in [ 'first pass', 'second pass' ]:
> >>     for my_item in my_list_grouped:
> >>     # bla, the above
>
> >> To be able to iterate more than once through my_list_grouped, I have
> >> to convert it into a list first, so outside all loops, I go like
>
> >> my_list.sort( key=operator.itemgetter('a') )
> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> >> my_list_grouped = list( my_list_grouped )
>
> >> This, however, makes it impossible to do the inner sort and
> >> groupby-operation; you just get the very first element, and that's it.
>
> >> An example file is attached.
>
> >> Hints, anyone?
>
> >> Cheers,
> >> Nico
>
> > Does this example help at all?
>
> > my_list.sort( key=itemgetter('a','b','c') )
> > for a, a_iter in groupby(my_list, itemgetter('a')):
> >    print 'New A', a
> >    for b, b_iter in groupby(a_iter, itemgetter('b')):
> >        print '\t', 'New B', b
> >        for c, c_iter in groupby(b_iter, itemgetter('c')):
> >            print '\t'*2, 'New C', c
> >            for c_data in c_iter:
> >                print '\t'*3, a, b, c, c_data
> >            print '\t'*2, 'End C', c
> >        print '\t', 'End B', b
> >    print 'End A', a
>
> > Jon.
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
>

Are you basically after this, then?

for a, a_iter in groupby(my_list, itemgetter('a')):
    print 'New A', a
    for b, b_iter in groupby(a_iter, itemgetter('b')):
        b_list = list(b_iter)  # copy, so it can be iterated more than once
        for p in ['first', 'second']:
            for b_data in b_list:
                pass  # whatever...
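
The reason list(groupby(...)) on its own loses data is that every group
iterator shares the underlying iterator with the groupby object, so once
groupby has moved on to the next key the earlier groups can no longer be
read. Here is a minimal sketch of that. The sample rows and the names
stale and saved are made up for illustration, and it uses print() calls
so it should also run on Python 3. How many items survive in the stale
version depends on the Python version, so no output is shown for it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data, already sorted by 'b'.
rows = [{'a': 1, 'b': 1, 'c': 3},
        {'a': 1, 'b': 1, 'c': 4},
        {'a': 1, 'b': 2, 'c': 5}]

# list() runs groupby to the end before any group has been read,
# so the stored group iterators have nothing (or very little) left.
stale = list(groupby(rows, itemgetter('b')))
stale_sizes = [len(list(g)) for _, g in stale]

# Copying each group while it is still the current one keeps the data.
saved = [(b, list(g)) for b, g in groupby(rows, itemgetter('b'))]
saved_sizes = [len(items) for _, items in saved]

print(stale_sizes)  # fewer items than went in
print(saved_sizes)  # one entry per 'b' value, nothing lost
```

The saved list can be looped over as many times as you like, which is
all the b_list = list(b_iter) line above is doing.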

Cos that looks like it could be simplified to (untested)

for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
   data = list(data_iter) # take copy
   for pass_ in ['first', 'second']:
      pass # do something with data
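
In case a complete version helps, here is a rough sketch of the
flattened loop with everything it needs to run. The contents of my_list
and the seen list are invented for the example, and it is written with
print() calls so it should also run on Python 3. Note that the list has
to be sorted on the same ('a', 'b') key first, since groupby only groups
adjacent items.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data standing in for my_list.
my_list = [{'a': 2, 'b': 1, 'c': 7},
           {'a': 1, 'b': 2, 'c': 5},
           {'a': 1, 'b': 1, 'c': 4},
           {'a': 1, 'b': 1, 'c': 3}]

key = itemgetter('a', 'b')
my_list.sort(key=key)  # groupby only merges adjacent items

seen = []  # records what each pass got to look at
for (a, b), data_iter in groupby(my_list, key):
    data = list(data_iter)  # take a copy so both passes see every item
    for pass_ in ['first', 'second']:
        seen.append((pass_, a, b, [d['c'] for d in data]))

for entry in seen:
    print(entry)
```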

But from my POV, it's almost looking like a 2-tuple key in a
defaultdict jobby.
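
Something along these lines is what I mean, as a sketch only. The sample
data and the names groups and report are made up, and it uses print()
calls so it should also run on Python 3. No sorting or groupby is
needed, because the dictionary collects the items per (a, b) pair and
can be walked as often as you want.

```python
from collections import defaultdict

# Hypothetical sample data standing in for my_list (deliberately unsorted).
my_list = [{'a': 2, 'b': 1, 'c': 7},
           {'a': 1, 'b': 2, 'c': 5},
           {'a': 1, 'b': 1, 'c': 4},
           {'a': 1, 'b': 1, 'c': 3}]

# Bucket every item under its (a, b) pair.
groups = defaultdict(list)
for d in my_list:
    groups[d['a'], d['b']].append(d)

report = []
for pass_ in ['first', 'second']:
    for a, b in sorted(groups):  # sorted only to get a stable order
        report.append((pass_, a, b, len(groups[a, b])))

for line in report:
    print(line)
```

The trade-off is that everything sits in memory at once, which is
already the case here since my_list is a list.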

Jon.


