itertools: problem with nested groupby, list()

Nico Schlömer nico.schloemer at gmail.com
Tue May 4 08:37:17 EDT 2010


> Are you basically after this, then?
>
> for a, a_iter in groupby(my_list, itemgetter('a')):
>    print 'New A', a
>    for b, b_iter in groupby(a_iter, itemgetter('b')):
>        b_list = list(b_iter)
>        for p in ['first', 'second']:
>            for b_data in b_list:
>                #whatever...

Yes. Moving the 'first', 'second' operation to the innermost loop
works all right, and I guess that's what I'll do.
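
A minimal sketch of that approach, written for Python 3 rather than the
Python 2 used in the quoted code. The sample records and the function name
`two_pass_rows` are made up for illustration; only the keys 'a', 'b' and 'c'
come from the thread.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data; only the key names come from the thread.
my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 2, "c": 5},
    {"a": 1, "b": 1, "c": 4},
    {"a": 2, "b": 1, "c": 6},
]


def two_pass_rows(records):
    """Return (a, b, pass_name, c) tuples, visiting each 'b' group twice."""
    rows = []
    # groupby only merges adjacent items, so sort by the grouping keys first.
    ordered = sorted(records, key=itemgetter("a", "b", "c"))
    for a, a_iter in groupby(ordered, key=itemgetter("a")):
        for b, b_iter in groupby(a_iter, key=itemgetter("b")):
            # Copy the group into a list so it can be walked more than once.
            b_list = list(b_iter)
            for pass_name in ("first", "second"):
                for b_data in b_list:
                    rows.append((a, b, pass_name, b_data["c"]))
    return rows


rows = two_pass_rows(my_list)
```

The two passes run over `b_list`, not over `b_iter`, because the group
iterator can only be consumed once.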

> Cos that looks like it could be simplified to (untested)
> for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
>   data = list(data_iter) # take copy
>   for pass_ in ['first', 'second']:
>      # do something with data

Potentially yes, but for now I actually need to do something at "print
'New A', a", so I can't just skip this.
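
For reference, the tuple-key variant could probably still react to a change
of 'a' by remembering the previous value. This is only a sketch in Python 3
under that assumption; `walk`, `previous_a` and the event tuples are
hypothetical names, not something from the thread.

```python
from itertools import groupby
from operator import itemgetter


def walk(records):
    """Group by ('a', 'b') once, but still notice when 'a' changes."""
    events = []
    ordered = sorted(records, key=itemgetter("a", "b"))
    previous_a = object()  # sentinel that never equals a real 'a' value
    for (a, b), data_iter in groupby(ordered, key=itemgetter("a", "b")):
        if a != previous_a:
            # This is the place where the "New A" work would go.
            events.append(("new a", a))
            previous_a = a
        data = list(data_iter)  # take a copy so both passes see the items
        for pass_ in ("first", "second"):
            events.append((pass_, a, b, len(data)))
    return events
```

Whether this is clearer than two nested groupby calls is a matter of taste;
the nested form keeps the "End A" step easier to place.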

Anyway, the above suggestion works well for now. Thanks!
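
As for why the list( groupby(...) ) workaround quoted below fails: as far as
I understand the itertools documentation, every group iterator shares the
underlying iterable with groupby itself, so a group is no longer usable once
groupby has advanced to the next key. A small Python 3 sketch with made-up
records:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical records, already sorted by 'b'.
records = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 1},
    {"a": 1, "b": 2},
]

# Problematic: list() runs groupby to the end, so the stored group
# iterators refer to groups that groupby has already moved past.
stale = list(groupby(records, key=itemgetter("b")))
first_key, first_group = stale[0]
first_items = list(first_group)  # nothing left in the first group

# Safer: copy each group while it is still the current one.
groups = [(b, list(b_iter)) for b, b_iter in groupby(records, key=itemgetter("b"))]
```

With `groups` built this way, the list can be iterated in as many passes as
needed.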

--Nico

On Tue, May 4, 2010 at 1:52 PM, Jon Clements <joncle at googlemail.com> wrote:
> On 4 May, 12:36, Nico Schlömer <nico.schloe... at gmail.com> wrote:
>> > Does this example help at all?
>>
>> Thanks, that clarified things a lot!
>>
>> To make it easier, let's just look at 'a' and 'b':
>>
>> > my_list.sort( key=itemgetter('a','b','c') )
>> > for a, a_iter in groupby(my_list, itemgetter('a')):
>> >    print 'New A', a
>> >    for b, b_iter in groupby(a_iter, itemgetter('b')):
>> >        print '\t', 'New B', b
>> >        for b_data in b_iter:
>> >            print '\t'*3, a, b, b_data
>> >        print '\t', 'End B', b
>> >    print 'End A', a
>>
>> That works well, and I can wrap the outer loop in another loop without
>> problems. What's *not* working, though, is having more than one pass
>> on the inner loop, as in
>>
>> =============================== *snip* ===============================
>> my_list.sort( key=itemgetter('a','b','c') )
>> for a, a_iter in groupby(my_list, itemgetter('a')):
>>    print 'New A', a
>>    for pass_ in ['first pass', 'second pass']:
>>        for b, b_iter in groupby(a_iter, itemgetter('b')):
>>            print '\t', 'New B', b
>>            for b_data in b_iter:
>>                print '\t'*3, a, b, b_data
>>            print '\t', 'End B', b
>>        print 'End A', a
>> =============================== *snap* ===============================
>>
>> I tried working around this by
>>
>> =============================== *snip* ===============================
>> my_list.sort( key=itemgetter('a','b','c') )
>> for a, a_iter in groupby(my_list, itemgetter('a')):
>>    print 'New A', a
>>    inner_list =  list( groupby(a_iter, itemgetter('b')) )
>>    for pass_ in ['first pass', 'second pass']:
>>        for b, b_iter in inner_list:
>>            print '\t', 'New B', b
>>            for b_data in b_iter:
>>                print '\t'*3, a, b, b_data
>>            print '\t', 'End B', b
>>        print 'End A', a
>> =============================== *snap* ===============================
>>
>> which doesn't work either, and I don't understand why. -- I'll look at
>> Uli's comments.
>>
>> Cheers,
>> Nico
>>
>> On Tue, May 4, 2010 at 1:08 PM, Jon Clements <jon... at googlemail.com> wrote:
>> > On 4 May, 11:10, Nico Schlömer <nico.schloe... at gmail.com> wrote:
>> >> Hi,
>>
>> >> I ran into a bit of an unexpected issue here with itertools, and I
>> >> need to say that I discovered itertools only recently, so maybe my way
>> >> of approaching the problem is "not what I want to do".
>>
>> >> Anyway, the problem is the following:
>> >> I have a list of dictionaries, something like
>>
>> >> [ { "a": 1, "b": 1, "c": 3 },
>> >>   { "a": 1, "b": 1, "c": 4 },
>> >>   ...
>> >> ]
>>
>> >> and I'd like to iterate through all items with, e.g., "a":1. What I do
>> >> is sort and then groupby,
>>
>> >> my_list.sort( key=operator.itemgetter('a') )
>> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>>
>> >> and then just very simply iterate over my_list_grouped,
>>
>> >> for my_item in my_list_grouped:
>> >>     # do something with my_item[0], my_item[1]
>>
>> >> Now, inside this loop I'd like to again iterate over all items with
>> >> the same 'b'-value -- no problem, just do the above inside the loop:
>>
>> >> for my_item in my_list_grouped:
>> >>         # group by keyword "b"
>> >>         my_list2 = list( my_item[1] )
>> >>         my_list2.sort( key=operator.itemgetter('b') )
>> >>         my_list_grouped = itertools.groupby( my_list2, operator.itemgetter('b') )
>> >>         for e in my_list_grouped:
>> >>             # do something with e[0], e[1]
>>
>> >> That seems to work all right.
>>
>> >> Now, the problem occurs when this all is wrapped into an outer loop, such as
>>
>> >> for k in [ 'first pass', 'second pass' ]:
>> >>     for my_item in my_list_grouped:
>> >>         # bla, the above
>>
>> >> To be able to iterate more than once through my_list_grouped, I have
>> >> to convert it into a list first, so outside all loops, I go like
>>
>> >> my_list.sort( key=operator.itemgetter('a') )
>> >> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
>> >> my_list_grouped = list( my_list_grouped )
>>
>> >> This, however, makes it impossible to do the inner sort and
>> >> groupby-operation; you just get the very first element, and that's it.
>>
>> >> An example file is attached.
>>
>> >> Hints, anyone?
>>
>> >> Cheers,
>> >> Nico
>>
>> > Does this example help at all?
>>
>> > my_list.sort( key=itemgetter('a','b','c') )
>> > for a, a_iter in groupby(my_list, itemgetter('a')):
>> >    print 'New A', a
>> >    for b, b_iter in groupby(a_iter, itemgetter('b')):
>> >        print '\t', 'New B', b
>> >        for c, c_iter in groupby(b_iter, itemgetter('c')):
>> >            print '\t'*2, 'New C', c
>> >            for c_data in c_iter:
>> >                print '\t'*3, a, b, c, c_data
>> >            print '\t'*2, 'End C', c
>> >        print '\t', 'End B', b
>> >    print 'End A', a
>>
>> > Jon.
>> > --
>> > http://mail.python.org/mailman/listinfo/python-list
>>
>>
>
> Are you basically after this, then?
>
> for a, a_iter in groupby(my_list, itemgetter('a')):
>    print 'New A', a
>    for b, b_iter in groupby(a_iter, itemgetter('b')):
>        b_list = list(b_iter)
>        for p in ['first', 'second']:
>            for b_data in b_list:
>                #whatever...
>
> Cos that looks like it could be simplified to (untested)
>
> for (a, b), data_iter in groupby(my_list, itemgetter('a','b')):
>   data = list(data_iter) # take copy
>   for pass_ in ['first', 'second']:
>      # do something with data
>
> But from my POV, it's almost looking like a 2-tuple key in a
> defaultdict jobby.
>
> Jon.
>
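
One possible reading of the "2-tuple key in a defaultdict" remark, sketched
in Python 3. The function name `bucket_by_a_b` and the sample usage are
assumptions for illustration, not code from the thread.

```python
from collections import defaultdict


def bucket_by_a_b(records):
    """Collect records into lists keyed by their ('a', 'b') values."""
    buckets = defaultdict(list)
    for record in records:
        buckets[(record["a"], record["b"])].append(record)
    return buckets


# Hypothetical usage: no prior sort is needed, and every bucket is a
# plain list that can be walked in several passes.
example = bucket_by_a_b([
    {"a": 1, "b": 1, "c": 3},
    {"a": 2, "b": 1, "c": 6},
    {"a": 1, "b": 1, "c": 4},
])
ordered_keys = sorted(example)
```

Unlike groupby, this does not require the input to be sorted, at the cost of
holding all buckets in memory at once.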
