itertools: problem with nested groupby, list()

Peter Otten __peter__ at web.de
Tue May 4 10:10:55 EDT 2010


Nico Schlömer wrote:

> Hi,
> 
> I ran into a bit of an unexpected issue here with itertools, and I
> need to say that I discovered itertools only recently, so maybe my way
> of approaching the problem is "not what I want to do".
> 
> Anyway, the problem is the following:
> I have a list of dictionaries, something like
> 
> [ { "a": 1, "b": 1, "c": 3 },
>   { "a": 1, "b": 1, "c": 4 },
>   ...
> ]
> 
> and I'd like to iterate through all items with, e.g., "a":1. What I do
> is sort and then groupby,
> 
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> 
> and then just very simply iterate over my_list_grouped,
> 
> for my_item in my_list_grouped:
>     # do something with my_item[0], my_item[1]
> 
> Now, inside this loop I'd like to again iterate over all items with
> the same 'b'-value -- no problem, just do the above inside the loop:
> 
> for my_item in my_list_grouped:
>         # group by keyword "b"
>         my_list2 = list( my_item[1] )
>         my_list2.sort( key=operator.itemgetter('b') )
>         my_list_grouped = itertools.groupby( my_list2,
>                                              operator.itemgetter('b') )
>         for e in my_list_grouped:
>             # do something with e[0], e[1]
> 
> That seems to work all right.
> 
> Now, the problem occurs when this all is wrapped into an outer loop, such
> as
> 
> for k in [ 'first pass', 'second pass' ]:
>     for my_item in my_list_grouped:
>         # bla, the above
> 
> To be able to iterate more than once through my_list_grouped, I have
> to convert it into a list first, so outside all loops, I go like
> 
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> my_list_grouped = list( my_list_grouped )
> 
> This, however, makes it impossible to do the inner sort and
> groupby-operation; you just get the very first element, and that's it.
> 
> An example file is attached.
> 
> Hints, anyone?

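If you want a reusable copy of a groupby(...) it is not enough to convert it
to a list as a whole. Each group is a lazy iterator that shares the underlying
iterable with the groupby object, so it is no longer usable once groupby has
advanced to the next key: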

>>> from itertools import groupby
>>> from operator import itemgetter
>>> items = [(1,1), (1,2), (1,3), (2,1), (2,2)]
>>> grouped_items = list(groupby(items, key=itemgetter(0))) # WRONG
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...             print k, list(g)
...
run 1
1 []
2 [(2, 2)]
run 2
1 []
2 []
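
To make the cause visible, here is a minimal sketch (written for Python 3,
unlike the Python 2 session above) that steps through the groupby object by
hand. The variable names are only illustrative. It shows that a group goes
stale as soon as groupby moves on to the next key; what a stale group still
yields for the very last key may differ between Python versions, so the
sketch does not rely on that detail:

```python
from itertools import groupby
from operator import itemgetter

items = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]

grouper = groupby(items, key=itemgetter(0))

# Fetch the first group but do not consume it yet.
key1, group1 = next(grouper)

# Asking for the second group makes groupby skip past the rest of the
# first one, because both read from the same underlying iterator.
key2, group2 = next(grouper)

stale_group1 = list(group1)   # already skipped, so nothing is left
fresh_group2 = list(group2)   # still the current group, so it is complete

print(key1, stale_group1)
print(key2, fresh_group2)
```

list(groupby(...)) does exactly this stepping for every key before any group
has been read, which is why the groups come out empty or truncated.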

Instead, you have to convert each group to a list, too:

>>> grouped_items = [(k, list(g)) for k, g in groupby(items, key=itemgetter(0))]
>>> for run in 1, 2:
...     print "run", run
...     for k, g in grouped_items:
...             print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
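
Applied to the nested case from the question, the same idea might look like
the following sketch (Python 3). The helper name grouped_lists is made up
for this example, and all but the first two dictionaries are invented sample
data:

```python
from itertools import groupby
from operator import itemgetter


def grouped_lists(records, field):
    """Return [(key, [record, ...]), ...] grouped by records[field].

    Hypothetical helper: sorts a copy of the records, then turns every
    group into a list so the result can be iterated repeatedly.
    """
    ordered = sorted(records, key=itemgetter(field))
    return [(k, list(g)) for k, g in groupby(ordered, key=itemgetter(field))]


my_list = [
    {"a": 1, "b": 1, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 1, "b": 2, "c": 5},   # invented sample data
    {"a": 2, "b": 1, "c": 6},   # invented sample data
]

by_a = grouped_lists(my_list, "a")   # built once, outside all loops

seen = []
for run in ("first pass", "second pass"):
    for a_value, a_items in by_a:
        # a_items is a plain list, so the inner grouping works on every pass
        for b_value, b_items in grouped_lists(a_items, "b"):
            seen.append((run, a_value, b_value, [d["c"] for d in b_items]))

for entry in seen:
    print(entry)
```

Because every group is stored as a list, the outer loop can run as often as
needed and the inner sort and groupby always see the full group.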

But usually you don't bother and just run groupby() twice:

>>> for run in 1, 2:
...     print "run", run
...     for k, g in groupby(items, key=itemgetter(0)):
...             print k, list(g)
...
run 1
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]
run 2
1 [(1, 1), (1, 2), (1, 3)]
2 [(2, 1), (2, 2)]

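The only caveat then is that list(items) == list(items) must hold, i.e. items
must give the same elements every time you iterate over it. A list does; a
generator or other one-shot iterator does not.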
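
A small sketch of that caveat (Python 3; count_groups is a hypothetical
helper used only for this illustration): running groupby() twice works on a
list, but a one-shot iterator is already exhausted on the second pass.

```python
from itertools import groupby
from operator import itemgetter


def count_groups(items, runs=2):
    """Call groupby() once per pass and record how many groups it found."""
    counts = []
    for _ in range(runs):
        grouped = groupby(items, key=itemgetter(0))
        counts.append(sum(1 for _key, _group in grouped))
    return counts


pairs = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]

list_counts = count_groups(pairs)             # a list can be re-iterated
iterator_counts = count_groups(iter(pairs))   # an iterator is used up once

print(list_counts)
print(iterator_counts)
```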

Peter


