itertools: problem with nested groupby, list()

Ulrich Eckhardt eckhardt at satorlaser.com
Tue May 4 06:46:20 EDT 2010


Nico Schlömer wrote:
> I ran into a bit of an unexpected issue here with itertools, and I
> need to say that I discovered itertools only recently, so maybe my way
> of approaching the problem is "not what I want to do".
> 
> Anyway, the problem is the following:
> I have a list of dictionaries, something like
> 
> [ { "a": 1, "b": 1, "c": 3 },
>   { "a": 1, "b": 1, "c": 4 },
>   ...
> ]
> 
> and I'd like to iterate through all items with, e.g., "a":1. What I do
> is sort and then groupby,
> 
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> 
> and then just very simply iterate over my_list_grouped,
> 
> for my_item in my_list_grouped:
>     # do something with my_item[0], my_item[1]

I'd try to avoid copying the list and instead just iterate over it:


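    def iterate_by_key(l, key):
        # yield the value stored under `key` for every dict that has it
        for d in l:
            try:
                yield d[key]
            except KeyError:
                # this dict lacks the key, skip it
                continue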

Note that you could also ask the dictionary first if it has the key, but I'm
told this way is even faster since it only requires a single lookup
attempt.
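
To make the two variants concrete, here is a minimal sketch of both side by
side. The function names and the sample data are made up for illustration,
and I have not timed either one; which is faster probably depends on how
often the key is actually missing.

```python
def iterate_by_key_try(dicts, key):
    """Yield d[key] for every dict that has the key (try, then handle failure)."""
    for d in dicts:
        try:
            yield d[key]
        except KeyError:
            # this dict lacks the key, skip it
            continue


def iterate_by_key_check(dicts, key):
    """Same result, but ask the dictionary first (membership test, then lookup)."""
    for d in dicts:
        if key in d:
            yield d[key]


# hypothetical sample data: the second dict has no "a" entry
sample = [{"a": 1, "b": 1}, {"b": 2}, {"a": 3}]

values_try = list(iterate_by_key_try(sample, "a"))
values_check = list(iterate_by_key_check(sample, "a"))
```

Both generators skip the dictionary without an "a" entry and yield the same
values in the same order.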


> Now, inside this loop I'd like to again iterate over all items with
> the same 'b'-value -- no problem, just do the above inside the loop:
> 
> for my_item in my_list_grouped:
>         # group by keyword "b"
>         my_list2 = list( my_item[1] )
>         my_list2.sort( key=operator.itemgetter('b') )
>         my_list_grouped = itertools.groupby( my_list2, operator.itemgetter('b') )
>         for e in my_list_grouped:
>             # do something with e[0], e[1]
> 
> That seems to work all right.

Since your operation not only iterates over a list but first sorts it, it
requires a modification which must not happen while iterating. You work
around this by copying the list first.
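
As a sketch of that workaround: sorted() accepts any iterable and returns a
new list, so it copies and sorts in one step and leaves the original alone,
whereas list.sort() reorders the list in place. The sample records and
variable names below are assumptions for illustration only.

```python
import itertools
import operator

# hypothetical sample data in the shape described in the question
records = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 1, "c": 4},
    {"a": 2, "b": 1, "c": 5},
]
by_a = operator.itemgetter("a")
by_b = operator.itemgetter("b")

snapshot = list(records)  # remember the original order for comparison

result = []
for a_value, group in itertools.groupby(sorted(records, key=by_a), by_a):
    # sorted() builds a new, sorted list from the current group,
    # so nothing is modified in place while iterating
    for b_value, subgroup in itertools.groupby(sorted(group, key=by_b), by_b):
        result.append((a_value, b_value, [d["c"] for d in subgroup]))
```

After the loop, records is still in its original order, and result holds one
entry per ("a", "b") combination.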

> Now, the problem occurs when this all is wrapped into an outer loop, such
> as
> 
> for k in [ 'first pass', 'second pass' ]:
>     for my_item in my_list_grouped:
>         # bla, the above
> 
> To be able to iterate more than once through my_list_grouped, I have
> to convert it into a list first, so outside all loops, I go like
> 
> my_list.sort( key=operator.itemgetter('a') )
> my_list_grouped = itertools.groupby( my_list, operator.itemgetter('a') )
> my_list_grouped = list( my_list_grouped )
> 
> This, however, makes it impossible to do the inner sort and
> groupby-operation; you just get the very first element, and that's it.

I believe that you are doing a modifying operation inside the iteration,
which is a no-no. Create a custom iterator function (IIRC they are
called "generators") and you should be fine. Note that this should also
perform better since copying and sorting are not exactly free, though
you may not notice that with small numbers of objects.
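
A rough sketch of what I mean by generators, under a few assumptions: the
helper names distinct_values() and items_with() are invented here, the
values must be hashable, and the list is rescanned for every group, so
whether this really beats sort plus groupby depends on the data. I have
not measured it. Since every call creates a fresh generator, nothing needs
to be converted to a list to run several passes.

```python
def distinct_values(dicts, key):
    """Yield each distinct value of `key` once, in order of first appearance."""
    seen = set()
    for d in dicts:
        if key in d and d[key] not in seen:
            seen.add(d[key])
            yield d[key]


def items_with(dicts, key, value):
    """Yield every dict whose entry for `key` equals `value`."""
    for d in dicts:
        if key in d and d[key] == value:
            yield d


# hypothetical sample data; note that it is neither sorted nor copied
records = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 2, "b": 1, "c": 5},
    {"a": 1, "b": 1, "c": 4},
    {"a": 1, "b": 2, "c": 6},
]

visited = {}
for pass_name in ["first pass", "second pass"]:
    visited[pass_name] = []
    for a_value in distinct_values(records, "a"):
        # generators are created anew on each call, so the same
        # filter can be applied as often as needed
        for b_value in distinct_values(items_with(records, "a", a_value), "b"):
            same_a = items_with(records, "a", a_value)
            for d in items_with(same_a, "b", b_value):
                visited[pass_name].append((a_value, b_value, d["c"]))
```

Both passes visit the same items in the same order, and records itself is
never modified.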

Uli

-- 
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932



