Grouping items by a key?

Steven D'Aprano steve+comp.lang.python at pearwood.info
Fri Mar 22 20:27:00 EDT 2013


On Fri, 22 Mar 2013 12:22:13 -0700, Michael Fogleman wrote:

> I feel like Python ought to have a built-in to do this. Take a list of
> items and turn them into a dictionary mapping keys to a list of items
> with that key in common.
> 
> It's easy enough to do:
> 
>     # using defaultdict
>     lookup = collections.defaultdict(list)
>     for item in items:
>         lookup[key(item)].append(item)
>     
>     # or, using plain dict
>     lookup = {}
>     for item in items:
>         lookup.setdefault(key(item), []).append(item)

That's pretty much the reason setdefault was invented. So, in a sense, 
there is a built-in for this.
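
For anyone unfamiliar with it, here is a minimal sketch of what setdefault does. The key "a" and the names first and second are made up for illustration:

```python
lookup = {}

# Key is missing: setdefault stores the default ([]) and returns it.
first = lookup.setdefault("a", [])
first.append(1)

# Key is present: setdefault ignores the default and returns the stored list.
second = lookup.setdefault("a", [])
second.append(2)

print(lookup)           # both appends landed in the same list
print(first is second)  # the two calls returned the same object
```

Since the call returns the list stored in the dict, you can chain .append() onto it directly, which is exactly what your one-liner does.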


> But this is frequent enough of a use case that a built-in function would
> be nice.

I'm not so sure I agree it's a frequent use-case. I don't think I've ever 
needed to do it, or if I did, it was so rare and so long ago that I've 
forgotten it.



> I could implement it myself, as such:
> 
>     def grouped(iterable, key):
>         result = {}
>         for item in iterable:
>             result.setdefault(key(item), []).append(item)
>         return result
>     
>     lookup = grouped(items, key)
> 
> This is different than `itertools.groupby` in a few important ways.

Why do you care about itertools.groupby? That does something completely 
different. It groups items that occur in *contiguous* groups, e.g.

[1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

will be grouped into nine groups, with the 2s split across three separate groups:

[1], [2], [3], [2, 2, 2], [3, 3], [4], [5, 5], [2, 2], [5]

This is a feature of groupby. If you want to accumulate items regardless 
of where they occur, e.g. for the above:

[1], [2, 2, 2, 2, 2, 2], [3, 3, 3], [4], [5, 5, 5]

then there's no need to use groupby.
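
To make the difference concrete, here is a rough sketch using the list above. The variable names are only illustrative, and the sorted() variant at the end is an aside: it only helps when the items can be ordered and you don't mind losing the original order.

```python
from itertools import groupby

data = [1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

# groupby starts a new group every time the key changes,
# so equal items that are not adjacent land in separate groups.
runs = [list(group) for _, group in groupby(data)]

# Accumulating into a dict collects equal items wherever they occur.
accumulated = {}
for item in data:
    accumulated.setdefault(item, []).append(item)

# Sorting first makes equal items contiguous, so groupby then
# yields one group per distinct value.
sorted_runs = [list(group) for _, group in groupby(sorted(data))]

print(runs)
print(list(accumulated.values()))
print(sorted_runs)
```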


> Some examples:
> 
> 
>     >>> items = range(10)
>     >>> grouped(items, lambda x: x % 2)
>     {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
>     
>     >>> items = 'hello stack overflow how are you'.split()
>     >>> grouped(items, len)
>     {8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}
> 
> Is there a better way?

Looks perfectly fine to me. It's a five-line helper function, it's 
readable and simple and clear. The only improvement I would make would 
be to give it a docstring describing what it does and showing some 
examples:

def grouped(items, key):
    """Return a dict with items accumulated by key.

    >>> items = range(10)
    >>> grouped(items, lambda x: x % 2)
    {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
     
    >>> items = 'hello stack overflow how are you'.split()
    >>> grouped(items, len)
    {8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}

    """
    result = {}
    for item in items:
        result.setdefault(key(item), []).append(item)
    return result



Now you have a nice, descriptive help string for when you call 
help(grouped).



-- 
Steven


