Grouping items by a key?
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Mar 22 20:27:00 EDT 2013
On Fri, 22 Mar 2013 12:22:13 -0700, Michael Fogleman wrote:
> I feel like Python ought to have a built-in to do this. Take a list of
> items and turn them into a dictionary mapping keys to a list of items
> with that key in common.
>
> It's easy enough to do:
>
> # using defaultdict
> lookup = collections.defaultdict(list)
> for item in items:
>     lookup[key(item)].append(item)
>
> # or, using plain dict
> lookup = {}
> for item in items:
>     lookup.setdefault(key(item), []).append(item)
That's pretty much the reason setdefault was invented. So, in a sense,
there is a built-in for this.
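To make the point concrete, here is a minimal sketch (assuming Python 3.7+, where dicts keep insertion order) showing that the two loops quoted above build the same mapping. The function names `group_with_defaultdict` and `group_with_setdefault` are hypothetical, chosen only for illustration.

```python
import collections

# Hypothetical helper: the defaultdict version from the quoted message.
def group_with_defaultdict(items, key):
    # defaultdict(list) creates an empty list the first time a key is seen.
    lookup = collections.defaultdict(list)
    for item in items:
        lookup[key(item)].append(item)
    return dict(lookup)

# Hypothetical helper: the plain-dict version from the quoted message.
def group_with_setdefault(items, key):
    # setdefault returns the existing list for the key, or inserts []
    # and returns that new list, so .append() always has a list to use.
    lookup = {}
    for item in items:
        lookup.setdefault(key(item), []).append(item)
    return lookup

words = 'hello stack overflow how are you'.split()
print(group_with_setdefault(words, len))
# {5: ['hello', 'stack'], 8: ['overflow'], 3: ['how', 'are', 'you']}
```

Both return equal dicts; setdefault simply folds the "insert a default if missing" step into a single call.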
> But this is frequent enough of a use case that a built-in function would
> be nice.
I'm not so sure I agree it's a frequent use-case. I don't think I've ever
needed to do it, or if I did, it was so rare and so long ago that I've
forgotten it.
> I could implement it myself, as such:
>
> def grouped(iterable, key):
>     result = {}
>     for item in iterable:
>         result.setdefault(key(item), []).append(item)
>     return result
>
> lookup = grouped(items, key)
>
> This is different than `itertools.groupby` in a few important ways.
Why do you care about itertools.groupby? That does something completely
different. It groups items that occur in *contiguous* groups, e.g.
[1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]
will be grouped into nine groups, with the 2s split across three separate groups:
[1], [2], [3], [2, 2, 2], [3, 3], [4], [5, 5], [2, 2], [5]
This is a feature of groupby. If you want to accumulate items regardless
of where they occur, e.g. for the above:
[1], [2, 2, 2, 2, 2, 2], [3, 3, 3], [4], [5, 5, 5]
then there's no need to use groupby.
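The difference is easy to see by running both approaches on the example list above. A minimal sketch, assuming Python 3.7+ so that the plain dict keeps keys in first-seen order:

```python
from itertools import groupby

data = [1, 2, 3, 2, 2, 2, 3, 3, 4, 5, 5, 2, 2, 5]

# groupby only merges *adjacent* equal keys, so the 2s land in three groups.
contiguous = [list(g) for _, g in groupby(data)]
print(contiguous)
# [[1], [2], [3], [2, 2, 2], [3, 3], [4], [5, 5], [2, 2], [5]]

# Accumulating regardless of position needs no groupby at all.
accumulated = {}
for x in data:
    accumulated.setdefault(x, []).append(x)
print(list(accumulated.values()))
# [[1], [2, 2, 2, 2, 2, 2], [3, 3, 3], [4], [5, 5, 5]]
```

If you really wanted groupby here, sorting by the same key first would make equal keys contiguous, at the cost of an O(n log n) sort and the requirement that the keys be orderable.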
> Some examples:
>
>
> >>> items = range(10)
> >>> grouped(items, lambda x: x % 2)
> {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
>
> >>> items = 'hello stack overflow how are you'.split()
> >>> grouped(items, len)
> {8: ['overflow'], 3: ['how', 'are', 'you'], 5: ['hello', 'stack']}
>
> Is there a better way?
Looks perfectly fine to me. It's a five-line helper function: readable,
simple and clear. The only improvement I would make is to give it a
docstring describing what it does and showing some examples:
def grouped(items, key):
    """Return a dict with items accumulated by key.

    >>> items = range(10)
    >>> grouped(items, lambda x: x % 2)
    {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
    >>> items = 'hello stack overflow how are you'.split()
    >>> grouped(items, len)
    {5: ['hello', 'stack'], 8: ['overflow'], 3: ['how', 'are', 'you']}
    """
    result = {}
    for item in items:
        result.setdefault(key(item), []).append(item)
    return result
Now you have a nice, descriptive help string for when you call
help(grouped).
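The examples in the docstring also double as doctests. A minimal sketch of checking them with the standard doctest module, assuming Python 3.7+ (insertion-ordered dicts are what make the expected dict output stable):

```python
import doctest

def grouped(items, key):
    """Return a dict with items accumulated by key.

    >>> items = range(10)
    >>> grouped(items, lambda x: x % 2)
    {0: [0, 2, 4, 6, 8], 1: [1, 3, 5, 7, 9]}
    >>> items = 'hello stack overflow how are you'.split()
    >>> grouped(items, len)
    {5: ['hello', 'stack'], 8: ['overflow'], 3: ['how', 'are', 'you']}
    """
    result = {}
    for item in items:
        result.setdefault(key(item), []).append(item)
    return result

# Parse the >>> examples out of the docstring and run them with `grouped`
# available as a global; the runner only prints output for failures.
parser = doctest.DocTestParser()
test = parser.get_doctest(grouped.__doc__, {"grouped": grouped}, "grouped", None, 0)
results = doctest.DocTestRunner().run(test)
print(results.failed, results.attempted)  # 0 4
```

From the command line, `python -m doctest yourmodule.py` (module name is a placeholder) runs every docstring example in a file the same way.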
--
Steven