Clustering the keys of a dict according to its values

Arnaud Delobelle arnodel at googlemail.com
Fri Nov 14 08:16:03 EST 2008


On Nov 14, 1:16 pm, Florian Brucker <t... at torfbold.com> wrote:
> Hi everybody!
>
> Given a dictionary, I want to create a clustered version of it,
> collecting keys that have the same value:
>
>  >>> d = {'a':1, 'b':2, 'c':1, 'd':1, 'e':2, 'f':3}
>  >>> cluster(d)
> {1:['a', 'c', 'd'], 2:['b', 'e'], 3:['f']}
>
> That is, generate a new dict which holds for each value of the old dict
> a list of the keys of the old dict that have that very value.
>
> Another requirement is that it should also work on lists, in that case
> with indices instead of keys. We may assume that all values in the
> original dict/list can be used as dict keys.
>
> Right now I'm doing it like this:
>
>    def cluster(d):
>      try:
>        # is d a dict?
>        values = unique(d.values())
>        keys = d.keys()
>      except AttributeError:
>        # assume d is a list
>        values = unique(d)
>        keys = range(len(d))
>
>      clusters = {}
>      for value in values:
>        clusters[value] = filter(lambda v: d[v] == value, keys)
>
>      return clusters
>
> where unique() is from O'Reilly's Python Cookbook and returns a list
> containing each item of the given list exactly once.
>
> Now I'm pretty new to Python and chances are rather high that this could
> be done prettier and/or more efficient. Therefore any comment on the
> above code is greatly appreciated.
>
> Regards,
> Florian

Your implementation is pretty inefficient as it iterates over the
whole dictionary as many times as there are distinct values in it.
Here is a simpler solution.

from collections import defaultdict

def cluster(d):
    clusters = defaultdict(list)
    for key, val in d.iteritems():
        clusters[val].append(key)
    return clusters


Or without defaultdict:

def cluster(d):
    clusters = {}
    for key, val in d.iteritems():
        clusters.setdefault(val, []).append(key)
    return clusters


--
Arnaud



More information about the Python-list mailing list