How to group list of list with condition in Python

Peter Otten __peter__ at web.de
Wed Nov 19 05:01:44 EST 2014


Gundala Viswanath wrote:

> I have the following list of lists that contains 6 entries:
> 
> lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
>        ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
> 
> Each list in lol contains 3 elements:
> 
> ['a', 3, 1.01]
>   e1  e2   e3
> 
> The list above is already sorted according to e2 (i.e., the 2nd element).
> 
> I'd like to 'cluster' the above list following roughly these steps:
> 
> 1. Pick the lowest entry (wrt. e2) in lol as the key of first cluster
> 2. Assign that as first member of the cluster (dictionary of list)
> 3. Calculate the difference between e3 of the next list and e3 of the
> first member of each existing cluster.
> 4. If the difference is less than the threshold, assign that list as a
> member of the corresponding cluster. Else, create a new cluster with
> the current list as the new key.
> 5. Repeat for the rest until finished.
> 
> The final result will look like this, with threshold <= 0.1.
> 
> dol = {'a':['a','x','b'], 'k':['k','f'], 'p':['p']}
> 
> I'm stuck at this step. What's the right way to do it?
> 
> __BEGIN__
> import json
> from collections import defaultdict
> 
> thres = 0.1
> tmp_e3 = 0
> tmp_e1 = "-"
> 
> lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],
>        ['p',8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
> 
> dol = defaultdict(list)
> for thelist in lol:
>     e1, e2, e3 = thelist
> 
>     if tmp_e1 == "-":
>         tmp_e1 = e1
>     else:
>         diff = abs(tmp_e3 - e3)
>         if diff > thres:
>             tmp_e1 = e1
> 
>     dol[tmp_e1].append(e1)
>     tmp_e1 = e1
>     tmp_e3 = e3
> 
> print json.dumps(dol, indent=4)
> __END__

I won't provide the complete solution for a homework question, but here's 
the idea that led me to working code:

Introduce a helper list (let's call it `pairs`) where you put (e1, e3) 
tuples (called `inner_e1`, `inner_e3` below).

Inside the loop over `lol`, loop over `pairs`. If you find an item whose 
inner_e3 is close enough to e3, stop and use its inner_e1 as the key: 

dol[inner_e1].append(e1)

If no inner_e3 is close enough, add another item to pairs and a new key 
to dol:

pairs.append((e1, e3))
dol[e1] = [e1]

Hint: `for ... else` with a `break` thrown in fits the above description 
of the inner loop nicely.
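
In case the `for ... else` idiom is unfamiliar, here is a minimal sketch of 
it on a plain list of floats rather than on the homework data. The function 
name `first_close` and its parameters are made up for illustration, and 
treating "close enough" as `<= thres` is an assumption:

```python
def first_close(values, target, thres=0.1):
    """Return the first item of `values` within `thres` of `target`.

    Hypothetical helper, only meant to show the for/else/break pattern.
    """
    for value in values:
        if abs(value - target) <= thres:
            found = value
            break  # leaving via break skips the else clause
    else:
        # Runs only if the loop finished without hitting break,
        # i.e. no value was close enough.
        found = None
    return found


print(first_close([1.01, 2.02, 3.00], 1.09))  # a match exists
print(first_close([1.01, 2.02, 3.00], 5.00))  # no match, else clause runs
```

The `else` clause belongs to the `for`, not to the `if`: it is the "nothing 
was found" branch, which is where a new cluster would be created.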

Note that this 2-loop approach is not very efficient if the number of 
clusters is large -- but first make it work, then -- maybe -- make it fast.



