How to group list of list with condition in Python
Peter Otten
__peter__ at web.de
Wed Nov 19 05:01:44 EST 2014
Gundala Viswanath wrote:
> I have the following list of lists that contains 6 entries:
>
> lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],['p',8, 3.00],
> ['b', 10, 1.09],
> ['f', 12, 2.03]]
>
> each list in lol contain 3 elements:
>
> ['a', 3, 1.01]
> e1 e2 e3
>
> The list above is already sorted according to the e2 (i.e, 2nd element)
>
> I'd like to 'cluster' the above list following roughly these steps:
>
> 1. Pick the lowest entry (wrt. e2) in lol as the key of first cluster
> 2. Assign that as first member of the cluster (dictionary of list)
> 3. Calculate the difference current e3 in next list with first member
> of existing clusters.
> 4. If the difference is less than threshold, assign that list as the
> member of the corresponding cluster. Else, create new cluster with
> current list as new key.
> 5. Repeat the rest until finish
>
> The final result will look like this, with threshold <= 0.1.
>
> dol = {'a':['a','x','b'], 'k':['k','f'], 'p':['p']}
>
> I'm stuck with this step what's the right way to do it:
>
> __BEGIN__
> import json
> from collections import defaultdict
>
> thres = 0.1
> tmp_e3 = 0
> tmp_e1 = "-"
>
> lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],
> ['p',8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
>
> dol = defaultdict(list)
> for thelist in lol:
>     e1, e2, e3 = thelist
>
>     if tmp_e1 == "-":
>         tmp_e1 = e1
>     else:
>         diff = abs(tmp_e3 - e3)
>         if diff > thres:
>             tmp_e1 = e1
>
>     dol[tmp_e1].append(e1)
>     tmp_e1 = e1
>     tmp_e3 = e3
>
> print json.dumps(dol, indent=4)
> __END__
I won't provide the complete solution for a homework question, but here's
the idea that led me to working code:
Introduce a helper list (let's call it `pairs`) where you put (e1, e3)
tuples (called `inner_e1`, `inner_e3` below).
Inside the loop over `lol`, loop over `pairs` -- and if you find an item
where the inner_e3 is close enough to e3, you stop and use the inner_e1 as
the key:

    dol[inner_e1].append(e1)

If there is no inner_e3 close enough, you add another item to pairs and a new
key to dol:

    pairs.append((e1, e3))
    dol[e1] = [e1]
Hint: `for ... else` with a `break` thrown in fits the above description
of the inner loop nicely.
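
As a generic illustration of that hint (deliberately not the clustering
code itself): the `else` branch of a `for` loop runs only when the loop
finishes without hitting `break`. The function name `find_close` and the
sample values are made up for this sketch.

```python
def find_close(values, target, tolerance):
    """Return the first value within `tolerance` of `target`, or None."""
    for value in values:
        if abs(value - target) <= tolerance:
            found = value
            break  # a match was found, so the else branch is skipped
    else:
        # reached only when the loop ran to completion without a break
        found = None
    return found


print(find_close([1.0, 2.0, 3.0], 2.05, 0.1))  # 2.0
print(find_close([1.0, 2.0, 3.0], 5.0, 0.1))   # None
```

The same shape applies to the inner loop over `pairs`: the `break` branch
handles "found a matching cluster", the `else` branch handles "no cluster
matched".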
Note that this 2-loop approach is not very efficient if the number of
clusters is large, because every entry may be compared against every
existing cluster -- but first make it work, then -- maybe -- make it fast.
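
One way the pieces described above might fit together is sketched below.
This is a reconstruction from the hints, not the working code mentioned
earlier. The function name `cluster` and the `threshold` parameter are
assumptions made for the sketch, and "close enough" is read as
`abs(inner_e3 - e3) <= threshold`.

```python
def cluster(lol, threshold=0.1):
    """Group e1 values by closeness of e3 to each cluster's first member."""
    pairs = []  # (e1, e3) of the first member of every cluster so far
    dol = {}    # cluster key (an e1) -> list of member e1 values
    for e1, _e2, e3 in lol:
        for inner_e1, inner_e3 in pairs:
            if abs(inner_e3 - e3) <= threshold:
                # close enough to an existing cluster: join it and stop
                dol[inner_e1].append(e1)
                break
        else:
            # no existing cluster matched: this entry starts a new one
            pairs.append((e1, e3))
            dol[e1] = [e1]
    return dol


lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
print(cluster(lol))
```

With the sample data from the question this produces the grouping asked
for there: 'a' with 'x' and 'b', 'k' with 'f', and 'p' on its own. Because
e3 is a float, a difference that is mathematically equal to the threshold
may land on either side of the comparison.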