How to populate all possible hierarchical clusterings from a set of elements?

Alain Ketterlin alain at dpt-info.u-strasbg.fr
Thu Jan 13 10:59:06 EST 2011


Richard Thomas <chardster at gmail.com> writes:

> On Jan 13, 10:02 am, Alain Ketterlin <al... at dpt-info.u-strasbg.fr>

>> def clusterings(l):
>>     if len(l) == 1:
>>         print repr(l)
>>     else:
>>         n = len(l)
>>         for i in xrange(n):
>>             for j in xrange(i+1,n):
>>                 clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l[i],l[j]]])

>> [...] there are *many* such clusterings? (the exact number should be
>> (n!)*((n-1)!)/2^(n-1) given the code above, if I'm not mistaken.)

> Actually the number of such "clusterings" is the (n-1)th Catalan
> number.
>
> http://en.wikipedia.org/wiki/Catalan_numbers

I don't think Catalan numbers exactly captures this number. As far as I
remember (and wikipedia seems to confirm this), Cn is the number of ways
you can repeatedly apply a binary operator to a sequence of objects,
where sequence means that the objects are ordered, which is not the case
here. To use wikipedia's example, C3 is 5 because you can do:

((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))

If we list clusterings we can also have:

((ac)b)d ((ac)d)b (ac)(bd) ...

Actually, for each of the 5 "catalan expressions" above, you have 4!
valid permutations of the objects (leading to a complete count of
n!*C(n-1)). But this leads to many "duplicates", because (ab)(cd) and
(cd)(ab) are considered the same.

I just realize that the code I've given above also produces duplicates
(in particular, the example I've just used). At least, my counting was
correct w.r.t. the code :-) The plot thickens...

-- Alain.



More information about the Python-list mailing list