How to populate all possible hierarchical clusterings from a set of elements?

Richard Thomas chardster at gmail.com
Thu Jan 13 11:58:42 EST 2011


On Jan 13, 3:59 pm, Alain Ketterlin <al... at dpt-info.u-strasbg.fr>
wrote:
> Richard Thomas <chards... at gmail.com> writes:
> > On Jan 13, 10:02 am, Alain Ketterlin <al... at dpt-info.u-strasbg.fr>
> >> def clusterings(l):
> >>     if len(l) == 1:
> >>         print repr(l)
> >>     else:
> >>         n = len(l)
> >>         for i in xrange(n):
> >>             for j in xrange(i+1,n):
> >>                 clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l[i],l[j]]])
> >> [...] there are *many* such clusterings? (the exact number should be
> >> (n!)*((n-1)!)/2^(n-1) given the code above, if I'm not mistaken.)
> > Actually the number of such "clusterings" is the (n-1)th Catalan
> > number.
>
> >http://en.wikipedia.org/wiki/Catalan_numbers
>
> I don't think Catalan numbers exactly captures this number. As far as I
> remember (and wikipedia seems to confirm this), Cn is the number of ways
> you can repeatedly apply a binary operator to a sequence of objects,
> where sequence means that the objects are ordered, which is not the case
> here. To use wikipedia's example, C3 is 5 because you can do:
>
> ((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))
>
> If we list clusterings we can also have:
>
> ((ac)b)d ((ac)d)b (ac)(bd) ...
>
> Actually, for each of the 5 "catalan expressions" above, you have 4!
> valid permutations of the objects (leading to a complete count of
> n!*C(n-1)). But this leads to many "duplicates", because (ab)(cd) and
> (cd)(ab) are considered the same.
>
> I just realize that the code I've given above also produces duplicates
> (in particular, the example I've just used). At least, my counting was
> correct w.r.t. the code :-) The plot thickens...
>
> -- Alain.

Okay, I misunderstood the problem, sorry about that. This makes it
rather hard to define a nice recurrence relation for the number of
such clusterings:

C(1) = 1
C(2n+1) = Sigma(1; n; choose(2n+1, r) * C(r) * C(2n+1-r))
C(2n) = Sigma(1; n-1; choose(2n, r) * C(r) * C(2n-r))
          + choose(2n, n) * C(n) * C(n) / 2

See, very ugly. I can't reduce it to anything workable so I just
computed it. Clearly its more than exponential. Some values:

In [1]: [cluster(n) for n in xrange(1, 21)]
Out[1]:
[1,
 1,
 3,
 15,
 105,
 945,
 10395,
 135135,
 2027025,
 34459425,
 654729075,
 13749310575L,
 316234143225L,
 7905853580625L,
 213458046676875L,
 6190283353629375L,
 191898783962510625L,
 6332659870762850625L,
 221643095476699771875L,
 8200794532637891559375L]

Anyway, I'm done counting things for now.

Chard.



More information about the Python-list mailing list