[SciPy-User] how do I get the subtrees of dendrogram made by scipy.cluster.hierarchy?

Waleed Hamra hamra at whamra.com
Mon Jun 3 14:55:53 EDT 2013


I was confused by this module (scipy.cluster.hierarchy) ... and still am!

For example, take this dendrogram:
http://img62.imageshack.us/img62/8130/3ieb4.png

My question is: how can I extract the coloured subtrees (each one represents a
cluster) in a convenient format, say SIF? The code that produces the plot above
is:

In [1]: import numpy as np

In [2]: import scipy.cluster.hierarchy as sch

In [3]: import matplotlib.pyplot as plt

In [4]: X = np.random.randn(100, 2)

In [5]: d = sch.distance.pdist(X)

In [6]: Z = sch.linkage(d, method='complete')

In [7]: P = sch.dendrogram(Z)

In [8]: plt.savefig('plot_dendrogram.png')

In [9]: T = sch.fcluster(Z, 0.5*d.max(), 'distance')

In [10]: T
Out[10]:
array([4, 5, 3, 2, 2, 3, 5, 2, 2, 5, 2, 2, 2, 3, 2, 3, 2, 5, 4, 5, 2, 5, 2,
       3, 3, 3, 1, 3, 4, 2, 2, 4, 2, 4, 3, 3, 2, 5, 5, 5, 3, 2, 2, 2, 5, 4,
       2, 4, 2, 2, 5, 5, 1, 2, 3, 2, 2, 5, 4, 2, 5, 4, 3, 5, 4, 4, 2, 2, 2,
       4, 2, 5, 2, 2, 3, 3, 2, 4, 5, 3, 4, 4, 2, 1, 5, 4, 2, 2, 5, 5, 2, 2,
       5, 5, 5, 4, 3, 3, 2, 4], dtype=int32)

In [11]: sch.leaders(Z,T)
Out[11]:
(array([190, 191, 182, 193, 194], dtype=int32), array([2, 3, 1, 4, 5], dtype=int32))
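A side note on matching the colours to the flat clusters: by default dendrogram() colours links below 0.7 * Z[:, 2].max(), while the session above cuts fcluster() at 0.5 * d.max() (for complete linkage, Z[:, 2].max() equals d.max()). The coloured subtrees therefore need not match the fcluster() labels. Below is a minimal sketch that passes one threshold to both calls. It assumes a seeded random data set, and the seed and the name t are illustrative only:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)          # seeded so the run is repeatable
X = rng.standard_normal((100, 2))
Z = sch.linkage(pdist(X), method='complete')

# Use the same cut height for the flat clustering and the dendrogram colours,
# instead of relying on dendrogram()'s default of 0.7 * Z[:, 2].max().
t = 0.5 * Z[:, 2].max()
T = sch.fcluster(Z, t, criterion='distance')
P = sch.dendrogram(Z, color_threshold=t, no_plot=True)  # compute layout only

n_clusters = len(np.unique(T))
print(n_clusters, "flat clusters at height", t)
```

Drop no_plot=True to draw the figure as in the session.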

So the output of fcluster() assigns each observation (by its id) to a flat
cluster, and leaders(), according to its documentation, returns two arrays:

    - the first contains the leader node of each flat cluster generated from
      Z; here there are 5 clusters, the same number as in the plot
    - the second contains the ids of these flat clusters
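One way to check this reading of leaders() is to confirm that the leaves under each leader node are exactly the observations that fcluster() put in the matching flat cluster. The sketch below does that. It assumes seeded random data in place of the session's X, and uses to_tree(Z, rd=True), which also returns a list where nodes[i] is the ClusterNode with id i:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)          # seeded stand-in for the session's X
X = rng.standard_normal((100, 2))
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, 0.5 * d.max(), criterion='distance')
L, M = sch.leaders(Z, T)

# rd=True also returns a list in which nodes[i] is the ClusterNode with id i.
root, nodes = sch.to_tree(Z, rd=True)

members = {}
for leader, label in zip(L, M):
    # pre_order() lists the observation indices (leaf ids) under this node
    members[int(label)] = sorted(nodes[leader].pre_order())
    # ...which are exactly the observations fcluster() labelled with `label`
    assert members[int(label)] == sorted(np.flatnonzero(T == label).tolist())
```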

So if leaders() returns L and M respectively, then L[2] = 182 and M[2] = 1,
which means flat cluster 1 is led by node id 182. That id is not an
observation in X (there are only 100). The documentation says "... then it
corresponds to a non-singleton cluster", but I don't understand what that
means ...
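For context on how linkage node ids work: with n observations, ids 0 to n-1 are the observations themselves, and id n + i is the cluster created by row i of Z. Each row holds the two merged ids, the merge height and the number of observations in the new cluster. With n = 100, node 182 is therefore the cluster formed at Z[82]. The sketch below uses a hypothetical helper, describe_node, on a tiny hand-made data set so the merges are easy to follow:

```python
import numpy as np
import scipy.cluster.hierarchy as sch

def describe_node(Z, node_id):
    """Hypothetical helper: explain what a linkage node id refers to."""
    n = Z.shape[0] + 1                 # n observations give n - 1 merge rows
    if node_id < n:
        return f"node {node_id} is observation X[{node_id}]"
    row = Z[node_id - n]               # the merge that created this node
    left, right, height, count = int(row[0]), int(row[1]), row[2], int(row[3])
    return (f"node {node_id} = Z[{node_id - n}]: merges {left} and {right} "
            f"at height {height:g}, containing {count} observations")

# Four points on a line: 0 and 1 merge first (node 4), then 5 and 7 (node 5),
# then nodes 4 and 5 merge into the root (node 6).
X = np.array([[0.0], [1.0], [5.0], [7.0]])
Z = sch.linkage(X, method='complete')
for node_id in range(2 * len(X) - 1):
    print(describe_node(Z, node_id))
```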

Also, I converted Z to a tree with sch.to_tree(Z), which returns an
easy-to-use tree object. I want to visualize it: which graphical tool accepts
this kind of tree object as input?
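On the SIF question: a ClusterNode tree can be walked directly, and each edge can be written as a line in Cytoscape's Simple Interaction Format ("source, relation, target", tab-separated here). The sketch below emits one SIF edge list per flat cluster, starting from that cluster's leader node. Several things are assumptions, not anything scipy provides: the relation label "child" is a placeholder, subtree_to_sif is a hypothetical helper, and a single-observation cluster is written as a lone node because it has no edges:

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

def subtree_to_sif(node, relation="child"):
    """Hypothetical helper: SIF lines 'parent<TAB>relation<TAB>child' under node."""
    if node.is_leaf():
        return [str(node.id)]          # lone node: no edges to write
    lines, stack = [], [node]
    while stack:                       # iterative walk avoids deep recursion
        cur = stack.pop()
        for child in (cur.get_left(), cur.get_right()):
            lines.append(f"{cur.id}\t{relation}\t{child.id}")
            if not child.is_leaf():
                stack.append(child)
    return lines

rng = np.random.default_rng(2)          # seeded stand-in for the session's X
X = rng.standard_normal((100, 2))
d = pdist(X)
Z = sch.linkage(d, method='complete')
T = sch.fcluster(Z, 0.5 * d.max(), criterion='distance')
L, M = sch.leaders(Z, T)
root, nodes = sch.to_tree(Z, rd=True)  # nodes[i] is the node with id i

# One edge list per flat cluster, keyed by the cluster id from leaders().
sif_by_cluster = {int(label): subtree_to_sif(nodes[leader])
                  for leader, label in zip(L, M)}
```

Each list can then be saved with "\n".join(lines) to its own .sif file (the file naming is up to you) and loaded into Cytoscape. Passing root instead of a leader node exports the whole tree.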

Thanks in advance :)


