I do not have access to the right _hierarchy.py source file
pegah Aliz
pegah.alizadeh at gmail.com
Sun May 17 13:02:57 EDT 2015
On Sunday, May 17, 2015 at 6:18:51 PM UTC+2, pegah Aliz wrote:
> Hello Everybody,
>
> This question seems simple, but I can't find the solution:
>
> I use scipy.cluster.hierarchy to do hierarchical clustering on a set of points using the "cosine" metric. As an example, I have:
>
>
> import numpy as np
> import scipy.cluster.hierarchy as hac
> import matplotlib.pyplot as plt
>
> Points = np.array([[ 0. , 0.23508573],
> [ 0.00754775 , 0.26717266],
> [ 0.00595464 , 0.27775905],
> [ 0.01220563 , 0.23622067],
> [ 0.00542628 , 0.14185873],
> [ 0.03078922 , 0.11273108],
> [ 0.06707743 ,-0.1061131 ],
> [ 0.04411757 ,-0.10775407],
> [ 0.01349434 , 0.00112159],
> [ 0.04066034 , 0.11639591],
> [ 0. , 0.29046682],
> [ 0.07338036 , 0.00609912],
> [ 0.01864988 , 0.0316196 ],
> [ 0. , 0.07270636],
> [ 0. , 0. ]])
>
>
> z = hac.linkage(Points, metric='cosine', method='complete')
> labels = hac.fcluster(z, 0.1, criterion="distance")
>
>
> plt.scatter(Points[:, 0], Points[:, 1], c=labels.astype(float))
> plt.show()
>
>
> Since I use the cosine metric, in some cases the dot product of two vectors can be negative or the norm of a vector can be zero. This means the output z contains some negative or infinite elements, which are not valid for fcluster (as below):
>
> z =
> [[ 0.00000000e+00 1.00000000e+01 0.00000000e+00 2.00000000e+00]
> [ 1.30000000e+01 1.50000000e+01 0.00000000e+00 3.00000000e+00]
> [ 8.00000000e+00 1.10000000e+01 4.26658708e-13 2.00000000e+00]
> [ 1.00000000e+00 2.00000000e+00 2.31748880e-05 2.00000000e+00]
> [ 3.00000000e+00 4.00000000e+00 8.96700489e-05 2.00000000e+00]
> [ 1.60000000e+01 1.80000000e+01 3.98805492e-04 5.00000000e+00]
> [ 1.90000000e+01 2.00000000e+01 1.33225099e-03 7.00000000e+00]
> [ 5.00000000e+00 9.00000000e+00 2.41120340e-03 2.00000000e+00]
> [ 6.00000000e+00 7.00000000e+00 1.52914684e-02 2.00000000e+00]
> [ 1.20000000e+01 2.20000000e+01 3.52441432e-02 3.00000000e+00]
> [ 2.10000000e+01 2.40000000e+01 1.38662986e-01 1.00000000e+01]
> [ 1.70000000e+01 2.30000000e+01 6.99056531e-01 4.00000000e+00]
> [ 2.50000000e+01 2.60000000e+01 1.92543748e+00 1.40000000e+01]
> [ -1.00000000e+00 2.70000000e+01 inf 1.50000000e+01]]
>
> To solve this problem, I looked into the linkage() function, and inside it I needed to inspect the _hierarchy.linkage() method. I use the PyCharm editor, and when I asked for the source code of "linkage", it opened a Python file named "_hierarchy.py" in a directory like the following:
>
> .PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
>
> This Python file doesn't contain a real definition for any of the included functions.
> I am wondering where the actual source of this function is, so that I can revise it and solve my problem.
> I would appreciate it if someone could help me find the correct source.
>
> Thanks and Regards
> Pegah
@Peter Thank you. I will do that.
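A likely culprit is visible in the data itself: the last point, [0, 0], has zero norm. The cosine distance 1 - u.v / (|u| |v|) is 0/0 for that point, which typically shows up as NaN and would explain the -1 index and inf in the last row of z. Negative dot products on their own are harmless: they give cosine distances between 1 and 2 (e.g. 1.925 in the second-to-last row), never negative ones. The sketch below finds zero-norm rows and clusters the remaining points; dropping those points is an assumption about what the application allows, not the only possible fix.

```python
import numpy as np
from scipy.cluster import hierarchy as hac
from scipy.spatial.distance import pdist

# The 15 points from the question.
points = np.array([[0.0, 0.23508573],
                   [0.00754775, 0.26717266],
                   [0.00595464, 0.27775905],
                   [0.01220563, 0.23622067],
                   [0.00542628, 0.14185873],
                   [0.03078922, 0.11273108],
                   [0.06707743, -0.1061131],
                   [0.04411757, -0.10775407],
                   [0.01349434, 0.00112159],
                   [0.04066034, 0.11639591],
                   [0.0, 0.29046682],
                   [0.07338036, 0.00609912],
                   [0.01864988, 0.0316196],
                   [0.0, 0.07270636],
                   [0.0, 0.0]])

# Cosine distance is 1 - u.v / (|u| |v|), which is 0/0 for a zero vector.
norms = np.linalg.norm(points, axis=1)
zero_rows = np.flatnonzero(norms == 0)
print("zero-norm rows:", zero_rows)

# Assumed workaround: cluster only the points with a nonzero norm.
keep = norms > 0
# Condensed cosine distances; clip tiny negative round-off so that
# fcluster's linkage validation does not reject the result.
dists = np.clip(pdist(points[keep], metric="cosine"), 0.0, 2.0)
assert np.all(np.isfinite(dists))

z = hac.linkage(dists, method="complete")
labels = hac.fcluster(z, 0.1, criterion="distance")
print(labels)
```

Whether to drop such points, give them a cluster of their own afterwards, or switch to another metric is an application decision; the point is to keep NaN out of the distance matrix before calling linkage.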
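As for the empty "_hierarchy.py": files under PyCharm's python_stubs directory are skeletons the IDE generates for compiled (binary) modules, so they list signatures without bodies. A quick way to check that scipy.cluster._hierarchy is such a compiled extension, while scipy.cluster.hierarchy (which defines linkage()) is ordinary Python, is sketched below. In recent SciPy releases the source of _hierarchy is the Cython file scipy/cluster/_hierarchy.pyx in the SciPy repository; older releases used C sources, so it is worth checking the repository tag that matches the installed version.

```python
import importlib.machinery

import scipy.cluster.hierarchy as hac
from scipy.cluster import _hierarchy  # private module used internally by linkage()

# The public module is ordinary Python source...
print("public wrapper:", hac.__file__)

# ...whereas _hierarchy is a compiled extension module (.so on Linux/macOS,
# .pyd on Windows), so an IDE can only show a generated stub for it.
ext_path = _hierarchy.__file__
is_extension = ext_path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))
print("compiled helper:", ext_path, is_extension)
```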
More information about the Python-list mailing list