I do not have access to the right _hierarchy.py source file

pegah Aliz pegah.alizadeh at gmail.com
Sun May 17 13:00:22 EDT 2015


On Sunday, May 17, 2015 at 6:18:51 PM UTC+2, pegah Aliz wrote:
> Hello Everybody,
> 
> This question seems simple, but I can't find the solution:
> 
> I use scipy.cluster.hierarchy to do hierarchical clustering on a set of points using the "cosine" similarity metric. As an example, I have:
> 
> 
> import numpy as np
> import scipy.cluster.hierarchy as hac
> import matplotlib.pyplot as plt
> 
> Points = np.array([[ 0.         , 0.23508573],
>  [ 0.00754775 , 0.26717266],
>  [ 0.00595464 , 0.27775905],
>  [ 0.01220563 , 0.23622067],
>  [ 0.00542628 , 0.14185873],
>  [ 0.03078922 , 0.11273108],
>  [ 0.06707743 ,-0.1061131 ],
>  [ 0.04411757 ,-0.10775407],
>  [ 0.01349434 , 0.00112159],
>  [ 0.04066034 , 0.11639591],
>  [ 0.         , 0.29046682],
>  [ 0.07338036 , 0.00609912],
>  [ 0.01864988 , 0.0316196 ],
>  [ 0.         , 0.07270636],
>  [ 0.         ,  0.        ]]) 
> 
> 
> z = hac.linkage(Points, metric='cosine', method='complete')
> labels = hac.fcluster(z, 0.1, criterion="distance")
> 
> 
> plt.scatter(Points[:, 0], Points[:, 1], c=labels.astype(float))
> plt.show()
> 
> 
> Since I use the cosine metric, in some cases the dot product of two vectors can be negative or the norm of some vectors can be zero. As a result, the z output contains some negative or infinite elements, which are not valid for fcluster (as below): 
> 
> z = 
> [[  0.00000000e+00   1.00000000e+01   0.00000000e+00   2.00000000e+00]
>  [  1.30000000e+01   1.50000000e+01   0.00000000e+00   3.00000000e+00]
>  [  8.00000000e+00   1.10000000e+01   4.26658708e-13   2.00000000e+00]
>  [  1.00000000e+00   2.00000000e+00   2.31748880e-05   2.00000000e+00]
>  [  3.00000000e+00   4.00000000e+00   8.96700489e-05   2.00000000e+00]
>  [  1.60000000e+01   1.80000000e+01   3.98805492e-04   5.00000000e+00]
>  [  1.90000000e+01   2.00000000e+01   1.33225099e-03   7.00000000e+00]
>  [  5.00000000e+00   9.00000000e+00   2.41120340e-03   2.00000000e+00]
>  [  6.00000000e+00   7.00000000e+00   1.52914684e-02   2.00000000e+00]
>  [  1.20000000e+01   2.20000000e+01   3.52441432e-02   3.00000000e+00]
>  [  2.10000000e+01   2.40000000e+01   1.38662986e-01   1.00000000e+01]
>  [  1.70000000e+01   2.30000000e+01   6.99056531e-01   4.00000000e+00]
>  [  2.50000000e+01   2.60000000e+01   1.92543748e+00   1.40000000e+01]
>  [ -1.00000000e+00   2.70000000e+01              inf   1.50000000e+01]]
> 
> To solve this problem, I looked at the linkage() function, and inside it I needed to check the _hierarchy.linkage() method. I use the PyCharm editor, and when I asked for the source code of "linkage", it opened a Python file named "_hierarchy.py" in a directory like the following:
> 
> .PyCharm40/system/python_stubs/-1247972723/scipy/cluster/_hierarchy.py
>  
> This Python file doesn't contain a definition for any of the functions it lists.  
> I am wondering where the correct source of this function is, so that I can revise it and solve my problem.
> I would appreciate it if someone could help me find the correct source.
> 
> Thanks and Regards
> Pegah
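Looking at the posted data, the only point with zero norm is the last one, [0, 0]. Cosine distance is 1 - u.v / (|u| |v|), so it stays within [0, 2] even when the dot product is negative; it is the zero vector that makes the denominator 0. The sketch below checks this with plain NumPy. The helpers `cosine_distances` and `drop_zero_rows` are hypothetical names written for this illustration, not SciPy functions, and the pair ordering is assumed to match the condensed output of `scipy.spatial.distance.pdist`, which (as far as I can tell from hierarchy.py) `linkage` calls when it is given raw observations.

```python
import numpy as np

# The 15 points from the message above.
Points = np.array([[0.0, 0.23508573],
                   [0.00754775, 0.26717266],
                   [0.00595464, 0.27775905],
                   [0.01220563, 0.23622067],
                   [0.00542628, 0.14185873],
                   [0.03078922, 0.11273108],
                   [0.06707743, -0.1061131],
                   [0.04411757, -0.10775407],
                   [0.01349434, 0.00112159],
                   [0.04066034, 0.11639591],
                   [0.0, 0.29046682],
                   [0.07338036, 0.00609912],
                   [0.01864988, 0.0316196],
                   [0.0, 0.07270636],
                   [0.0, 0.0]])


def cosine_distances(X):
    """Condensed cosine distances 1 - u.v / (|u| |v|) for every pair i < j.

    Pairs come out in row-major (i < j) order, like pdist's condensed output.
    A pair that involves an all-zero row gives 0/0, i.e. NaN.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1)
    a, b = np.triu_indices(len(X), k=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cos = np.sum(X[a] * X[b], axis=1) / (norms[a] * norms[b])
    return 1.0 - cos


def drop_zero_rows(X, tol=0.0):
    """Return (indices of rows whose norm is > tol, those rows)."""
    X = np.asarray(X, dtype=float)
    keep = np.flatnonzero(np.linalg.norm(X, axis=1) > tol)
    return keep, X[keep]


d = cosine_distances(Points)
print(np.isnan(d).sum())  # → 14: every pair involving the [0, 0] row
keep, P = drop_zero_rows(Points)
print(np.isfinite(cosine_distances(P)).all())  # → True
```

Clustering `Points[keep]` instead of `Points` (for example `hac.linkage(Points[keep], metric='cosine', method='complete')`) avoids the undefined distances. The zero vector then needs a label from some separate rule, since cosine similarity says nothing about its direction.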



1 - The platform is Linux.
2 - After downloading the .tar file and building and configuring it, I start PyCharm with pycharm.sh.
3 - These are the contents of _hierarchy.py:

# encoding: utf-8
# module scipy.cluster._hierarchy
# from /users/alizadeh/.local/lib/python2.7/site-packages/scipy/cluster/_hierarchy.so
# by generator 1.136
# no doc

# imports
import __builtin__ as __builtins__ # <module '__builtin__' (built-in)>
import numpy as np # /usr/lib/pymodules/python2.7/numpy/__init__.pyc

# functions

def calculate_cluster_sizes(*args, **kwargs): # real signature unknown
    """
    Calculate the size of each cluster. The result is the fourth column of
        the linkage matrix.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix. The fourth column can be empty.
        cs : ndarray
            The array to store the sizes.
        n : ndarray
            The number of observations.
    """
    pass

def cluster_dist(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by distance criterion.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when distances are less than or equal to `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cluster_in(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by inconsistent criterion.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        R : ndarray
            The inconsistent matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when the inconsistent values are less than or
            equal to `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cluster_maxclust_dist(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by maxclust criterion.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        n : int
            The number of observations.
        mc : int
            The maximum number of clusters.
    """
    pass

def cluster_maxclust_monocrit(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by maxclust_monocrit criterion.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MC : ndarray
            The monotonic criterion array.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        n : int
            The number of observations.
        max_nc : int
            The maximum number of clusters.
    """
    pass

def cluster_monocrit(*args, **kwargs): # real signature unknown
    """
    Form flat clusters by monocrit criterion.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MC : ndarray
            The monotonic criterion array.
        T : ndarray
            The array to store the cluster numbers. The i'th observation belongs to
            cluster `T[i]`.
        cutoff : double
            Clusters are formed when the MC values are less than or equal to
            `cutoff`.
        n : int
            The number of observations.
    """
    pass

def cophenetic_distances(*args, **kwargs): # real signature unknown
    """
    Calculate the cophenetic distances between each observation
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        d : ndarray
            The condensed matrix to store the cophenetic distances.
        n : int
            The number of observations.
    """
    pass

def get_max_dist_for_each_cluster(*args, **kwargs): # real signature unknown
    """
    Get the maximum inconsistency coefficient for each non-singleton cluster.
    
        Parameters
        ----------
        Z : ndarray
            The linkage matrix.
        MD : ndarray
            The array to store the result.
        n : int
            The number of observations.
    """
    pass

4 - The reason I need this function is that hierarchy.py contains the following line:

      _hierarchy.linkage(dm, Z, n,
                         int(_cpy_non_euclid_methods[method]))

and the value of Z is different before and after it.
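The header comments of the stub show that PyCharm generated it from the compiled extension .../scipy/cluster/_hierarchy.so, which is why it contains only signatures and docstrings, and the `_hierarchy.linkage(dm, Z, n, ...)` call fills `Z` inside that compiled code. A minimal sketch for checking which file an import actually resolves to, and whether it is a compiled extension, using only the standard library. It assumes Python 3.6+; on the Python 2.7 setup described here, `import scipy.cluster._hierarchy as h; print(h.__file__)` shows the same path. The helper name `module_origin` is hypothetical.

```python
import importlib.machinery
import importlib.util


def module_origin(name):
    """Return (file path, is_compiled_extension) for an importable module.

    Raises ImportError if the module cannot be located.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate module {name!r}")
    # Suffixes such as ".so" or ".cpython-311-x86_64-linux-gnu.so" on Linux.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return spec.origin, spec.origin.endswith(suffixes)


# A pure-Python stdlib package resolves to a .py file, not an extension.
print(module_origin("json"))
```

With SciPy installed, `module_origin("scipy.cluster._hierarchy")` should report the `.so` path and `True`. As far as I know, the matching source in the SciPy source tree is the Cython file scipy/cluster/_hierarchy.pyx; changing it means rebuilding SciPy, so adjusting the data passed to `linkage` is usually less invasive than patching the compiled routine.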


