[Scipy-svn] r5170 - in trunk/scipy: cluster spatial

scipy-svn at scipy.org scipy-svn at scipy.org
Sat Nov 22 15:18:16 EST 2008


Author: damian.eads
Date: 2008-11-22 14:18:12 -0600 (Sat, 22 Nov 2008)
New Revision: 5170

Modified:
   trunk/scipy/cluster/hierarchy.py
   trunk/scipy/spatial/distance.py
Log:
Word-smithing some hierarchy and spatial documentation.

Modified: trunk/scipy/cluster/hierarchy.py
===================================================================
--- trunk/scipy/cluster/hierarchy.py	2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/cluster/hierarchy.py	2008-11-22 20:18:12 UTC (rev 5170)
@@ -331,6 +331,20 @@
     Performs centroid/UPGMC linkage. See ``linkage`` for more
     information on the return structure and algorithm.
 
+    The following are common calling conventions:
+
+    1. Z = centroid(y)
+
+       Performs centroid/UPGMC linkage on the condensed distance
+       matrix ``y``.  See ``linkage`` for more information on the return
+       structure and algorithm.
+
+    2. Z = centroid(X)
+
+       Performs centroid/UPGMC linkage on the observation matrix ``X``
+       using Euclidean distance as the distance metric. See ``linkage``
+       for more information on the return structure and algorithm.
+
     :Parameters:
         Q : ndarray
             A condensed or redundant distance matrix. A condensed
@@ -346,21 +360,6 @@
             the ``linkage`` function documentation for more information
             on its structure.
 
-    Calling Conventions
-    -------------------
-
-    1. Z = centroid(y)
-
-       Performs centroid/UPGMC linkage on the condensed distance
-       matrix ``y``.  See ``linkage`` for more information on the return
-       structure and algorithm.
-
-    2. Z = centroid(X)
-
-       Performs centroid/UPGMC linkage on the observation matrix ``X``
-       using Euclidean distance as the distance metric. See ``linkage``
-       for more information on the return structure and algorithm.
-
     :SeeAlso:
        - linkage: for advanced creation of hierarchical clusterings.
     """
@@ -371,18 +370,8 @@
     Performs median/WPGMC linkage. See ``linkage`` for more
     information on the return structure and algorithm.
 
-    :Parameters:
-        Q : ndarray
-            A condensed or redundant distance matrix. A condensed
-            distance matrix is a flat array containing the upper
-            triangular of the distance matrix. This is the form that
-            ``pdist`` returns. Alternatively, a collection of
-            m observation vectors in n dimensions may be passed as
-            a m by n array.
+    The following are common calling conventions:
 
-    Calling Conventions
-    -------------------
-
     1. Z = median(y)
 
        Performs median/WPGMC linkage on the condensed distance matrix
@@ -395,6 +384,19 @@
        using Euclidean distance as the distance metric. See linkage
        for more information on the return structure and algorithm.
 
+    :Parameters:
+        Q : ndarray
+            A condensed or redundant distance matrix. A condensed
+            distance matrix is a flat array containing the upper
+            triangular of the distance matrix. This is the form that
+            ``pdist`` returns. Alternatively, a collection of
+            m observation vectors in n dimensions may be passed as
+            a m by n array.
+
+    :Returns:
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
+
     :SeeAlso:
        - linkage: for advanced creation of hierarchical clusterings.
     """
@@ -406,18 +408,8 @@
     matrix. See linkage for more information on the return structure
     and algorithm.
 
-    :Parameters:
-        Q : ndarray
-            A condensed or redundant distance matrix. A condensed
-            distance matrix is a flat array containing the upper
-            triangular of the distance matrix. This is the form that
-            ``pdist`` returns. Alternatively, a collection of
-            m observation vectors in n dimensions may be passed as
-            a m by n array.
+    The following are common calling conventions:
 
-    Calling Conventions
-    -------------------
-
     1. Z = ward(y)
        Performs Ward's linkage on the condensed distance matrix ``y``. See
        linkage for more information on the return structure and
@@ -428,6 +420,19 @@
        Euclidean distance as the distance metric. See linkage for more
        information on the return structure and algorithm.
 
+    :Parameters:
+        Q : ndarray
+            A condensed or redundant distance matrix. A condensed
+            distance matrix is a flat array containing the upper
+            triangular of the distance matrix. This is the form that
+            ``pdist`` returns. Alternatively, a collection of
+            m observation vectors in n dimensions may be passed as
+            a m by n array.
+
+    :Returns:
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
+
     :SeeAlso:
        - linkage: for advanced creation of hierarchical clusterings.
     """
@@ -476,111 +481,109 @@
     combined to form cluster :math:`u`. Let :math:`v` be any
     remaining cluster in the forest that is not :math:`u`.
 
-    :Parameters:
-       Q : ndarray
-           A condensed or redundant distance matrix. A condensed
-           distance matrix is a flat array containing the upper
-           triangular of the distance matrix. This is the form that
-           ``pdist`` returns. Alternatively, a collection of
-           :math:`m` observation vectors in n dimensions may be passed as
-           a :math:`m` by :math:`n` array.
-       method : string
-           The linkage algorithm to use. See the ``Linkage Methods``
-           section below for full descriptions.
-       metric : string
-           The distance metric to use. See the ``distance.pdist``
-           function for a list of valid distance metrics.
-
-    Linkage Methods
-    ---------------
-
     The following are methods for calculating the distance between the
     newly formed cluster :math:`u` and each :math:`v`.
 
-    * method=``single`` assigns
+     * method=``single`` assigns
 
-      .. math:
-         d(u,v) = \min(dist(u[i],v[j]))
+       .. math::
+          d(u,v) = \min(dist(u[i],v[j]))
 
-      for all points :math:`i` in cluster :math:`u` and
-      :math:`j` in cluster :math:`v`. This is also known as the
-      Nearest Point Algorithm.
+       for all points :math:`i` in cluster :math:`u` and
+       :math:`j` in cluster :math:`v`. This is also known as the
+       Nearest Point Algorithm.
 
-    * method=``complete`` assigns
+     * method=``complete`` assigns
 
-      .. math:
-         d(u, v) = \max(dist(u[i],v[j]))
+       .. math::
+          d(u, v) = \max(dist(u[i],v[j]))
 
-      for all points :math:`i` in cluster u and :math:`j` in
-      cluster :math:`v`. This is also known by the Farthest Point
-      Algorithm or Voor Hees Algorithm.
+       for all points :math:`i` in cluster :math:`u` and :math:`j` in
+       cluster :math:`v`. This is also known as the Farthest Point
+       Algorithm or Voorhees Algorithm.
 
-    * method=``average`` assigns
+     * method=``average`` assigns
 
-      .. math:
-         d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
-                                 {(|u|*|v|)
+       .. math::
+          d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
+                                  {(|u|*|v|)}
 
-      for all points :math:`i` and :math:`j` where :math:`|u|`
-      and :math:`|v|` are the cardinalities of clusters :math:`u`
-      and :math:`v`, respectively. This is also called the UPGMA
-      algorithm. This is called UPGMA.
+       for all points :math:`i` and :math:`j` where :math:`|u|`
+       and :math:`|v|` are the cardinalities of clusters :math:`u`
+       and :math:`v`, respectively. This is also called the UPGMA
+       algorithm.
 
-    * method='weighted' assigns
+     * method='weighted' assigns
 
-      .. math:
-         d(u,v) = (dist(s,v) + dist(t,v))/2
+       .. math::
+          d(u,v) = (dist(s,v) + dist(t,v))/2
 
-      where cluster u was formed with cluster s and t and v
-      is a remaining cluster in the forest. (also called WPGMA)
+       where cluster u was formed from clusters s and t, and v
+       is a remaining cluster in the forest (also called WPGMA).
 
+     * method='centroid' assigns
 
-    * method='centroid' assigns
+       .. math::
+          dist(s,t) = euclid(c_s, c_t)
 
-      .. math:
-         dist(s,t) = euclid(c_s, c_t)
+       where :math:`c_s` and :math:`c_t` are the centroids of
+       clusters :math:`s` and :math:`t`, respectively. When two
+       clusters :math:`s` and :math:`t` are combined into a new
+       cluster :math:`u`, the new centroid is computed over all the
+       original objects in clusters :math:`s` and :math:`t`. The
+       distance then becomes the Euclidean distance between the
+       centroid of :math:`u` and the centroid of a remaining cluster
+       :math:`v` in the forest. This is also known as the UPGMC
+       algorithm.
 
-      where :math:`c_s` and :math:`c_t` are the centroids of
-      clusters :math:`s` and :math:`t`, respectively. When two
-      clusters :math:`s` and :math:`t` are combined into a new
-      cluster :math:`u`, the new centroid is computed over all the
-      original objects in clusters :math:`s` and :math:`t`. The
-      distance then becomes the Euclidean distance between the
-      centroid of :math:`u` and the centroid of a remaining cluster
-      :math:`v` in the forest. This is also known as the UPGMC
-      algorithm.
+     * method='median' assigns :math:`d(s,t)` like the ``centroid``
+       method. When two clusters s and t are combined into a new
+       cluster :math:`u`, the average of centroids s and t gives the
+       new centroid :math:`u`. This is also known as the WPGMC
+       algorithm.
 
-    * method='median' assigns math:`$d(s,t)$` like the ``centroid``
-      method. When two clusters s and t are combined into a new
-      cluster :math:`u`, the average of centroids s and t give the
-      new centroid :math:`u`. This is also known as the WPGMC
-      algorithm.
+     * method='ward' uses the Ward variance minimization algorithm.
+       The new entry :math:`d(u,v)` is computed as follows,
 
-    * method='ward' uses the Ward variance minimization algorithm.
-      The new entry :math:`d(u,v)` is computed as follows,
+       .. math::
 
-      .. math:
+          d(u,v) = \sqrt{\frac{|v|+|s|}
+                              {T}d(v,s)^2
+                       + \frac{|v|+|t|}
+                              {T}d(v,t)^2
+                       + \frac{|v|}
+                              {T}d(s,t)^2}
 
-         d(u,v) = \sqrt{\frac{|v|+|s|}
-                             {T}d(v,s)^2
-                      + \frac{|v|+|t|}
-                             {T}d(v,t)^2
-                      + \frac{|v|}
-                             {T}d(s,t)^2}
+       where :math:`u` is the newly joined cluster consisting of
+       clusters :math:`s` and :math:`t`, :math:`v` is an unused
+       cluster in the forest, :math:`T=|v|+|s|+|t|`, and
+       :math:`|*|` is the cardinality of its argument. This is also
+       known as the incremental algorithm.
 
-      where :math:`u` is the newly joined cluster consisting of
-      clusters :math:`s` and :math:`t`, :math:`v` is an unused
-      cluster in the forest, :math:`T=|v|+|s|+|t|`, and
-      :math:`|*|` is the cardinality of its argument. This is also
-      known as the incremental algorithm.
+    Warning: When the minimum distance pair in the forest is chosen, there may
+    be two or more pairs with the same minimum distance. This
+    implementation may choose a different minimum than the MATLAB(TM)
+    version.
 
-   Warning
-   -------
+    :Parameters:
+       - Q : ndarray
+           A condensed or redundant distance matrix. A condensed
+           distance matrix is a flat array containing the upper
+           triangular of the distance matrix. This is the form that
+           ``pdist`` returns. Alternatively, a collection of
+           :math:`m` observation vectors in n dimensions may be passed as
+           a :math:`m` by :math:`n` array.
+       - method : string
+           The linkage algorithm to use. See the method descriptions
+           above for full details.
+       - metric : string
+           The distance metric to use. See the ``distance.pdist``
+           function for a list of valid distance metrics.
 
-   When the minimum distance pair in the forest is chosen, there may
-   be two or more pairs with the same minimum distance. This
-   implementation may chose a different minimum than the MATLAB(TM)
-   version.
+   :Returns:
+
+       - Z : ndarray
+           The hierarchical clustering encoded as a linkage matrix.
    """
     if not isinstance(method, str):
         raise TypeError("Argument 'method' must be a string.")
@@ -788,6 +791,10 @@
     the ClusterNode object is a leaf node, its count must be 1, and its
     distance is meaningless but set to 0.
 
+    Note: This function is provided for the convenience of the library
+    user. ClusterNodes are not used as input to any of the functions in this
+    library.
+
     :Parameters:
 
        - Z : ndarray
@@ -807,9 +814,6 @@
         - L : list
           The pre-order traversal.
 
-    Note: This function is provided for the convenience of the library
-    user. ClusterNodes are not used as input to any of the functions in this
-    library.
     """
 
     Z = np.asarray(Z, order='c')
@@ -945,6 +949,9 @@
     r"""
     Calculates inconsistency statistics on a linkage.
 
+    Note: This function behaves similarly to the MATLAB(TM)
+    inconsistent function.
+
     :Parameters:
        - d : int
            The number of links up to ``d`` levels below each
@@ -971,9 +978,6 @@
            
                \frac{\mathtt{Z[i,2]}-\mathtt{R[i,0]}}
                     {R[i,1]}.
-
-    This function behaves similarly to the MATLAB(TM) inconsistent
-    function.
     """
     Z = np.asarray(Z, order='c')
 

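Editor's illustration (not part of r5170): the two calling conventions documented above for ``centroid`` can be sketched as follows. This assumes a present-day SciPy and NumPy install rather than the 2008 tree, and the data are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import centroid, linkage
from scipy.spatial.distance import pdist

# Six illustrative observations in three dimensions (made-up data).
rng = np.random.default_rng(0)
X = rng.random((6, 3))

# Convention 2: pass the m-by-n observation matrix directly.
Z_from_X = centroid(X)

# Convention 1: pass a condensed distance matrix, the form pdist returns.
y = pdist(X, 'euclidean')
Z_from_y = centroid(y)

# The general entry point, with the method named explicitly.
Z_linkage = linkage(X, method='centroid', metric='euclidean')
```

All three calls should yield the same (m-1)-by-4 linkage matrix, since the observation-matrix form computes Euclidean distances internally before clustering.
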
Modified: trunk/scipy/spatial/distance.py
===================================================================
--- trunk/scipy/spatial/distance.py	2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/spatial/distance.py	2008-11-22 20:18:12 UTC (rev 5170)
@@ -283,10 +283,10 @@
     Computes the Cosine distance between two n-vectors u and v, which
     is defined as
 
-      .. math::
+    .. math::
 
-         \frac{1-uv^T}
-              {||u||_2 ||v||_2}.
+       1 - \frac{uv^T}
+                {||u||_2 ||v||_2}.
 
     :Parameters:
        u : ndarray
@@ -341,7 +341,7 @@
     ``u`` and ``v``. If ``u`` and ``v`` are boolean vectors, the Hamming
     distance is
 
-    .. math:
+    .. math::
 
        \frac{c_{01} + c_{10}}{n}
 
@@ -398,7 +398,7 @@
     Computes the Kulsinski dissimilarity between two boolean n-vectors
     u and v, which is defined as
 
-    .. math:
+    .. math::
 
        \frac{c_{TF} + c_{FT} - c_{TT} + n}
             {c_{FT} + c_{TF} + n}
@@ -453,7 +453,7 @@
     Computes the Manhattan distance between two n-vectors u and v,
     which is defined as
 
-    .. math:
+    .. math::
 
        \sum_i {|u_i-v_i|}.
 
@@ -476,7 +476,8 @@
     Computes the Mahalanobis distance between two n-vectors ``u`` and ``v``,
     which is defined as
 
-    .. math:
+    .. math::
+
        (u-v)V^{-1}(u-v)^T
 
     where ``VI`` is the inverse covariance matrix :math:`V^{-1}`.
@@ -501,7 +502,8 @@
     Computes the Chebyshev distance between two n-vectors u and v,
     which is defined as
 
-    .. math:
+    .. math::
+
        \max_i {|u_i-v_i|}.
 
     :Parameters:
@@ -523,7 +525,7 @@
     Computes the Bray-Curtis distance between two n-vectors ``u`` and
     ``v``, which is defined as
 
-    .. math:
+    .. math::
 
        \sum{|u_i-v_i|} / \sum{|u_i+v_i|}.
 
@@ -546,7 +548,7 @@
     Computes the Canberra distance between two n-vectors u and v,
     which is defined as
 
-    .. math:
+    .. math::
 
        \frac{\sum_i {|u_i-v_i|}}
             {\sum_i {|u_i|+|v_i|}}.
@@ -610,7 +612,7 @@
     which is defined as
 
 
-    .. math:
+    .. math::
 
          \frac{R}
               {c_{TT} + c_{FF} + \frac{R}{2}}
@@ -639,7 +641,7 @@
     Computes the Matching dissimilarity between two boolean n-vectors
     u and v, which is defined as
 
-    .. math:
+    .. math::
 
        \frac{c_{TF} + c_{FT}}{n}
 
@@ -667,7 +669,7 @@
     Computes the Dice dissimilarity between two boolean n-vectors
     ``u`` and ``v``, which is
 
-    .. math:
+    .. math::
 
          \frac{c_{TF} + c_{FT}}
               {2c_{TT} + c_{FT} + c_{TF}}
@@ -700,7 +702,7 @@
     Computes the Rogers-Tanimoto dissimilarity between two boolean
     n-vectors ``u`` and ``v``, which is defined as
 
-    .. math:
+    .. math::
        \frac{R}
             {c_{TT} + c_{FF} + R}
 
@@ -729,7 +731,7 @@
     Computes the Russell-Rao dissimilarity between two boolean n-vectors
     ``u`` and ``v``, which is defined as
 
-    .. math:
+    .. math::
 
       \frac{n - c_{TT}}
            {n}
@@ -761,7 +763,7 @@
     Computes the Sokal-Michener dissimilarity between two boolean vectors
     ``u`` and ``v``, which is defined as
 
-    .. math:
+    .. math::
 
        \frac{2R}
             {S + 2R}
@@ -797,7 +799,7 @@
     Computes the Sokal-Sneath dissimilarity between two boolean vectors
     ``u`` and ``v``,
 
-    .. math:
+    .. math::
 
        \frac{2R}
             {c_{TT} + 2R}
@@ -838,33 +840,8 @@
     this entry or to convert the condensed distance matrix to a
     redundant square matrix.
 
-    :Parameters:
-       X : ndarray
-           An m by n array of m original observations in an
-           n-dimensional space.
-       metric : string or function
-           The distance metric to use. The distance function can
-           be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
-           'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
-           'jaccard', 'kulsinski', 'mahalanobis', 'matching',
-           'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
-           'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
-       w : ndarray
-           The weight vector (for weighted Minkowski).
-       p : double
-           The p-norm to apply (for Minkowski, weighted and unweighted)
-       V : ndarray
-           The variance vector (for standardized Euclidean).
-       VI : ndarray
-           The inverse of the covariance matrix (for Mahalanobis).
+    The following are common calling conventions.
 
-    :Returns:
-       Y : ndarray
-           A condensed distance matrix.
-
-    Calling Conventions
-    -------------------
-
     1. ``Y = pdist(X, 'euclidean')``
 
        Computes the distance between m points using Euclidean distance
@@ -886,9 +863,9 @@
        Computes the standardized Euclidean distance. The standardized
        Euclidean distance between two n-vectors ``u`` and ``v`` is
 
-       .. math:
+       .. math::
 
-          sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+          \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.
 
        V is the variance vector; V[i] is the variance computed over all
           the i'th components of the points. If not passed, it is
@@ -903,7 +880,7 @@
 
        Computes the cosine distance between vectors u and v,
 
-       .. math:
+       .. math::
 
           1 - \frac{uv^T}
                    {{|u|}_2 {|v|}_2}
@@ -914,7 +891,7 @@
 
        Computes the correlation distance between vectors u and v. This is
 
-       .. math:
+       .. math::
 
           \frac{1 - (u - \bar{u})(v - \bar{v})^T}
                {{|(u - \bar{u})|}{|(v - \bar{v})|}^T}
@@ -942,7 +919,7 @@
        maximum norm-1 distance between their respective elements. More
        precisely, the distance is given by
 
-       .. math:
+       .. math::
 
           d(u,v) = \max_i {|u_i-v_i|}.
 
@@ -951,7 +928,7 @@
        Computes the Canberra distance between the points. The
        Canberra distance between two points ``u`` and ``v`` is
 
-       .. math:
+       .. math::
 
          d(u,v) = \sum_i \frac{|u_i-v_i|}
                               {|u_i|+|v_i|}
@@ -963,7 +940,7 @@
        Bray-Curtis distance between two points ``u`` and ``v`` is
 
 
-       .. math:
+       .. math::
 
             d(u,v) = \frac{\sum_i {|u_i-v_i|}}
                           {\sum_i {|u_i+v_i|}}
@@ -1043,6 +1020,31 @@
 
          dm = pdist(X, 'sokalsneath')
 
+    :Parameters:
+       X : ndarray
+           An m by n array of m original observations in an
+           n-dimensional space.
+       metric : string or function
+           The distance metric to use. The distance function can
+           be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
+           'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
+           'jaccard', 'kulsinski', 'mahalanobis', 'matching',
+           'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
+           'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+       w : ndarray
+           The weight vector (for weighted Minkowski).
+       p : double
+           The p-norm to apply (for Minkowski, weighted and unweighted)
+       V : ndarray
+           The variance vector (for standardized Euclidean).
+       VI : ndarray
+           The inverse of the covariance matrix (for Mahalanobis).
+
+    :Returns:
+       Y : ndarray
+           A condensed distance matrix.
+
+
        """
 
 
@@ -1592,9 +1594,9 @@
        Computes the standardized Euclidean distance. The standardized
        Euclidean distance between two n-vectors ``u`` and ``v`` is
 
-       .. math:
+       .. math::
 
-          sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+          \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.
 
        V is the variance vector; V[i] is the variance computed over all
           the i'th components of the points. If not passed, it is
@@ -1609,7 +1611,7 @@
 
        Computes the cosine distance between vectors u and v,
 
-       .. math:
+       .. math::
 
           1 - \frac{uv^T}
                    {{|u|}_2 {|v|}_2}
@@ -1620,7 +1622,7 @@
 
        Computes the correlation distance between vectors u and v. This is
 
-       .. math:
+       .. math::
 
           \frac{1 - (u - n{|u|}_1){(v - n{|v|}_1)}^T}
                {{|(u - n{|u|}_1)|}_2 {|(v - n{|v|}_1)|}^T}
@@ -1650,7 +1652,7 @@
        maximum norm-1 distance between their respective elements. More
        precisely, the distance is given by
 
-       .. math:
+       .. math::
 
           d(u,v) = \max_i {|u_i-v_i|}.
 
@@ -1659,7 +1661,7 @@
        Computes the Canberra distance between the points. The
        Canberra distance between two points ``u`` and ``v`` is
 
-       .. math:
+       .. math::
 
          d(u,v) = \sum_i \frac{|u_i-v_i|}
                               {|u_i|+|v_i|}
@@ -1671,7 +1673,7 @@
        Bray-Curtis distance between two points ``u`` and ``v`` is
 
 
-       .. math:
+       .. math::
 
             d(u,v) = \frac{\sum_i {|u_i-v_i|}}
                           {\sum_i {|u_i+v_i|}}
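
Editor's illustration (not part of r5170): a minimal sketch of the condensed distance matrix that ``pdist`` returns, using three hand-picked points so the values can be checked by hand. It assumes a present-day SciPy; ``squareform`` is used only to show the redundant square form mentioned in the docstring.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three collinear points chosen so the distances are easy to verify by hand.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: one entry per pair, ordered (0,1), (0,2), (1,2).
Y_euclid = pdist(X, 'euclidean')   # 5, 10, 5
Y_city = pdist(X, 'cityblock')     # sum of |u_i - v_i|: 7, 14, 7
Y_cheb = pdist(X, 'chebyshev')     # max of |u_i - v_i|: 4, 8, 4

# Redundant square form: symmetric, with a zero diagonal.
D = squareform(Y_euclid)
```

For m observations the condensed vector has m(m-1)/2 entries, so three points give three distances, and ``D[0, 2]`` holds the same value as ``Y_euclid[1]``.
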



