[Scipy-svn] r5170 - in trunk/scipy: cluster spatial
scipy-svn at scipy.org
scipy-svn at scipy.org
Sat Nov 22 15:18:16 EST 2008
Author: damian.eads
Date: 2008-11-22 14:18:12 -0600 (Sat, 22 Nov 2008)
New Revision: 5170
Modified:
trunk/scipy/cluster/hierarchy.py
trunk/scipy/spatial/distance.py
Log:
Word-smithing some hierarchy and spatial documentation.
Modified: trunk/scipy/cluster/hierarchy.py
===================================================================
--- trunk/scipy/cluster/hierarchy.py 2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/cluster/hierarchy.py 2008-11-22 20:18:12 UTC (rev 5170)
@@ -331,6 +331,20 @@
Performs centroid/UPGMC linkage. See ``linkage`` for more
information on the return structure and algorithm.
+ The following are common calling conventions:
+
+ 1. Z = centroid(y)
+
+ Performs centroid/UPGMC linkage on the condensed distance
+ matrix ``y``. See ``linkage`` for more information on the return
+ structure and algorithm.
+
+ 2. Z = centroid(X)
+
+ Performs centroid/UPGMC linkage on the observation matrix ``X``
+ using Euclidean distance as the distance metric. See ``linkage``
+ for more information on the return structure and algorithm.
+
:Parameters:
Q : ndarray
A condensed or redundant distance matrix. A condensed
@@ -346,21 +360,6 @@
the ``linkage`` function documentation for more information
on its structure.
- Calling Conventions
- -------------------
-
- 1. Z = centroid(y)
-
- Performs centroid/UPGMC linkage on the condensed distance
- matrix ``y``. See ``linkage`` for more information on the return
- structure and algorithm.
-
- 2. Z = centroid(X)
-
- Performs centroid/UPGMC linkage on the observation matrix ``X``
- using Euclidean distance as the distance metric. See ``linkage``
- for more information on the return structure and algorithm.
-
:SeeAlso:
- linkage: for advanced creation of hierarchical clusterings.
"""
@@ -371,18 +370,8 @@
Performs median/WPGMC linkage. See ``linkage`` for more
information on the return structure and algorithm.
- :Parameters:
- Q : ndarray
- A condensed or redundant distance matrix. A condensed
- distance matrix is a flat array containing the upper
- triangular of the distance matrix. This is the form that
- ``pdist`` returns. Alternatively, a collection of
- m observation vectors in n dimensions may be passed as
- a m by n array.
+ The following are common calling conventions:
- Calling Conventions
- -------------------
-
1. Z = median(y)
Performs median/WPGMC linkage on the condensed distance matrix
@@ -395,6 +384,19 @@
using Euclidean distance as the distance metric. See linkage
for more information on the return structure and algorithm.
+ :Parameters:
+ Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ m observation vectors in n dimensions may be passed as
+ a m by n array.
+
+ :Returns:
+ - Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
- linkage: for advanced creation of hierarchical clusterings.
"""
@@ -406,18 +408,8 @@
matrix. See linkage for more information on the return structure
and algorithm.
- :Parameters:
- Q : ndarray
- A condensed or redundant distance matrix. A condensed
- distance matrix is a flat array containing the upper
- triangular of the distance matrix. This is the form that
- ``pdist`` returns. Alternatively, a collection of
- m observation vectors in n dimensions may be passed as
- a m by n array.
+ The following are common calling conventions:
- Calling Conventions
- -------------------
-
1. Z = ward(y)
Performs Ward's linkage on the condensed distance matrix Z. See
linkage for more information on the return structure and
@@ -428,6 +420,19 @@
Euclidean distance as the distance metric. See linkage for more
information on the return structure and algorithm.
+ :Parameters:
+ Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ m observation vectors in n dimensions may be passed as
+ a m by n array.
+
+ :Returns:
+ - Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
+
:SeeAlso:
- linkage: for advanced creation of hierarchical clusterings.
"""
@@ -476,111 +481,109 @@
combined to form cluster :math:`u`. Let :math:`v` be any
remaining cluster in the forest that is not :math:`u`.
- :Parameters:
- Q : ndarray
- A condensed or redundant distance matrix. A condensed
- distance matrix is a flat array containing the upper
- triangular of the distance matrix. This is the form that
- ``pdist`` returns. Alternatively, a collection of
- :math:`m` observation vectors in n dimensions may be passed as
- a :math:`m` by :math:`n` array.
- method : string
- The linkage algorithm to use. See the ``Linkage Methods``
- section below for full descriptions.
- metric : string
- The distance metric to use. See the ``distance.pdist``
- function for a list of valid distance metrics.
-
- Linkage Methods
- ---------------
-
The following are methods for calculating the distance between the
newly formed cluster :math:`u` and each :math:`v`.
- * method=``single`` assigns
+ * method=``single`` assigns
- .. math:
- d(u,v) = \min(dist(u[i],v[j]))
+ .. math::
+ d(u,v) = \min(dist(u[i],v[j]))
- for all points :math:`i` in cluster :math:`u` and
- :math:`j` in cluster :math:`v`. This is also known as the
- Nearest Point Algorithm.
+ for all points :math:`i` in cluster :math:`u` and
+ :math:`j` in cluster :math:`v`. This is also known as the
+ Nearest Point Algorithm.
- * method=``complete`` assigns
+ * method=``complete`` assigns
- .. math:
- d(u, v) = \max(dist(u[i],v[j]))
+ .. math::
+ d(u, v) = \max(dist(u[i],v[j]))
- for all points :math:`i` in cluster u and :math:`j` in
- cluster :math:`v`. This is also known by the Farthest Point
- Algorithm or Voor Hees Algorithm.
+ for all points :math:`i` in cluster u and :math:`j` in
+ cluster :math:`v`. This is also known by the Farthest Point
+ Algorithm or Voor Hees Algorithm.
- * method=``average`` assigns
+ * method=``average`` assigns
- .. math:
- d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
- {(|u|*|v|)
+ .. math::
+ d(u,v) = \sum_{ij} \frac{d(u[i], v[j])}
+ {(|u|*|v|)
- for all points :math:`i` and :math:`j` where :math:`|u|`
- and :math:`|v|` are the cardinalities of clusters :math:`u`
- and :math:`v`, respectively. This is also called the UPGMA
- algorithm. This is called UPGMA.
+ for all points :math:`i` and :math:`j` where :math:`|u|`
+ and :math:`|v|` are the cardinalities of clusters :math:`u`
+ and :math:`v`, respectively. This is also called the UPGMA
+ algorithm. This is called UPGMA.
- * method='weighted' assigns
+ * method='weighted' assigns
- .. math:
- d(u,v) = (dist(s,v) + dist(t,v))/2
+ .. math::
+ d(u,v) = (dist(s,v) + dist(t,v))/2
- where cluster u was formed with cluster s and t and v
- is a remaining cluster in the forest. (also called WPGMA)
+ where cluster u was formed with cluster s and t and v
+ is a remaining cluster in the forest. (also called WPGMA)
+ * method='centroid' assigns
- * method='centroid' assigns
+ .. math::
+ dist(s,t) = euclid(c_s, c_t)
- .. math:
- dist(s,t) = euclid(c_s, c_t)
+ where :math:`c_s` and :math:`c_t` are the centroids of
+ clusters :math:`s` and :math:`t`, respectively. When two
+ clusters :math:`s` and :math:`t` are combined into a new
+ cluster :math:`u`, the new centroid is computed over all the
+ original objects in clusters :math:`s` and :math:`t`. The
+ distance then becomes the Euclidean distance between the
+ centroid of :math:`u` and the centroid of a remaining cluster
+ :math:`v` in the forest. This is also known as the UPGMC
+ algorithm.
- where :math:`c_s` and :math:`c_t` are the centroids of
- clusters :math:`s` and :math:`t`, respectively. When two
- clusters :math:`s` and :math:`t` are combined into a new
- cluster :math:`u`, the new centroid is computed over all the
- original objects in clusters :math:`s` and :math:`t`. The
- distance then becomes the Euclidean distance between the
- centroid of :math:`u` and the centroid of a remaining cluster
- :math:`v` in the forest. This is also known as the UPGMC
- algorithm.
+ * method='median' assigns math:`$d(s,t)$` like the ``centroid``
+ method. When two clusters s and t are combined into a new
+ cluster :math:`u`, the average of centroids s and t give the
+ new centroid :math:`u`. This is also known as the WPGMC
+ algorithm.
- * method='median' assigns math:`$d(s,t)$` like the ``centroid``
- method. When two clusters s and t are combined into a new
- cluster :math:`u`, the average of centroids s and t give the
- new centroid :math:`u`. This is also known as the WPGMC
- algorithm.
+ * method='ward' uses the Ward variance minimization algorithm.
+ The new entry :math:`d(u,v)` is computed as follows,
- * method='ward' uses the Ward variance minimization algorithm.
- The new entry :math:`d(u,v)` is computed as follows,
+ .. math::
- .. math:
+ d(u,v) = \sqrt{\frac{|v|+|s|}
+ {T}d(v,s)^2
+ + \frac{|v|+|t|}
+ {T}d(v,t)^2
+ + \frac{|v|}
+ {T}d(s,t)^2}
- d(u,v) = \sqrt{\frac{|v|+|s|}
- {T}d(v,s)^2
- + \frac{|v|+|t|}
- {T}d(v,t)^2
- + \frac{|v|}
- {T}d(s,t)^2}
+ where :math:`u` is the newly joined cluster consisting of
+ clusters :math:`s` and :math:`t`, :math:`v` is an unused
+ cluster in the forest, :math:`T=|v|+|s|+|t|`, and
+ :math:`|*|` is the cardinality of its argument. This is also
+ known as the incremental algorithm.
- where :math:`u` is the newly joined cluster consisting of
- clusters :math:`s` and :math:`t`, :math:`v` is an unused
- cluster in the forest, :math:`T=|v|+|s|+|t|`, and
- :math:`|*|` is the cardinality of its argument. This is also
- known as the incremental algorithm.
+ Warning: When the minimum distance pair in the forest is chosen, there may
+ be two or more pairs with the same minimum distance. This
+ implementation may chose a different minimum than the MATLAB(TM)
+ version.
- Warning
- -------
+ :Parameters:
+ - Q : ndarray
+ A condensed or redundant distance matrix. A condensed
+ distance matrix is a flat array containing the upper
+ triangular of the distance matrix. This is the form that
+ ``pdist`` returns. Alternatively, a collection of
+ :math:`m` observation vectors in n dimensions may be passed as
+ a :math:`m` by :math:`n` array.
+ - method : string
+ The linkage algorithm to use. See the ``Linkage Methods``
+ section below for full descriptions.
+ - metric : string
+ The distance metric to use. See the ``distance.pdist``
+ function for a list of valid distance metrics.
- When the minimum distance pair in the forest is chosen, there may
- be two or more pairs with the same minimum distance. This
- implementation may chose a different minimum than the MATLAB(TM)
- version.
+ :Returns:
+
+ - Z : ndarray
+ The hierarchical clustering encoded as a linkage matrix.
"""
if not isinstance(method, str):
raise TypeError("Argument 'method' must be a string.")
@@ -788,6 +791,10 @@
the ClusterNode object is a leaf node, its count must be 1, and its
distance is meaningless but set to 0.
+ Note: This function is provided for the convenience of the library
+ user. ClusterNodes are not used as input to any of the functions in this
+ library.
+
:Parameters:
- Z : ndarray
@@ -807,9 +814,6 @@
- L : list
The pre-order traversal.
- Note: This function is provided for the convenience of the library
- user. ClusterNodes are not used as input to any of the functions in this
- library.
"""
Z = np.asarray(Z, order='c')
@@ -945,6 +949,9 @@
r"""
Calculates inconsistency statistics on a linkage.
+ Note: This function behaves similarly to the MATLAB(TM)
+ inconsistent function.
+
:Parameters:
- d : int
The number of links up to ``d`` levels below each
@@ -971,9 +978,6 @@
\frac{\mathtt{Z[i,2]}-\mathtt{R[i,0]}}
{R[i,1]}.
-
- This function behaves similarly to the MATLAB(TM) inconsistent
- function.
"""
Z = np.asarray(Z, order='c')
Modified: trunk/scipy/spatial/distance.py
===================================================================
--- trunk/scipy/spatial/distance.py 2008-11-22 19:09:35 UTC (rev 5169)
+++ trunk/scipy/spatial/distance.py 2008-11-22 20:18:12 UTC (rev 5170)
@@ -283,10 +283,10 @@
Computes the Cosine distance between two n-vectors u and v, which
is defined as
- .. math::
+ .. math::
- \frac{1-uv^T}
- {||u||_2 ||v||_2}.
+ \frac{1-uv^T}
+ {||u||_2 ||v||_2}.
:Parameters:
u : ndarray
@@ -341,7 +341,7 @@
``u`` and ``v``. If ``u`` and ``v`` are boolean vectors, the Hamming
distance is
- .. math:
+ .. math::
\frac{c_{01} + c_{10}}{n}
@@ -398,7 +398,7 @@
Computes the Kulsinski dissimilarity between two boolean n-vectors
u and v, which is defined as
- .. math:
+ .. math::
\frac{c_{TF} + c_{FT} - c_{TT} + n}
{c_{FT} + c_{TF} + n}
@@ -453,7 +453,7 @@
Computes the Manhattan distance between two n-vectors u and v,
which is defined as
- .. math:
+ .. math::
\sum_i {u_i-v_i}.
@@ -476,7 +476,8 @@
Computes the Mahalanobis distance between two n-vectors ``u`` and ``v``,
which is defiend as
- .. math:
+ .. math::
+
(u-v)V^{-1}(u-v)^T
where ``VI`` is the inverse covariance matrix :math:`V^{-1}`.
@@ -501,7 +502,8 @@
Computes the Chebyshev distance between two n-vectors u and v,
which is defined as
- .. math:
+ .. math::
+
\max_i {|u_i-v_i|}.
:Parameters:
@@ -523,7 +525,7 @@
Computes the Bray-Curtis distance between two n-vectors ``u`` and
``v``, which is defined as
- .. math:
+ .. math::
\sum{|u_i-v_i|} / \sum{|u_i+v_i|}.
@@ -546,7 +548,7 @@
Computes the Canberra distance between two n-vectors u and v,
which is defined as
- .. math:
+ .. math::
\frac{\sum_i {|u_i-v_i|}}
{\sum_i {|u_i|+|v_i|}}.
@@ -610,7 +612,7 @@
which is defined as
- .. math:
+ .. math::
\frac{R}
\frac{c_{TT} + c_{FF} + \frac{R}{2}}
@@ -639,7 +641,7 @@
Computes the Matching dissimilarity between two boolean n-vectors
u and v, which is defined as
- .. math:
+ .. math::
\frac{c_{TF} + c_{FT}}{n}
@@ -667,7 +669,7 @@
Computes the Dice dissimilarity between two boolean n-vectors
``u`` and ``v``, which is
- .. math:
+ .. math::
\frac{c_{TF} + c_{FT}}
{2c_{TT} + c_{FT} + c_{TF}}
@@ -700,7 +702,7 @@
Computes the Rogers-Tanimoto dissimilarity between two boolean
n-vectors ``u`` and ``v``, which is defined as
- .. math:
+ .. math::
\frac{R}
{c_{TT} + c_{FF} + R}
@@ -729,7 +731,7 @@
Computes the Russell-Rao dissimilarity between two boolean n-vectors
``u`` and ``v``, which is defined as
- .. math:
+ .. math::
\frac{n - c_{TT}}
{n}
@@ -761,7 +763,7 @@
Computes the Sokal-Michener dissimilarity between two boolean vectors
``u`` and ``v``, which is defined as
- .. math:
+ .. math::
\frac{2R}
{S + 2R}
@@ -797,7 +799,7 @@
Computes the Sokal-Sneath dissimilarity between two boolean vectors
``u`` and ``v``,
- .. math:
+ .. math::
\frac{2R}
{c_{TT} + 2R}
@@ -838,33 +840,8 @@
this entry or to convert the condensed distance matrix to a
redundant square matrix.
- :Parameters:
- X : ndarray
- An m by n array of m original observations in an
- n-dimensional space.
- metric : string or function
- The distance metric to use. The distance function can
- be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
- 'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
- 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
- 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
- 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
- w : ndarray
- The weight vector (for weighted Minkowski).
- p : double
- The p-norm to apply (for Minkowski, weighted and unweighted)
- V : ndarray
- The variance vector (for standardized Euclidean).
- VI : ndarray
- The inverse of the covariance matrix (for Mahalanobis).
+ The following are common calling conventions.
- :Returns:
- Y : ndarray
- A condensed distance matrix.
-
- Calling Conventions
- -------------------
-
1. ``Y = pdist(X, 'euclidean')``
Computes the distance between m points using Euclidean distance
@@ -886,9 +863,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two n-vectors ``u`` and ``v`` is
- .. math:
+ .. math::
- sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+ \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.
V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ -903,7 +880,7 @@
Computes the cosine distance between vectors u and v,
- .. math:
+ .. math::
\frac{1 - uv^T}
{{|u|}_2 {|v|}_2}
@@ -914,7 +891,7 @@
Computes the correlation distance between vectors u and v. This is
- .. math:
+ .. math::
\frac{1 - (u - \bar{u})(v - \bar{v})^T}
{{|(u - \bar{u})|}{|(v - \bar{v})|}^T}
@@ -942,7 +919,7 @@
maximum norm-1 distance between their respective elements. More
precisely, the distance is given by
- .. math:
+ .. math::
d(u,v) = max_i {|u_i-v_i|}.
@@ -951,7 +928,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points ``u`` and ``v`` is
- .. math:
+ .. math::
d(u,v) = \sum_u {|u_i-v_i|}
{|u_i|+|v_i|}
@@ -963,7 +940,7 @@
Bray-Curtis distance between two points ``u`` and ``v`` is
- .. math:
+ .. math::
d(u,v) = \frac{\sum_i {u_i-v_i}}
{\sum_i {u_i+v_i}}
@@ -1043,6 +1020,31 @@
dm = pdist(X, 'sokalsneath')
+ :Parameters:
+ X : ndarray
+ An m by n array of m original observations in an
+ n-dimensional space.
+ metric : string or function
+ The distance metric to use. The distance function can
+ be 'braycurtis', 'canberra', 'chebyshev', 'cityblock',
+ 'correlation', 'cosine', 'dice', 'euclidean', 'hamming',
+ 'jaccard', 'kulsinski', 'mahalanobis', 'matching',
+ 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean',
+ 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule'.
+ w : ndarray
+ The weight vector (for weighted Minkowski).
+ p : double
+ The p-norm to apply (for Minkowski, weighted and unweighted)
+ V : ndarray
+ The variance vector (for standardized Euclidean).
+ VI : ndarray
+ The inverse of the covariance matrix (for Mahalanobis).
+
+ :Returns:
+ Y : ndarray
+ A condensed distance matrix.
+
+
"""
@@ -1592,9 +1594,9 @@
Computes the standardized Euclidean distance. The standardized
Euclidean distance between two n-vectors ``u`` and ``v`` is
- .. math:
+ .. math::
- sqrt(\sum {(u_i-v_i)^2 / V[x_i]}).
+ \sqrt{\sum {(u_i-v_i)^2 / V[x_i]}}.
V is the variance vector; V[i] is the variance computed over all
the i'th components of the points. If not passed, it is
@@ -1609,7 +1611,7 @@
Computes the cosine distance between vectors u and v,
- .. math:
+ .. math::
\frac{1 - uv^T}
{{|u|}_2 {|v|}_2}
@@ -1620,7 +1622,7 @@
Computes the correlation distance between vectors u and v. This is
- .. math:
+ .. math::
\frac{1 - (u - n{|u|}_1){(v - n{|v|}_1)}^T}
{{|(u - n{|u|}_1)|}_2 {|(v - n{|v|}_1)|}^T}
@@ -1650,7 +1652,7 @@
maximum norm-1 distance between their respective elements. More
precisely, the distance is given by
- .. math:
+ .. math::
d(u,v) = max_i {|u_i-v_i|}.
@@ -1659,7 +1661,7 @@
Computes the Canberra distance between the points. The
Canberra distance between two points ``u`` and ``v`` is
- .. math:
+ .. math::
d(u,v) = \sum_u {|u_i-v_i|}
{|u_i|+|v_i|}
@@ -1671,7 +1673,7 @@
Bray-Curtis distance between two points ``u`` and ``v`` is
- .. math:
+ .. math::
d(u,v) = \frac{\sum_i {u_i-v_i}}
{\sum_i {u_i+v_i}}
More information about the Scipy-svn
mailing list