[SciPy-User] SciPy and Recursion
Gael Varoquaux
gael.varoquaux at normalesup.org
Mon Feb 28 06:20:59 EST 2011
On Mon, Feb 28, 2011 at 09:56:41AM +0100, Sebastian Haase wrote:
> could you explain, what you mean by suboptimal!?
> Do you mean speed-wise?
> I had a longish thread on the numpy list recently, where I was trying
> to gain speed using OpenMP and/or SSE.
> And cdist turned out to be as fast as my (best) C implementation (for
> less than 2-3 threads).
I did mean speed-wise: for high-dimensional data, scikit learn can be
significantly faster:
In [1]: X = np.random.random((1000, 500))
In [2]: Y = np.random.random((1000, 500))
In [3]: from scipy import spatial as sp
In [4]: %time sp.distance.cdist(X, Y)
CPU times: user 0.56 s, sys: 0.00 s, total: 0.56 s
Wall time: 1.16 s
Out[5]:
array([[ 9.14394009, 9.27152238, 8.9976296 , ..., 9.18902138,
8.63073757, 8.8818356 ],
[ 9.03243891, 9.37592823, 8.76692936, ..., 9.25943615,
9.09636773, 8.75653576],
[ 9.06511143, 8.69746052, 9.12285065, ..., 9.08133078,
8.93667671, 9.00539463],
...,
[ 9.35929309, 8.87066188, 9.24649229, ..., 9.4306161 ,
9.12252869, 9.00311071],
[ 9.25729667, 8.9454522 , 9.17794614, ..., 9.30332972,
9.43599469, 9.00881447],
[ 9.10675538, 8.67428177, 8.6647222 , ..., 8.89505099,
9.12760646, 9.01155698]])
In [6]: from scikits.learn.metrics import pairwise
In [7]: %time pairwise.euclidean_distances(X, Y)
CPU times: user 0.17 s, sys: 0.01 s, total: 0.18 s
Wall time: 0.20 s
Out[8]:
array([[ 9.14394009, 9.27152238, 8.9976296 , ..., 9.18902138,
8.63073757, 8.8818356 ],
[ 9.03243891, 9.37592823, 8.76692936, ..., 9.25943615,
9.09636773, 8.75653576],
[ 9.06511143, 8.69746052, 9.12285065, ..., 9.08133078,
8.93667671, 9.00539463],
...,
[ 9.35929309, 8.87066188, 9.24649229, ..., 9.4306161 ,
9.12252869, 9.00311071],
[ 9.25729667, 8.9454522 , 9.17794614, ..., 9.30332972,
9.43599469, 9.00881447],
[ 9.10675538, 8.67428177, 8.6647222 , ..., 8.89505099,
9.12760646, 9.01155698]])
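A likely explanation for the gap, stated as an assumption rather than something checked against the scikits.learn source: the pairwise distances can be computed from the expansion ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, which turns most of the work into a single matrix product handled by BLAS. A minimal NumPy sketch of that trick follows; the function name euclidean_via_dot is made up for illustration, and the array sizes are smaller than in the session above so that the brute-force check fits in memory:

```python
import numpy as np


def euclidean_via_dot(X, Y):
    """Pairwise Euclidean distances between the rows of X and the rows of Y.

    Uses ||x - y||^2 = ||x||^2 - 2 x.y + ||y||^2, so the dominant cost
    is the matrix product X . Y^T.
    """
    XX = np.einsum("ij,ij->i", X, X)[:, np.newaxis]  # squared row norms, shape (n, 1)
    YY = np.einsum("ij,ij->i", Y, Y)[np.newaxis, :]  # squared row norms, shape (1, m)
    D2 = XX - 2.0 * np.dot(X, Y.T) + YY              # squared distances, shape (n, m)
    np.maximum(D2, 0.0, out=D2)                      # rounding can leave tiny negatives
    return np.sqrt(D2)


rng = np.random.RandomState(0)
X = rng.random_sample((200, 50))
Y = rng.random_sample((150, 50))

D = euclidean_via_dot(X, Y)

# Brute-force reference: explicit differences via broadcasting.
D_ref = np.sqrt(((X[:, np.newaxis, :] - Y[np.newaxis, :, :]) ** 2).sum(axis=-1))
print(np.allclose(D, D_ref))
```

The expansion is less accurate than differencing when two points are very close to each other, which is why the squared distances are clipped at zero before taking the square root.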
However, it does depend on the dimensionality of the data:
In [9]: X = np.random.random((1000, 3))
In [10]: Y = np.random.random((1000, 3))
In [11]: %timeit sp.distance.cdist(X, Y)
100 loops, best of 3: 11.9 ms per loop
In [12]: %timeit pairwise.euclidean_distances(X, Y)
10 loops, best of 3: 35.4 ms per loop
and judging by David's question, he was probably operating with 3D data:
> > On Sat, Feb 26, 2011 at 02:32:59PM -0800, David Baddeley wrote:
> >> now got scipy running - you're going to want:
> >> dist_list = sp.distance.cdist(cluster_shifted, xyz.reshape((-1, 3)))
So, I must apologise: my answer was off-topic. David, you should probably be
using scipy.spatial.
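For the 3D case, a call along the lines of David's snippet could look like the sketch below. Only the names cluster_shifted and xyz come from his message; the shapes and the random data are assumptions made purely for illustration:

```python
import numpy as np
from scipy.spatial import distance

rng = np.random.RandomState(0)

# Hypothetical data: 5 cluster points in 3D, and a 4 x 4 grid of 3D points.
cluster_shifted = rng.random_sample((5, 3))
xyz = rng.random_sample((4, 4, 3))

# Flatten the grid to one 3D point per row, then compute all pairwise distances.
dist_list = distance.cdist(cluster_shifted, xyz.reshape((-1, 3)))
print(dist_list.shape)
```

Each row of dist_list holds the distances from one point of cluster_shifted to every point in xyz.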
Gael