[SciPy-Dev] faster pdist and cdist when metric='sqeuclidean' (pull-request)
Emanuele Olivetti
emanuele at relativita.com
Tue Jan 28 11:04:48 EST 2014
Dear SciPy developers,
I noticed that both scipy.spatial.distance.pdist() and cdist() are
quite inefficient when metric='sqeuclidean'. In essence, the current
code computes the distances with metric='euclidean' and then,
literally, applies **2 to the result. This is wasted work: there is
no need to take sqrt() first and then square the result again.
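To make the redundancy concrete, here is a minimal pure-Python sketch of
the two code paths. The function names are hypothetical and chosen for
illustration only; this is not the actual SciPy C code, just the same
arithmetic written out.

```python
import math

def sqeuclidean_direct(u, v):
    """Sum of squared differences; no square root involved."""
    total = 0.0
    for a, b in zip(u, v):
        d = a - b
        total += d * d
    return total

def euclidean(u, v):
    """Euclidean distance: the square root of the sum above."""
    return math.sqrt(sqeuclidean_direct(u, v))

def sqeuclidean_via_sqrt(u, v):
    """The roundabout path described above: sqrt() first, then **2."""
    return euclidean(u, v) ** 2

u, v = [0.0, 0.0], [3.0, 4.0]
print(sqeuclidean_direct(u, v))
print(sqeuclidean_via_sqrt(u, v))
```

Both functions return the same value up to floating-point rounding, but
the second one pays for a sqrt() and an extra power per pair of points.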
I've added an issue on the github repository, see:
https://github.com/scipy/scipy/issues/3251
I've also prepared a pull-request to address this issue:
https://github.com/scipy/scipy/pull/3252
which adds C functions for metric='sqeuclidean' and wraps them all
the way up to the module level (distance.py). The added code is (no
surprise) almost identical to the metric='euclidean' case, so this
enhancement was pretty straightforward and mainly a (careful :))
cut-and-paste from the existing code. The nice part is an (at least)
2x speedup in the computation.
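For anyone who wants to check the timing on their own machine, a rough
sketch follows. The array size and repeat counts are arbitrary
assumptions of mine, and the measured ratio will depend on the SciPy
version, the hardware and the data shape, so treat it as a starting
point rather than a benchmark.

```python
import timeit

import numpy as np
from scipy.spatial.distance import pdist

# Arbitrary test data: 500 points in 20 dimensions (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))

def via_euclidean():
    # Roundabout path: euclidean distances, then squared.
    return pdist(X, metric='euclidean') ** 2

def direct():
    # Direct path: ask for squared euclidean distances.
    return pdist(X, metric='sqeuclidean')

# Take the best of a few repeats to reduce timing noise.
t_via = min(timeit.repeat(via_euclidean, number=5, repeat=3))
t_direct = min(timeit.repeat(direct, number=5, repeat=3))

print("euclidean then **2: %.4f s" % t_via)
print("sqeuclidean:        %.4f s" % t_direct)
print("results agree:", np.allclose(via_euclidean(), direct()))
```

The same comparison can be made for cdist() by passing two arrays
instead of one.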
Of course, all tests that passed before still pass now.
There are no changes at the API-level, so there is no impact on the
documentation.
Comments are welcome.
Best,
Emanuele