[SciPy-Dev] faster pdist and cdist when metric='sqeuclidean' (pull-request)

Emanuele Olivetti emanuele at relativita.com
Tue Jan 28 11:04:48 EST 2014


Dear SciPy developers,

I noticed that both scipy.spatial.distance.pdist() and cdist() compute
metric='sqeuclidean' rather inefficiently. In essence, the current
code computes the distances with metric='euclidean' and then,
literally, squares the result (**2). That is wasted work: there is no
need to take sqrt() first and then square the result again.
I've opened an issue on the GitHub repository, see:
   https://github.com/scipy/scipy/issues/3251
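
To make the redundancy concrete, here is a minimal sketch in plain NumPy.
The two helper names are hypothetical and this is not SciPy's code (the
real implementation is in C); it only mirrors the two computation paths
described above for the condensed pairwise output.

```python
import numpy as np


def sqeuclidean_via_sqrt(X):
    """Mimic the old path: Euclidean distance first, then square it."""
    n = X.shape[0]
    out = np.empty(n * (n - 1) // 2)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            d = np.sqrt(np.dot(diff, diff))  # sqrt that is undone right below
            out[k] = d ** 2
            k += 1
    return out


def sqeuclidean_direct(X):
    """Mimic the proposed path: sum of squared differences, no sqrt."""
    n = X.shape[0]
    out = np.empty(n * (n - 1) // 2)
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = X[i] - X[j]
            out[k] = np.dot(diff, diff)
            k += 1
    return out


# Arbitrary example data: 5 points in 3 dimensions.
X = np.random.RandomState(0).rand(5, 3)
old_path = sqeuclidean_via_sqrt(X)
new_path = sqeuclidean_direct(X)
```

Both helpers return the same values up to floating-point rounding; the
second one simply skips the sqrt() and the squaring for every pair.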

I've also prepared a pull-request to address this issue:
   https://github.com/scipy/scipy/pull/3252
which adds C functions for metric='sqeuclidean' and
wraps them all the way up to the module level (distance.py). The added
code is (no surprise) almost identical to the metric='euclidean' case,
so this enhancement was pretty straightforward, mainly a (careful :))
cut-and-paste from existing code. The nice part is a speedup of at
least 2x in the computation.
All tests that passed before still pass now.
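
For anyone who wants to check this locally, a rough sketch follows. The
array sizes and repeat count are arbitrary assumptions, and the timings
will depend on the machine and on the installed SciPy version, so treat
the printed numbers as indicative only.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist, pdist

rng = np.random.RandomState(0)
X = rng.rand(200, 10)  # arbitrary sizes, chosen only for illustration
Y = rng.rand(150, 10)

# Consistency: 'sqeuclidean' should match 'euclidean' squared.
sq_p = pdist(X, metric='sqeuclidean')
np.testing.assert_allclose(sq_p, pdist(X, metric='euclidean') ** 2)

sq_c = cdist(X, Y, metric='sqeuclidean')
np.testing.assert_allclose(sq_c, cdist(X, Y, metric='euclidean') ** 2)

# Rough timing of the dedicated metric against "euclidean, then **2".
t_sq = timeit.timeit(lambda: pdist(X, metric='sqeuclidean'), number=20)
t_eu = timeit.timeit(lambda: pdist(X, metric='euclidean') ** 2, number=20)
print("sqeuclidean: %.4f s, euclidean**2: %.4f s" % (t_sq, t_eu))
```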

There are no changes at the API level, so there is no impact on the
documentation.

Comments are welcome.

Best,

Emanuele



