[SciPy-Dev] scipy.spatial comments

David Warde-Farley wardefar at iro.umontreal.ca
Sun Mar 11 06:25:10 EDT 2012


On 2012-03-11, at 4:47 AM, Gael Varoquaux wrote:

> Actually, I think that we had this discussion a while ago on the
> scikit-learn mailing list and it depends on the dimensionality of your
> feature space. For a high-dimensional feature space, you are much better
> off computing euclidean distance as you suggest, with the dot product.
> However, I think that for a low-dimensional feature space (say 3D),
> scipy's current approach is better.
> 
> I can't really compare, because on my laptop I must have a crap BLAS, as
> the dot product approach is only slightly faster than cdist with your
> example.

Seems that you're right. If the inner dimension is 3, cdist is between 2 and 10x faster than BLAS for me, depending on the outer dimensions. I see the opposite behaviour when the inner dimension is around 50. I guess I never work in fewer than 30 dimensions, so I have a biased sample as to what works best. Holding the dimensions of the result fixed at (4000, 6000), the point where BLAS overtakes the naive computation is somewhere in the neighbourhood of 20-25 inner dimensions. It's lower if I reduce the outer dimensions to (400, 600), where it's somewhere in the range of 10-15.
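For concreteness, here is a rough sketch of the two approaches being compared. The helper name euclidean_via_dot and the array sizes are only illustrative assumptions, not the exact script behind the timings above; it relies on the expansion ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so the bulk of the work lands in a single matrix product that BLAS can handle.

```python
import numpy as np
from scipy.spatial.distance import cdist


def euclidean_via_dot(X, Y):
    """Pairwise Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y.

    Hypothetical helper for illustration; not part of scipy.
    """
    # Squared row norms, shaped so that they broadcast to (len(X), len(Y)).
    XX = np.einsum('ij,ij->i', X, X)[:, np.newaxis]
    YY = np.einsum('ij,ij->i', Y, Y)[np.newaxis, :]
    # The matrix product is the part that gets handed to BLAS.
    D2 = XX + YY - 2.0 * np.dot(X, Y.T)
    # Rounding can leave tiny negative values; clip before the sqrt.
    np.maximum(D2, 0.0, out=D2)
    return np.sqrt(D2)


rng = np.random.RandomState(0)
X = rng.randn(400, 3)   # inner dimension 3, the low-dimensional case
Y = rng.randn(600, 3)

D_blas = euclidean_via_dot(X, Y)
D_cdist = cdist(X, Y, 'euclidean')
print(np.abs(D_blas - D_cdist).max())
```

Wrapping each call in a timer and varying the second axis of X and Y should show roughly where the crossover falls on a given machine and BLAS.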

I guess the only way to deal with this would be either to try to predict the cutoff where BLAS yields better performance on most machines (probably futile: different machines, different BLAS, never mind the confounding factor of the outer dimensions, etc.), or to make the behaviour user-specifiable.
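A user-specifiable version might look something like the sketch below. Everything here is hypothetical: the function name pairwise_euclidean, the method keyword and the default cutoff of 20 are made up for illustration and are not an existing or proposed scipy API. The 'auto' branch is just the crude guess based on the inner dimension, and it ignores the outer dimensions entirely.

```python
import numpy as np
from scipy.spatial.distance import cdist


def pairwise_euclidean(X, Y, method="auto", cutoff=20):
    """Pairwise Euclidean distances with a user-selectable strategy.

    Hypothetical wrapper, not a scipy function.
    method: 'direct' (cdist), 'blas' (dot-product expansion), or 'auto'.
    cutoff: inner dimension at which 'auto' switches to 'blas'; the
            default of 20 is an assumed value, not a tuned one.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if method == "auto":
        # Crude heuristic: only the inner dimension is considered.
        method = "blas" if X.shape[1] >= cutoff else "direct"
    if method == "direct":
        return cdist(X, Y, "euclidean")
    if method == "blas":
        sq = (X * X).sum(axis=1)[:, None] + (Y * Y).sum(axis=1)[None, :]
        sq -= 2.0 * np.dot(X, Y.T)
        # Clip small negative values caused by rounding.
        np.maximum(sq, 0.0, out=sq)
        return np.sqrt(sq)
    raise ValueError("unknown method: %r" % (method,))
```

Someone who knows their data and their BLAS could then pass method='blas' or method='direct' explicitly, or just tune cutoff, rather than relying on a built-in guess.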

David

