[SciPy-dev] Implementing a distance matrix between two sets of vectors concept

Peter Skomoroch peter.skomoroch at gmail.com
Wed Jul 4 15:17:49 EDT 2007


You're right. I was thinking that sparse data structures would help with
storing the input vectors themselves during the computation, not the final
matrix (which is dense: N*M entries, or roughly half that when both sets are
the same and the distance is symmetric). This comes up a lot in
collaborative filtering, where the dimensionality of the vectors is high but
most of the vector entries are missing.
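
To make the distinction concrete, here is a minimal sketch in plain Python.
It is not a scipy.sparse API: each input vector is a hypothetical
{index: value} dict, and a missing entry is assumed to count as 0, which
may not be the right convention for real collaborative filtering data. The
function names are made up for illustration. The inputs stay sparse, while
the result is always a dense N x M table.

```python
import math


def sparse_euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts.

    Assumption: an index missing from a dict is treated as 0.0.
    """
    keys = u.keys() | v.keys()  # only indices that are stored in u or v
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def distance_matrix(us, vs):
    """Dense N x M distance table (list of lists) for sparse inputs."""
    return [[sparse_euclidean(u, v) for v in vs] for u in us]


# Illustrative data: very high nominal dimension, few stored entries.
us = [{0: 3.0, 1000000: 4.0}, {5: 1.0}]   # N = 2
vs = [{0: 3.0}, {5: 1.0, 7: 2.0}, {}]     # M = 3
D = distance_matrix(us, vs)               # 2 x 3, whatever the dimension
```

The cost of each distance depends on the number of stored entries, not on
the nominal dimension, and the output size depends only on N and M.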

On 7/4/07, David Cournapeau <david at ar.media.kyoto-u.ac.jp> wrote:
>
> Peter Skomoroch wrote:
> > I've rolled my own in the past.  If the vectors are really large and
> > you are holding a collection of them, you probably want to use a
> > sparse matrix data structure in either numpy or C.
> Mmm, I'm not sure I understand what you mean. The problem is that you
> have {u_1, ..., u_N} and {v_1, ..., v_M} vectors, and you want the
> distance for every possible combination {u_i, v_j}, which is a real
> number (i.e. the actual size of the matrix in memory does not depend on
> the dimension of the data, only on N and M). I don't see how sparsity can
> help here?
>
> David



-- 
Peter N. Skomoroch
peter.skomoroch at gmail.com
http://www.datawrangling.com