[SciPy-user] sparse SVD

Kenneth Arnold kcarnold at mit.edu
Fri May 1 13:43:36 EDT 2009


2009/4/9 Rob Patro <rob.patro at gmail.com>:
> Is there any implementation of sparse SVD available in scipy?  If not,
> does anyone know of an implementation available in python at all?  I'd
> like to port a project on which I'm working from Matlab to Python, but
> it is crucial that I am able to perform the SVD of large and *very*
> sparse matrices.


The Commonsense Computing Initiative at the MIT Media Lab
(http://csc.media.mit.edu but probably best known for
http://openmind.media.mit.edu) had a similar problem two years ago: we
wanted to run an SVD on a large, sparse semantic network. So we built
Divisi (http://divisi.media.mit.edu), which is based on numpy, but
also:

* wraps SVDLIBC (first with SWIG, now with Cython)
   (the SVD functionality is abstracted, so we could easily switch to
   something like cvxopt or ARPACK, which I hadn't heard of)
* has a data structure for sparse tensors (i.e., arrays with more than
  2 dimensions)
* has a layered model of views enabling:
  - labeling rows and columns with arbitrary Python objects
  - various forms of normalization
  - unfolding tensors into 2D for the higher-order SVD (HO-SVD) operation
* supports various math with the SVD results
* supports "blending" data from different sources
* (in progress) can reason by association as well as similarity
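
To make the "unfolding" item above concrete, here is a minimal sketch in
plain numpy. It is not Divisi's API: the `unfold` helper and the small
dense example tensor are made up for illustration, and Divisi itself
works on sparse tensors. The idea is that each mode of the tensor is
flattened into a 2D matrix, and the left singular vectors of that matrix
give the HO-SVD factor for that mode.

```python
import numpy as np

def unfold(tensor, mode):
    """Return the mode-`mode` unfolding of a dense tensor as a 2D array.

    Hypothetical helper for illustration; not part of Divisi.
    """
    # Move the chosen axis to the front, then flatten the remaining axes.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

# A small 2 x 3 x 4 tensor, used only for illustration.
tensor = np.arange(24, dtype=float).reshape(2, 3, 4)

factors = []
for mode in range(tensor.ndim):
    matrix = unfold(tensor, mode)
    # Left singular vectors of each unfolding form that mode's factor matrix.
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    factors.append(u)
    print(mode, matrix.shape, u.shape)
```

Column ordering within an unfolding varies between conventions; the
singular values and left singular vectors do not depend on it.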

The result, refined over almost 2 years of work (by grad students), has
powered nearly all of our group's research during this time. It's
released under the GPL, but other licensing is possible, especially if
your company sponsors the Media Lab.

If you have the numpy headers, you should be able to just
`easy_install divisi`. We've recently been working on distribution, so
let us know if anything about that is broken.

We think that significant chunks of this code would make a great
addition to numpy/scipy. We don't have the resources to push
integration ourselves, but we could certainly help anyone who is
interested in assimilating our code. In the meantime, it should be
useful to anyone wanting to run sparse SVDs.
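
For comparison, here is a minimal sketch of a truncated sparse SVD that
does not use Divisi or SVDLIBC at all. It assumes a SciPy version that
provides `scipy.sparse.linalg.svds` (an ARPACK-based routine) and
`scipy.sparse.random`; the matrix size, density, and `k` are arbitrary
values picked for the example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

# Hypothetical data: a 1000 x 500 random matrix with about 1% nonzeros.
matrix = sp.random(1000, 500, density=0.01, format="csr", random_state=0)

k = 10  # number of singular triplets to compute; must be < min(matrix.shape)
u, s, vt = svds(matrix, k=k)

# Sort in descending order explicitly rather than rely on the returned order.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Rank-k approximation error, computed densely only because the example is small.
approx = (u * s) @ vt
print(np.linalg.norm(matrix.toarray() - approx))
```

Only the `k` largest singular triplets are computed, which is what keeps
this feasible for large, very sparse matrices.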

-Ken


