[Numpy-discussion] very large matrices.

Dave P. Novakovic davidnovakovic at gmail.com
Sun May 13 02:46:15 EDT 2007


They are very large numbers indeed. Thanks for giving me a wake-up call.
Currently my data is represented as vectors in a vectorset, a typical
sparse representation.

I reduced the problem significantly by removing lots of noise. I'm
basically recording traces of a term's occurrences throughout a corpus
and analyzing the eigenvectors.
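
(For concreteness, a minimal sketch of the kind of term co-occurrence
counting I mean - the toy corpus and names are purely illustrative, not
my actual pipeline:)

    # toy corpus: each document is a list of terms
    docs = [["cat", "sat", "mat"], ["cat", "hat", "mat"]]
    terms = sorted(set(t for d in docs for t in d))
    index = dict((t, i) for i, t in enumerate(terms))

    # sparse co-occurrence counts as a dict-of-dicts:
    # counts[i][j] accumulates how often terms i and j share a document
    counts = {}
    for d in docs:
        for a in d:
            row = counts.setdefault(index[a], {})
            for b in d:
                if a != b:
                    row[index[b]] = row.get(index[b], 0) + 1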

I reduced my matrix to 4863 x 4863 by filtering the original corpus.
Now when I attempt an SVD, I hit a MemoryError in the SVD routine.
Is there a hard upper limit on the size of a matrix for these
calculations?

  File "/usr/lib/python2.4/site-packages/numpy/linalg/linalg.py", line
575, in svd
    vt = zeros((n, nvt), t)
MemoryError
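
(As a rough sanity check on that allocation, assuming float64 entries -
and note the real peak is several times this, since the SVD holds A, U,
and Vt plus LAPACK work arrays simultaneously:)

    n = 4863
    one_matrix = n * n * 8                  # one (n, n) float64 array, in bytes
    print("%.0f MB per n-by-n array" % (one_matrix / 2.0**20))   # ~180 MB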

Cheers

Dave


On 5/13/07, Anne Archibald <peridot.faceted at gmail.com> wrote:
> On 12/05/07, Dave P. Novakovic <davidnovakovic at gmail.com> wrote:
>
> > Core 2 Duo with 4 GB RAM.
> >
> > I've heard about iterative SVD functions. I actually need a complete
> > SVD, with all eigenvalues (not LSI). I'm more interested in the
> > individual eigenvectors.
> >
> > As an example, a single row could probably have about 3000 non-zero elements.
>
> I think you need to think hard about whether your problem can be done
> in another way.
>
> First of all, the singular values (as returned by the SVD) are not
> eigenvalues - eigendecomposition is a much harder problem,
> numerically.
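>
> (A small illustrative check - toy code, not anyone's workflow - showing
> that for a generic non-symmetric matrix the two sets of values differ:)
>
>     import numpy
>
>     a = numpy.random.randn(5, 5)
>     s = numpy.linalg.svd(a, compute_uv=False)   # singular values, real, descending
>     w = numpy.linalg.eigvals(a)                 # eigenvalues, possibly complex
>     print(s)        # always real and non-negative
>     print(abs(w))   # matches s only when a is normal (e.g. symmetric)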
>
> Second, your full dense matrix will be 8*75000*75000 bytes, or
> about 42 GiB. Put another way, the representation of your data
> alone is ten times the size of the RAM on the machine you're using.
>
> Third, your matrix has 225,000,000 nonzero entries; even a perfect
> sparse representation storing only the 8-byte values (real formats add
> at least a couple of bytes of index per entry, usually more) needs
> about 1.7 GiB.
>
> Recall that dense matrix factorizations are at least O(N^3), so you
> can expect on the order of 10^14 floating-point operations. The raw
> arithmetic is actually the *least* significant constraint; pushing
> data into and out of disk caches will take most of your time.
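>
> (Back-of-envelope sketch of those three numbers, assuming 8-byte
> float64 entries and roughly 3000 nonzeros per row:)
>
>     n, nnz_per_row = 75000, 3000
>     dense_bytes = 8.0 * n * n               # full dense matrix
>     sparse_bytes = 8.0 * n * nnz_per_row    # nonzero values only, no indices
>     flops = float(n) ** 3                   # rough O(N^3) operation count
>     print("dense:  %.0f GiB" % (dense_bytes / 2**30))    # ~42 GiB
>     print("sparse: %.1f GiB" % (sparse_bytes / 2**30))   # ~1.7 GiB
>     print("flops:  %.1e" % flops)                        # ~4.2e+14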
>
> Even if you can represent your matrix sparsely (using only a couple of
> gibibytes), you've said you want the full set of eigenvectors, which
> is not likely to be a sparse matrix - so your result is back up to 42
> GiB. And you should expect an eigenvalue algorithm, if it even
> survives the massive roundoff problems, to need roughly that much
> working space again, putting the working size of your problem at
> something like 84 GiB.
>
> SVD is a little easier, if that's what you want, but the full solution
> is twice as large; if you discard the components corresponding to
> small singular values, though, it might be quite manageable. You'll
> still need some fairly specialized code. Which form are you looking for?
>
> Solving your problem in a reasonable amount of time, as described and
> on the hardware you specify, is going to require some very specialized
> algorithms. You could try looking for an out-of-core eigenvalue
> package, but I'd first look for a way to simplify the problem -
> computing just one or a few eigenvectors, say.
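>
> (If one dominant eigenvector would do, here is a minimal power-iteration
> sketch - illustrative only, and assuming you can supply the matrix as a
> matvec(v) function over your sparse data, never forming it densely:)
>
>     import numpy
>
>     def dominant_eigenpair(matvec, n, iterations=1000, tol=1e-10):
>         # power iteration: repeatedly apply A and renormalize; converges
>         # to the eigenvector of the largest-magnitude eigenvalue
>         v = numpy.random.rand(n)
>         v /= numpy.linalg.norm(v)
>         lam = 0.0
>         for i in range(iterations):
>             w = matvec(v)
>             lam_new = numpy.dot(v, w)       # Rayleigh-quotient estimate
>             v = w / numpy.linalg.norm(w)
>             if abs(lam_new - lam) <= tol * abs(lam_new):
>                 break
>             lam = lam_new
>         return lam, v
>
> Memory use is a few length-n vectors plus whatever the sparse matvec
> needs, instead of the dense 42 GiB.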
>
> Anne


