[SciPy-User] Sparse vector

Thu Apr 15 18:29:47 EDT 2010

On Thu, Apr 15, 2010 at 6:18 PM, Anne Archibald
<peridot.faceted at gmail.com> wrote:
> On 15 April 2010 17:29, Felix Schlesinger <schlesin at cshl.edu> wrote:
>> Hello,
>>
>> I was wondering what people's recommendations and thoughts are on
>> sparse vectors, i.e. long vectors where most entries are 0.
>> 1-D numpy arrays waste a lot of memory in that case. Python
>> defaultdicts still use more memory than should be needed (since they
>> store Python objects) and do not work well for numpy math operations
>> and slicing. scipy.sparse has several implementations for sparse 2D
>> matrices which could be used for vectors, but that does not seem ideal
>> for clarity, efficiency and function broadcasting. Is there something
>> else out there, or am I maybe missing a simple way to do it
>> efficiently?
>> In my particular case the vectors would be write-once, read-often and
>> maybe about 1% filled with integers. They are small enough to fit into
>> memory in dense form one at a time during construction.
>
> The short answer is, no, there's no support for such a thing in numpy/scipy.
>
> There's no way to make such a thing under-the-hood compatible with
> numpy arrays, since they require evenly-strided memory. And scipy's
> sparse matrices are built on the assumption that an n by n matrix will
> have at least O(n) nonzero elements, so you are going to have to watch
> carefully what you do with your sparse vectors. That said, dok
> matrices should be all right (though no more efficient than
> defaultdicts) and one of csr/csc matrices will be efficient, depending
> on whether you view your vectors as row or column matrices.
>
> I occasionally wonder whether a generalized sparse ndarray object
> would be useful. You'd want it to specify the value for empty elements
> so that things like boolean arrays could be done this way, and I think
> a dictionary of keys approach would be the way to go. In any case,
> nothing exists now.
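
To make the csr suggestion above concrete, here is a minimal sketch (not from the original messages) that stores a mostly-zero vector as a 1 x n scipy.sparse.csr_matrix. The length of 1000 and the three nonzero entries are made-up illustration values.

```python
import numpy as np
from scipy import sparse

# Hypothetical example data: a length-1000 integer vector with 3 nonzeros.
dense = np.zeros(1000, dtype=np.int64)
dense[[3, 500, 999]] = [7, 2, 5]

# Treat the vector as a single-row matrix; CSR then stores only the
# nonzero values, their column indices, and a two-element row pointer.
row = sparse.csr_matrix(dense.reshape(1, -1))

# Element access uses 2-D indexing, since this is still a matrix.
value = row[0, 500]

# A matrix-vector product with a dense vector returns a dense 1-D array.
total = row.dot(np.ones(1000))

# Rough memory comparison of the sparse buffers against the dense array.
sparse_bytes = row.data.nbytes + row.indices.nbytes + row.indptr.nbytes
dense_bytes = dense.nbytes
```

Note that the result is still a 2-D object, so indexing and shapes follow matrix rules rather than 1-D array rules, which is the clarity cost raised in the original question.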

For some applications, I keep just the nonzero values in one array and
their indices in a separate array, which allows easy conversion back and
forth to dense.
But I don't really use it as a substitute for sparse, just as a
convenient representation, e.g. for optimization and for easier
input.
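
A rough sketch of what I mean, using plain numpy. The helper names to_sparse, to_dense and lookup are made up for illustration and are not part of numpy or scipy; it assumes a 1-D input vector.

```python
import numpy as np

def to_sparse(dense):
    """Split a 1-D dense vector into (indices, values, length)."""
    dense = np.asarray(dense)
    idx = np.flatnonzero(dense)      # sorted positions of nonzero entries
    return idx, dense[idx], dense.shape[0]

def to_dense(idx, vals, n):
    """Rebuild the dense vector from indices, values and length."""
    out = np.zeros(n, dtype=vals.dtype)
    out[idx] = vals
    return out

def lookup(idx, vals, i):
    """Read element i without building the dense vector.

    Relies on idx being sorted, which np.flatnonzero guarantees.
    """
    pos = np.searchsorted(idx, i)
    if pos < len(idx) and idx[pos] == i:
        return vals[pos]
    return vals.dtype.type(0)

# Hypothetical example data.
dense = np.array([0, 0, 5, 0, 7, 0])
idx, vals, n = to_sparse(dense)
roundtrip = to_dense(idx, vals, n)
```

For a write-once, read-often vector this keeps storage proportional to the number of nonzeros, but any arithmetic beyond element lookup has to be written by hand or done after converting back to dense.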

Josef

>
> Anne
>
>> Thanks
>>  Felix
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>