[SciPy-Dev] RFC: sparse DOK array

Evgeni Burovski evgeny.burovskiy at gmail.com
Mon Mar 28 17:37:51 EDT 2016


Thanks Stephan,


> A few other small things I'd like to see:
> - Support for slicing, even if it's expensive.


Slicing is on the TODO list. Only needs a bit of plumbing work.

> - A strict way to set the shape without automatic expansion, if desired
> (e.g., if shape is provided in the constructor).

You can set the initial shape in the constructor. Or do you mean a flag
to freeze the shape once it's set, so that
`__setitem__(out-of-bounds)` raises an error?

> - Default to the dtype of the fill_value. NumPy does this for np.full.

Thanks for the suggestion --- done!
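
For reference, this is the np.full behavior being mirrored: when no dtype
is given, the result takes the dtype that np.array(fill_value) would have.
A minimal illustration using plain NumPy only (nothing sparr-specific is
assumed here):

```python
import numpy as np

# No dtype given: the result dtype follows the fill value.
a = np.full((2, 3), 1.5)             # float fill value -> float array
b = np.full((2, 3), 7)               # int fill value -> default int array

# An explicit dtype still takes precedence over the fill value.
c = np.full((2, 3), 7, dtype=float)

print(a.dtype, b.dtype, c.dtype)
```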


>> * Data types and casting rules. For now, I basically piggy-back on
>> numpy's rules.
>> There are several slightly different ones (numba has one?), and there
>> might be
>> an opportunity to simplify the rules. OTOH, inventing one more subtly
>> different
>> set of rules might be a bad idea.
>
>
> Yes, please follow NumPy.

One thing I'm wondering about is the numpy rule that scalars never upcast
arrays. Is it something people actually rely on? [In my experience, I
only ever had to work around it, but my experience might be atypical.]
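
To make the rule concrete, here is a small sketch with plain NumPy arrays.
It uses a Python float as the scalar; the last line shows the kind of
workaround I mean (an explicit cast), which is just one option:

```python
import numpy as np

x = np.ones(3, dtype=np.float32)

# A Python scalar does not upcast the array: the result stays float32.
y = x * 2.0

# An array operand of a wider dtype does upcast: the result is float64.
z = x * np.ones(3, dtype=np.float64)

# Workaround when the wider dtype is wanted: cast the array explicitly.
w = x.astype(np.float64) * 2.0

print(y.dtype, z.dtype, w.dtype)
```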


> You could actually use a mix of __array_prepare__ and __array_wrap__ to make
> (non-generalized) ufuncs work, e.g., for functions like np.sin:
>
> - In __array_prepare__, return the non-fill values of the array concatenated
> with the fill value.
> - In __array_wrap__, reshape all but the last element to build a new sparse
> array, using the last element for the new fill value.
>
> This would be a neat trick and get you most of what you could hope for from
> __numpy_ufunc__.

This is really neat indeed! I've flagged it in
https://github.com/ev-br/sparr/issues/35
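
The core of the trick can be sketched without the hooks themselves. The
function below is hypothetical (apply_ufunc_sparse is not a sparr or NumPy
name) and stands in for the sparse array with a plain dict of stored
values plus a fill value. It assumes a single-input, elementwise ufunc:

```python
import numpy as np

def apply_ufunc_sparse(ufunc, data, fill_value):
    """Apply an elementwise ufunc to a dict-of-keys array (hypothetical helper).

    data maps index tuples to stored values; fill_value is used elsewhere.
    """
    keys = list(data)
    # "Prepare" step: stored values with the fill value appended last.
    packed = np.array([data[k] for k in keys] + [fill_value])
    out = ufunc(packed)
    # "Wrap" step: all but the last element are the new stored values,
    # the last element is the new fill value.
    new_data = dict(zip(keys, out[:-1].tolist()))
    new_fill = out[-1].item()
    return new_data, new_fill

stored = {(0, 1): 0.0, (2, 2): 1.0}
new_stored, new_fill = apply_ufunc_sparse(np.exp, stored, 0.0)
print(new_stored, new_fill)
```

Note that the fill value changes here (exp(0) is 1), which is exactly why
it has to travel through the ufunc together with the stored values.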

At the moment, I'm dealing with something much less cool, which I
suspect I'm not the first one to run into: given m a MapArray and csr a
scipy.sparse matrix,

- m * csr produces a MapArray holding the result of the elementwise
multiplication, but
- csr * m fails with a dimension mismatch error when the dimensions are
OK for elementwise multiplication but not for matrix multiplication. The
failure is somewhere in the scipy.sparse code.

I tried playing with __array_priority__, but so far I did not manage
to convince scipy.sparse matrices to defer cleanly to the right-hand
multiplier (left-hand multiplier is OK).
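
The asymmetry follows from how Python dispatches a * b: a.__mul__(b) is
tried first, and b.__rmul__(a) only runs if the former returns
NotImplemented. A toy sketch with purely hypothetical classes (these are
not the scipy.sparse or sparr implementations, and I'm not claiming this
is exactly what scipy.sparse does internally):

```python
class MapLike:
    """Hypothetical stand-in for the right-hand operand."""
    __array_priority__ = 100.0

    def __mul__(self, other):
        return "MapLike.__mul__"

    def __rmul__(self, other):
        return "MapLike.__rmul__"


class Eager:
    """Hypothetical left operand that handles everything itself."""

    def __mul__(self, other):
        # Never returns NotImplemented, so other.__rmul__ is never tried.
        raise ValueError("dimension mismatch")


class Polite:
    """Hypothetical left operand that defers based on priority."""
    __array_priority__ = 10.0

    def __mul__(self, other):
        if getattr(other, "__array_priority__", 0.0) > self.__array_priority__:
            return NotImplemented  # lets Python fall back to other.__rmul__
        return "Polite.__mul__"


print(MapLike() * Eager())    # left operand's __mul__ runs
print(Polite() * MapLike())   # deferred to MapLike.__rmul__
```

With the Eager class on the left, nothing the right-hand operand sets can
help, since its __rmul__ is never consulted.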


Cheers,

Evgeni


