[SciPy-Dev] RFC: sparse DOK array

Evgeni Burovski evgeny.burovskiy at gmail.com
Mon Mar 28 12:19:12 EDT 2016


For completeness I'd mention an alternative:
https://github.com/scipy/scipy/pull/6004
This is a completely independent effort framed better for the
scipy.sparse framework.


On Mon, Mar 28, 2016 at 11:29 AM, Evgeni Burovski
<evgeny.burovskiy at gmail.com> wrote:
> Hi,
>
> TL;DR: Here's a draft implementation of a 'sparse array', suitable for
> incremental
> assembly. Comments welcome!
> https://github.com/ev-br/sparr
>
> Scipy sparse matrices are generally great. One area where they are not
> very great
> is the formats suitable for incremental assembly (esp LIL and DOK
> matrices). This
> has been discussed for a while, see, for instance,
> https://github.com/scipy/scipy/pull/2908#discussion_r16955546
>
> I spent some time to see how hard is it to build a dictionary-of-keys
> data structure without Python overhead. Turns out, not very hard.
>
> Also, when constructing a sparse data structure now, it seems to make sense
> to make it a sparse *array* rather than *matrix*, so that it supports both
> elementwise and matrix multiplication.
>
> A draft implementation is available at https://github.com/ev-br/sparr and
> I'd be grateful for any comments, large or small.
>
> Some usage examples are in the README file which Github shows at the
> front page above.
>
> First and foremost, I'd like to gauge interest in the community ;-).
> Does it actually make sense? Would you use such a data structure? What is
> missing in the current version?
>
> Short to medium term, some issues I see are:
>
> * 32-bit vs 64-bit indices. Scipy sparse matrices switch between index types.
> I wonder if this is purely backwards compatibility. Naively, it seems to me
> that a new class could just always use 64-bit indices, but this might
> be too naive?
>
> * Data types and casting rules. For now, I basically piggy-back on
> numpy's rules.
> There are several slightly different ones (numba has one?), and there might be
> an opportunity to simplify the rules. OTOH, inventing one more subtly different
> set of rules might be a bad idea.
>
> * "Object" dtype. So far, there isn't one. I wonder if it's needed or having
> only numeric types would be enough.
>
> * Interoperation with numpy arrays and other sparse matrices. I guess
> __numpy_ufunc__ would *the* solution here, when available.
> For now, I do something simple based on special-casing and __array_priority__.
> Sparse matrices almost work, but there are glitches.
>
> This is just a non-exhaustive list of things I'm thinking about. If there's
> something else (and I'm sure there is plenty!) I'm all ears --- either
> in this email thread or in the issue tracker.
> If someone feels like commenting on implementation details, sure, none of those
> is cast in stone and plumbing is definitely in a bit of a flux.
>
> Cheers,
>
> Evgeni



More information about the SciPy-Dev mailing list