[SciPy-Dev] Proposal: loosen the coupling between scipy.sparse and np.matrix

Anton Akhmerov i at antonakhmerov.org
Sun Aug 1 09:47:52 EDT 2021


Dear scipy developers,

TL;DR: instead of making big changes to the scipy.sparse interface so
that it becomes more array-like, I propose reducing users' exposure to
np.matrix specifically.

I'm new to the list, so I apologize in advance if I overlooked
relevant previous discussions. I checked the issue tracker, the
roadmap, and the last few months of the mailing list. Having found
nothing that contradicts my take, I hope my proposal will improve the
state of scipy.sparse.

---

# Current state of the art

For historical reasons, scipy.sparse was designed to mimic the
np.matrix interface. However, due to various interface warts, numpy
matrices became unpopular over time, and hence less familiar to users.
They are slowly being removed from various codebases, and numpy itself
does not recommend using matrices (although it doesn't quite issue a
deprecation warning yet). I assume that there's an overall community
preference to get rid of numpy matrices over time, also on the scipy
side.

There are two main ways in which scipy.sparse.spmatrix relates to np.matrix:

1. The interface of spmatrix itself is similar to np.matrix. Some
examples of this behavior are:
  - slicing a spmatrix returns another spmatrix even in cases when
numpy would return a vector
  - __mul__ is __matmul__, and not element-wise multiplication
  - properties .H, .A, etc mimic np.matrix
2. spmatrix produces numpy matrices. Some examples of this behavior
are the spmatrix.todense method and additions of ndarray and spmatrix.
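
A minimal sketch of the two aspects listed above, assuming a small
csr_matrix as the example spmatrix. The variable names are
illustrative only, and toarray is shown at the end as the
array-returning counterpart of todense.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [3, 4]])
m = sparse.csr_matrix(dense)

# Aspect 1: the interface itself behaves like np.matrix.
row = m[0]          # stays 2-D with shape (1, 2); dense[0] has shape (2,)
product = m * m     # matrix product, not element-wise multiplication

# Aspect 2: operations that hand back an np.matrix.
as_matrix = m.todense()   # np.matrix
summed = m + dense        # spmatrix + ndarray also gives np.matrix

# The array-returning alternative.
as_array = m.toarray()    # plain np.ndarray
```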

Changing the first aspect is hard, both for reasons of backwards
compatibility and because of the amount of changes required in the
codebase. The second takes less work and is still likely to improve
the usability of spmatrix. A natural step would be to deprecate
spmatrix.todense (or potentially switch it to return an array).
Shortly after I opened an issue [1] proposing to do so, I learned
that spmatrix.todense continues to confuse new users [2], and even
some contributors [3]. I think the usage pattern behind these cases
is as follows (I have certainly gone through these steps a bunch):

1. Do something with sparse.spmatrix that produces an np.matrix
without realizing it.
2. See a bug, exception, or deprecation warning.
3. Figure out what's going wrong, play whack-a-mole and modify the
code to produce an array instead.
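
The three steps above can be sketched as follows. This is one
hypothetical instance of the pattern, not a case taken from the linked
issues: the np.matrix comes from todense, and the failure is a shape
error from code written with ndarray semantics in mind.

```python
import numpy as np
from scipy import sparse

m = sparse.csr_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))

# Step 1: an np.matrix enters the code unnoticed.
d = m.todense()
first_row = d[0]    # np.matrix of shape (1, 2), not a 1-D vector

# Step 2: "*" is a matrix product here, so squaring the row fails.
error = None
try:
    first_row * first_row    # (1, 2) times (1, 2): shapes do not align
except ValueError as exc:
    error = exc

# Step 3: track down the matrix and produce an ndarray instead.
first_row_fixed = m.toarray()[0]              # 1-D ndarray, shape (2,)
squared = first_row_fixed * first_row_fixed   # element-wise, as intended
```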

The other part of the interface (replicating np.matrix behavior) is
also not without its problems. For example, carbon-copying the
semantics of matrix.A means that an spmatrix is silently converted to
a dense array on attribute access, which hides the computational
cost of that conversion.
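
To give a rough sense of the cost involved, the sketch below compares
the memory held by a sparse identity matrix with that of its dense
counterpart. The size n = 2000 is an arbitrary choice for
illustration, and the explicit toarray call stands in for the
attribute access so that the conversion is visible in the code.

```python
import numpy as np
from scipy import sparse

n = 2000
m = sparse.identity(n, format="csr")   # stores only the n diagonal values

# Memory used by the three arrays backing the CSR format.
sparse_bytes = m.data.nbytes + m.indices.nbytes + m.indptr.nbytes

# The dense conversion allocates all n * n entries, zeros included.
dense_bytes = m.toarray().nbytes
```

With the default float64 dtype the dense array takes n * n * 8 bytes,
while the sparse storage grows only linearly with n.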

# Proposal

I recognize that scipy.sparse is a mature and widely used codebase,
and backwards compatibility is extremely important. At the same time,
I believe that exposing users and developers to np.matrix also has a
continued cost for both users and maintainers. Therefore I propose to
deprecate and then remove the ability to produce np.matrix from the
scipy.sparse codebase. I also propose to revisit the parts of the
np.matrix interface that were copied and that can cause trouble,
spmatrix.A being a prime example [4].

What do you all think?

Best,
Anton

[1]: https://github.com/scipy/scipy/issues/14494
[2]: https://github.com/scipy/scipy/issues/14131
[3]: https://github.com/scipy/scipy/pull/14488#discussion_r678451919
[4]: https://github.com/scipy/scipy/issues/14503

