[SciPy-dev] Re: [Numpy-discussion] adding a .M attribute to the array.

Mon Mar 18 09:47:59 EST 2002

[Sorry about the crossposting, but it also seemed relevant to both scipy and
numpy...]

Huaiyu Zhu <huaiyu_zhu at yahoo.com> writes:
[...] 
> I'd like to hear any suggestions on how to proceed.  My own favorite would
> be to have separate array and matrix classes with easy but explicit
> conversions between them.  Without conversions, arrays and matrices would
> be completely independent semantically.  In other words, I'm mostly in
> favor of Konrad Hinsen's position, with the addition of using ~ operators
> for elementwise operations for matrix-like classes.  The PEP itself also
> discussed ideas of extending the meaning of ~ to other parts of Python for
> elementwise operations on aggregate types, but my impressions of people's
> impressions is that it has a better chance without that part.
> 

Well, from my impression of the previous discussions, the situation (both for
numpy and scipy) seems to boil down to me as follows:

Either `array` currently is too much of a matrix, or too little:

Linear algebra functionality is currently exclusively provided by `array` and
libraries that operate on and return `array`s, but the computational and
notational efficiency leaves to be desired (compared to e.g. Matlab) in some
areas, importantly matrix multiplications (which are up to 40 times slower)
and really awkward to write (and much more importantly, decipher afterwards).

So I think what one should really do is discuss the advantages and
disadvantages of the two possible ways out of this situation, namely
providing:

1) a new (efficient) `matrix` class/type (and appropriate libraries that
   operate on it) [The Matrix class that comes with Numeric is more some
   syntactic sugar wrapper -- AFAIK it's not use as a return type or argument
   in any of the functions that only make sense for arrays that are matrices].

2) the additional functionality that is needed for linear algebra in `array`
   and the libraries that operate on it.

(see [1] below for what I feel is currently missing and could be done
either in way 1) or 2))

I think it might be helpful to investigate these "macro"-issues before one
gets bogged down in discussions about operators (I admit these are not
entirely unrelated -- given that one of the reasons for the creation of a
Matrix type would be that '*' is already taken in 'array's and there is no way
to add a new operator without modifying the python core -- just for the record
and ignoring my own advice, _iff_ there is a chance of getting '~*' into the
language, I'd rather have '*' do the same for both matrices and arrays).

My impression is that the best path also very much depends on the what the
feature aspirations and divisions of labor of numpy/numarray and scipy are
going to be. For example, scipy is really aimed at scientific users, which
need performance, and are willing to buy it with inconvenience (like the
necessity to install other libraries on one's machine, most prominently atlas
and blas). The `array` type and the functions in `Numeric`, on the other hand,
potentially target a much wider community -- the efficient storage and
indexing facilities (rich comparisons, strides, the take, choose
etc. functions) make it highly useful for code that is not necessarily
numeric, (as an example I'm currently using it for feature selection
algorithms, without doing any numerical computations on the arrays). So maybe
(a subset of) numpy should make it into the python core (or an as yet
`non-existent sumo-distribution`) [BTW, I also wonder whether the python-core
array module could be superseded/merged with numpy's `array`? One potential
show stopper seems to be that it is e.g. `pop`able].

In such a scenario, where numpy remains relatively general (and might even aim
at incorporation into the core), it would be a no-no to bloat it with too much
code aimed at improving efficiency (calling blas when possible, sparse storage
etc.). On the other hand people who want to do serious numerical work will
need this -- and the scipy community already requires atlas etc. and targets a
more specialized audience. Under this consideration it might be an attractive
solution do incorporate good matrix functionality (and possibly other
improvements for hard core number crunchers) in scipy only (or at least limit
the efficient _implementation_ of matrices to scipy, providing at only a pure
python class or so in numpy). I'm not suggesting, BTW, to necessarily put all
of [1] into a single class -- it seems sensible to have a couple of subclasses
(for masked, sparse representations etc.) to `matrix` (maybe the parent-class
should even be a relatively naïve Numpy implementation, with the good stuff as
subclasses in scipy...).

In any event, creating a new matrix class/type would also mean that matrix
functionality in libraries should use and return this class (existing
libraries should presumably largely still operate on arrays for
backwards-compatibily (or both -- after a typecheck), and
some matrix operations are so useful that it makes sense to provide array
versions for them (e.g. dot) -- but on the whole it makes little sense to have
a computationally and space efficient matrix type if one has to cast it around
all the time). A `matrix` class is more specialized than an `array` and since
the operations one will often do on it are consequently more limited, I think
it should provide most important functionality as methods (rather than as
external functions; see [2] for a list of suggestions).

Approach 1) on the other hand would have the advantage that the current
interface would stay pretty much the same, and as long as 2D arrays can just
be regarded as matrices, there is no absolutely compelling reason not to stuff
everything into array (at least the scipy-version thereof).

Another important question to ask before deciding what to change how and if,
is obviously how many people in the scipy/numpy community do lots of linear
algebra (and how many deflectors from matlab etc. one could hope to win if one
spiced things up a bit for them...), but I would suppose there must be quite a
few (but I'm certainly biased ;).

Unfortunately, I've really got to do some work again now, but before I return
to number-crunching I'd like to say that I'd be happy to help with the
implementation of a matrix class/type in python (I guess a .py-prototype would
be helpful to start with, but ultimately a (subclassable) C(++)-type will be
called for, at least in scipy).

--alex

Footnotes: 
[1]  The required improvements for serious linear algebra seem to be:

     - optional use (atlas) blas routines for real and complex matrix, matrix
       `dot`s if atlas is available on the build machine (see
       http://www.scipy.org/Members/aschmolck for a patch -- it produces
       speedups of more than factor 40 for big matrices; I'd be willing to
       provide an equivalent patch for the scipy distribution if there is
       interest)

     - making sure that no unnecessary copies are created (e.g. when
       transposing a matrix to use it in `dot` -- AFAIK although the transpose
       itself only creates a new view, using it for dot results in a copy (but
       I might be wrong here ))

     - allowing more space efficient storage forms for special cases
       (e.g. sparse matrices, upper triangular etc.). IO libraries that can
       save and load such representations are also needed (methods and static
       methods might be a good choice to keep things transparent to the user).

     - providing a convinient and above all legible notation for common matrix
       operations (better than `dot(tranpose(A),B)` etc. -- possibilities
       include A * B.T or A ~* B.T or A * B ** T (by overloding __rpow__ as
       suggested in a previous post))

     - (in the case of a new `matrix` class): indexing functionality
       (e.g. `where`, `choose` etc. should be available without having to
       cast, e.g. for the common case that I want to set everything under a
       certain threshold to 0., I don't want to have to cast my sparse matrix
       to an array etc.)

[2]  What should a matrix class contain?

     - a dot operator (certainly eventually, but if there is a good chance to
       get ~* into python, maybe '*' should remain unimplemented till this can
       be decided)

     - most or all of what scipy's linalg module does

     - possibly IO, (reading  as a static method)

     - indexing (the like of take, choose etc. (some should maybe be functions
       or static methods))

-- 
Alexander Schmolck     Postgraduate Research Student
                       Department of Computer Science
                       University of Exeter
A.Schmolck at gmx.net     http://www.dcs.ex.ac.uk/people/aschmolc/