[SciPy-user] NumPy matrix-vector calculation

Tue Oct 2 12:39:46 EDT 2007

On 02/10/2007, Dinesh B Vadhia <dineshbvadhia at hotmail.com> wrote:

> We have an MxN (M not equal to N) integer matrix A.  The data for A is
> read in from a file on persistent storage and the data is immutable (i.e.
> does not change and cannot be changed).
>
> The vector x is a vector of size Nx1.  The data elements of x are calculated
> during the program execution.
>
> We then perform a matrix-vector calculation, i.e. y = Ax, where the resulting
> y is an Mx1 vector.
>
> Both x and y are then discarded and a new x and y are calculated and this
> continues until program execution stops but at all times the matrix A
> remains the same until ...
>
> Under certain circumstances, we may have to increase the size of M to M+R
> leaving N alone, i.e. append R rows to the end of matrix A.  We would want to
> do this while the program is executing.
>
> Here are the questions:
>
> - What NumPy array structure do we use for A - an array or matrix (and why)?
> - If A is a matrix data structure then do the x and y vectors have to be
> matrix structures too or can you mix the data structures?
> - If A is a matrix or an array structure, can we append rows during program
> execution and if so, how do we do this?

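The main difference between numpy arrays and matrices is the way
operators act on them - in particular, the * operator behaves
differently (for arrays it operates elementwise and for matrices it
applies the matrix product, i.e. the function dot()). Matrices are also
always two-dimensional, so if A is a matrix you would normally keep x
and y as Nx1 and Mx1 matrices too; with plain arrays you just call
dot(A, x). As data structures they are otherwise identical, and you can
convert between the two freely.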
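
A minimal sketch of the difference, assuming a small made-up 3x2 A and
a length-2 x (the values are purely illustrative, not from the original
problem):

```python
import numpy as np

# Hypothetical example data: A is M x N with M=3, N=2; x has length N.
A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
x = np.array([10, 1])

# Plain arrays: * is elementwise (x is broadcast across the rows of A),
# while dot() is the matrix-vector product.
elementwise = A * x          # shape (3, 2)
y = np.dot(A, x)             # shape (3,)

# Matrices: * is the matrix product, and everything is 2-D,
# so x has to be an N x 1 column.
Am = np.asmatrix(A)          # a matrix view of the same memory, no copy
xm = np.asmatrix(x).T        # shape (2, 1)
ym = Am * xm                 # shape (3, 1)

# The array and the matrix share one underlying block of data.
same_data = np.shares_memory(A, Am)
```

Here y and ym hold the same three numbers; only the shape differs
((3,) for the array result, (3, 1) for the matrix result).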

When you talk about increasing M, that presumably means enlarging A.
Is your idea that A has changed on disk? (You did say it was
immutable.) The short answer is that you basically can't enlarge an
array in place, as it is (under the hood) a single large block of
memory. Copying is not all that expensive for a once-in-a-while
operation, so you can just use vstack() or concatenate() to append rows
to A (hstack() would append columns instead), allocating a new array in
the process. If A is on disk,
and you want to reflect changes on disk, you can try using numpy's
memory mapped arrays: these take advantage of your operating system's
ability to make a disk file look like a piece of memory. Each
matrix-vector product does require traversing all of A, though, so the
matrix will need to be loaded into memory regardless. (Incidentally,
if you combine several vectors into a matrix and multiply them by A
all at once it will probably be faster, since numpy/scipy uses
optimized matrix-multiplication routines that are reasonably smart
about cache.)
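
A rough sketch of the two in-memory suggestions above (appending rows
by allocating a new array, and multiplying several vectors at once).
The sizes M, N, R, K and the data are made up for illustration, and
batching only applies if several x vectors are available at the same
time:

```python
import numpy as np

# Hypothetical sizes: M rows, N columns, R rows to append, K vectors.
M, N, R, K = 4, 3, 2, 5
A = np.arange(M * N).reshape(M, N)
new_rows = np.ones((R, N), dtype=A.dtype)

# vstack() allocates a new (M+R) x N array and copies both pieces into
# it; the original A is left untouched.
A_big = np.vstack([A, new_rows])

# Store K vectors as the columns of an N x K array X, so that a single
# product computes all K results.
X = np.arange(N * K).reshape(N, K)
Y = np.dot(A_big, X)         # shape (M+R, K)

# Column j of Y is the same as the one-at-a-time product with X[:, j].
y0 = np.dot(A_big, X[:, 0])
```

Rebinding the name (A = A_big) after the append lets the rest of the
program carry on unchanged, at the cost of one copy per enlargement.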

Anne