vectorized linalg

Mon Oct 9 10:10:09 EDT 2006

I periodically need to perform various linalg operations on a large 
number of small matrices. The normal numpy approach of stacking up the 
data into an extra dimension and doing the operations in parallel 
doesn't work because the linalg functions are only designed to work on 
one matrix at a time. At first glance, this restriction seems harmless 
enough since the underlying LAPACK functions are also designed to work 
one matrix at a time and it seem that there would be little to be gained 
by parallelizing the Python level code. However, this turns out to be 
the case.

I have some code, originally written for numarray, but I recently ported 
it over to numpy, that rewrites the linalg function determinant, inverse 
and solve_linear_equations[1] to operate of 2 or 3D arrays, treating 3D 
arrays as stacks of 2D arrays. For operating on large numbers of small 
arrays (for example 1000 2x2 arrays), this approach is over an order of 
magnitude faster than the obvious map(inverse, threeDArray) approach.

So, some questions:

    1. Is there any interest in this code for numpy.linalg? I'd be 
willing to clean it up and work on the other functions in linalg so that 
they were all consistent. Since these changes should all be backwards 
compatible (unless your counting on catching an error from one of the 
linalg functions), I don't see this as a problem to put in after 1.0, so 
there's really no rush on this. I'm only bringing it up now since I just 
ported it over to numpy and it's fresh in my mind.

    2. If 1., what to do about norm? I think the other functions in 
linalg stack naturally, but norm, since it is meant to work on both 
vectors and matrices is problematic. Norm already seems a bit 
problematic in that, for some values of 'ord', norm can return different 
values for a shape [N] and a shape [1,N] array containing the same values:

     >>> linalg.norm(a[:1], 1)
    0.78120069791102054
     >>> linalg.norm(a[0], 1)
    1.4588102446677758

    My inclination is to introduce stackable 1 and 2D version s of norm, 
vnorm and mnorm for lack of better names. Ideally we'd deprecate the use 
of norm for ord!=None (Froebenius norm) or at least in cases where the 
result depends on the rank [1,N] versus [N].

    3. A similar issue arises with dot. One would like to be able to do 
stacked matrix products, vector products and matrix-vector products. If 
we are making the rest of linalg stackable, linalg would be a sensible 
place for a stackable version of dot to live. However, it's immediately 
clear how to do this without introducing a whole pile (4) of stacking 
dot function to handle the various cases, plus broadcasting, cleanly. 
This would require some more thought. Thoughts?

-tim

[1] Those are actually the numarray names, which the current code still 
uses, the numpy names are 'det', 'inv' and 'solve'

[2] Treatment of stacking insolve linear equations is a bit more 
complicated, but I'll ignore that for now.

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV