vectorized linalg
Tim Hochberg
tim.hochberg at ieee.org
Mon Oct 9 10:10:09 EDT 2006
I periodically need to perform various linalg operations on a large
number of small matrices. The normal numpy approach of stacking up the
data into an extra dimension and doing the operations in parallel
doesn't work because the linalg functions are only designed to work on
one matrix at a time. At first glance, this restriction seems harmless
enough since the underlying LAPACK functions are also designed to work
one matrix at a time and it seem that there would be little to be gained
by parallelizing the Python level code. However, this turns out to be
the case.
I have some code, originally written for numarray, but I recently ported
it over to numpy, that rewrites the linalg function determinant, inverse
and solve_linear_equations[1] to operate of 2 or 3D arrays, treating 3D
arrays as stacks of 2D arrays. For operating on large numbers of small
arrays (for example 1000 2x2 arrays), this approach is over an order of
magnitude faster than the obvious map(inverse, threeDArray) approach.
So, some questions:
1. Is there any interest in this code for numpy.linalg? I'd be
willing to clean it up and work on the other functions in linalg so that
they were all consistent. Since these changes should all be backwards
compatible (unless your counting on catching an error from one of the
linalg functions), I don't see this as a problem to put in after 1.0, so
there's really no rush on this. I'm only bringing it up now since I just
ported it over to numpy and it's fresh in my mind.
2. If 1., what to do about norm? I think the other functions in
linalg stack naturally, but norm, since it is meant to work on both
vectors and matrices is problematic. Norm already seems a bit
problematic in that, for some values of 'ord', norm can return different
values for a shape [N] and a shape [1,N] array containing the same values:
>>> linalg.norm(a[:1], 1)
0.78120069791102054
>>> linalg.norm(a[0], 1)
1.4588102446677758
My inclination is to introduce stackable 1 and 2D version s of norm,
vnorm and mnorm for lack of better names. Ideally we'd deprecate the use
of norm for ord!=None (Froebenius norm) or at least in cases where the
result depends on the rank [1,N] versus [N].
3. A similar issue arises with dot. One would like to be able to do
stacked matrix products, vector products and matrix-vector products. If
we are making the rest of linalg stackable, linalg would be a sensible
place for a stackable version of dot to live. However, it's immediately
clear how to do this without introducing a whole pile (4) of stacking
dot function to handle the various cases, plus broadcasting, cleanly.
This would require some more thought. Thoughts?
-tim
[1] Those are actually the numarray names, which the current code still
uses, the numpy names are 'det', 'inv' and 'solve'
[2] Treatment of stacking insolve linear equations is a bit more
complicated, but I'll ignore that for now.
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
More information about the NumPy-Discussion
mailing list