[SciPy-user] Why is weave.inline()/blitz++ code 3 times slower than innerproduct()?

Thu Aug 14 17:29:12 EDT 2003

Hi all,

I think one of the strongest points in favor of python for scientific 
computing is the ability to write low-level code, when necessary, which can 
perform on par with hand-rolled Fortran.  In the past, I've been very pleased 
using weave's inline() tool, which relies on blitz for manipulating Numpy 
arrays with a very clean and convenient syntax.

This is important, because manipulating multidimensional Numeric arrays in C 
is rather messy, and the resulting code isn't exactly an example of 
readability.  Blitz arrays end up looking just like regular arrays, using 
(i,j,k) instead of [i][j][k] for indexing.
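
To make the contrast concrete, here is a minimal sketch in Python of the offset arithmetic that raw C code has to do by hand for every element access. The helper name element_via_strides is hypothetical, and the sketch assumes a C-contiguous 3-D array; it only mirrors the idea, not any actual Numeric C API call.

```python
import numpy as np

def element_via_strides(a, i, j, k):
    """Fetch a[i, j, k] by computing its offset from the strides.

    Hypothetical helper: it mimics the pointer arithmetic hand-written
    C code needs, and assumes a 3-D array (made C-contiguous here).
    """
    a = np.ascontiguousarray(a)
    # Byte offset of element (i, j, k) from the start of the data buffer.
    byte_offset = i * a.strides[0] + j * a.strides[1] + k * a.strides[2]
    # 1-D view of the same buffer; convert the byte offset to an item index.
    flat = a.reshape(-1)
    return flat[byte_offset // a.itemsize]

a = np.arange(24.0).reshape(2, 3, 4)
print(element_via_strides(a, 1, 2, 3), a[1, 2, 3])
```

The a[1, 2, 3] form on the right is the kind of notation that blitz's (i,j,k) indexing gives you on the C++ side.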

Recently, I needed to do an operation which turned out to be pretty much what 
Numpy's innerproduct() does.  I'd forgotten about innerproduct(), so I just 
wrote my own using inline().  Later I saw innerproduct(), and decided to 
compare the results.  I'm a little worried by what I found, and I'd like to 
hear some input from the experts on this problem.
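
For reference, this is roughly the operation in question, written as a plain-Python sketch. It assumes 2-D inputs and uses numpy.inner, the modern NumPy spelling of innerproduct(); the function name inner_loops and the array shapes are made up for illustration and are not taken from the attached code.

```python
import numpy as np

def inner_loops(a, b):
    """Explicit-loop inner product of two 2-D arrays over their last axis.

    out[i, j] = sum over k of a[i, k] * b[j, k]
    """
    n, m, length = a.shape[0], b.shape[0], a.shape[1]
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for k in range(length):
                acc += a[i, k] * b[j, k]
            out[i, j] = acc
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 5))
b = rng.standard_normal((3, 5))

# The hand-written loops and the library routine should agree to rounding.
print(np.allclose(inner_loops(a, b), np.inner(a, b)))
```

The triple loop is what one would write inside inline(); the library call does the same contraction over the last axis of both arrays.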

I've attached all the necessary code to run my tests, in case someone is 
willing to do it and take a look.

In summary, I found some things which concern me (a README is included in the 
.tgz with more info):

- The blitz code, whether via inline() or a purely hand-written extension, is 
~2.5 to 3 times slower than innerproduct().  Considering that this code is 
specialized to a few sizes and data types, this comes as a big surprise.  If 
the only way to get maximum performance with Numpy arrays is to hand-write 
code against the full low-level C API, I know that many people will shy away 
from python for a certain class of projects.  I truly hope I'm missing 
something here.

- There is a significant numerical discrepancy between the two approaches 
(blitz vs numpy).  In an innerproduct operation over 7000 entries, the 
discrepancy is O(1e-10) (in l2 norm).  This is more than I'm comfortable with, 
but perhaps I'm being naive or optimistic.
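
As a sanity check on the second point, here is a small sketch of one common source of such differences: floating-point addition is not associative, so two correct implementations that accumulate the same products in a different order can disagree in the last bits. The data below is random and made up for illustration (only the length 7000 comes from the numbers above), so it does not reproduce the figure quoted above; the bound is the usual first-order estimate for summing n terms, with some slack.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 7000
x = rng.standard_normal(n)
y = rng.standard_normal(n)

def dot_forward(u, v):
    """Dot product accumulated strictly left to right, as a simple C loop would."""
    acc = 0.0
    for ui, vi in zip(u, v):
        acc += float(ui) * float(vi)
    return acc

# Reference: correctly rounded sum of the (already rounded) products.
products = [float(xi) * float(yi) for xi, yi in zip(x, y)]
reference = math.fsum(products)

err_loop = abs(dot_forward(x, y) - reference)
err_numpy = abs(float(np.dot(x, y)) - reference)

# Rough first-order bound on accumulated rounding error: n * eps * sum(|x_i * y_i|).
bound = n * np.finfo(float).eps * math.fsum(abs(p) for p in products)

print("loop  vs reference:", err_loop)
print("numpy vs reference:", err_numpy)
print("rough bound       :", bound)
```

If a measured discrepancy sits well inside a bound of this kind for the actual data, it is consistent with ordinary rounding; if it is far outside, something else is probably going on.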

I view the ability to get blitzed code which performs on par with Fortran as a 
very important aspect of python's suitability for large-scale projects where 
every last bit of performance matters, but where one still wants to have the 
ability to work with a reasonably clean syntax.  I hope I'm just misusing some 
tools and not faced with a fundamental limitation.

By the way, I'll come to Scipy'03 with many more questions/concerns along 
these lines, and I think it would be great to have some discussions on these 
issues there with the experts.

Thanks in advance.

Cheers,

f.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: py_inner.tgz
Type: application/unix-tar
Size: 6059 bytes
Desc: not available
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20030814/8fc5c36d/attachment.bin>