[Numpy-discussion] Fwd: GPU Numpy

Wed Sep 9 11:34:03 EDT 2009

Christopher Barker wrote:
> George Dahl wrote:
>> Sturla Molden <sturla <at> molden.no> writes:
>>> Teraflops peak performance of modern GPUs is impressive. But NumPy 
>>> cannot easily benefit from that. 
> 
>> I know that for my work, I can get around a 50-fold speedup over
>> NumPy using a Python wrapper for a simple GPU matrix class.
> 
> I think you're talking past each other here. Sturla is referring to 
> making a numpy ndarray GPU-aware and then expecting expressions like:
> 
> z = a*x**2 + b*x + c
> 
> to go faster when a, b, c, and x are ndarrays.
> 
> That's not going to happen.
> 
> On the other hand, George is talking about moving higher-level 
> operations (like a matrix product) over to GPU code. This is analogous 
> to numpy.linalg and numpy.dot() using LAPACK routines, and yes, that 
> could help those programs that use such operations.
> 
> So a GPU LAPACK would be nice.
> 
> This is also analogous to using SWIG, or ctypes or cython or weave, or 
> ??? to move a computationally expensive part of the code over to C.
> 
> I think anything that makes it easier to write little bits of your code 
> for the GPU would be pretty cool -- a GPU-aware Cython?

Cython is probably open to that if anybody's interested in implementing 
it or making it a student project (way too big for GSoC, I think, 
unfortunately).

However, I'd definitely make it a generic library that turns expressions 
into compiled code (either GPU or CPU w/SSE); it could then be used 
either at compile time from Cython or at run time with e.g. SymPy or 
SAGE expressions. Both PyCUDA and CorePy would likely allow compile-time 
as well as run-time operation.
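
To make the idea concrete, here is a rough sketch of what "turning an
expression into compiled code" could look like at run time. It is only
an illustration under loud assumptions: the names Expr, Var, Const,
BinOp, wrap and compile_fused are hypothetical and do not belong to
Cython, PyCUDA, CorePy, SymPy or NumPy, and the "backend" is plain
Python source built with the standard library instead of GPU or SSE
code. The point is just that the whole expression becomes one fused
loop, with no temporary array per operator.

```python
# Hypothetical sketch: every class and function here is illustrative only.

class Expr:
    """Operator overloads build an expression tree instead of computing."""
    def __add__(self, other):  return BinOp("+", self, wrap(other))
    def __radd__(self, other): return BinOp("+", wrap(other), self)
    def __mul__(self, other):  return BinOp("*", self, wrap(other))
    def __rmul__(self, other): return BinOp("*", wrap(other), self)
    def __pow__(self, other):  return BinOp("**", self, wrap(other))


class Var(Expr):
    """A named array operand; emitted as an indexed load."""
    def __init__(self, name):
        self.name = name

    def emit(self, names):
        names.add(self.name)
        return f"{self.name}[i]"


class Const(Expr):
    """A scalar literal embedded directly in the generated source."""
    def __init__(self, value):
        self.value = value

    def emit(self, names):
        return repr(self.value)


class BinOp(Expr):
    """A binary operation on two sub-expressions."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def emit(self, names):
        return f"({self.left.emit(names)} {self.op} {self.right.emit(names)})"


def wrap(value):
    """Turn plain Python numbers into Const nodes."""
    return value if isinstance(value, Expr) else Const(value)


def compile_fused(expr, name="kernel"):
    """Generate and compile one loop that evaluates the whole expression."""
    names = set()
    body = expr.emit(names)
    args = sorted(names)
    if not args:
        raise ValueError("expression contains no array variables")
    src = (
        f"def {name}({', '.join(args)}):\n"
        f"    n = len({args[0]})\n"
        f"    return [{body} for i in range(n)]\n"
    )
    namespace = {}
    exec(compile(src, "<generated>", "exec"), namespace)
    return namespace[name], src


# Usage: the polynomial from earlier in the thread.
a, b, c, x = (Var(n) for n in "abcx")
kernel, src = compile_fused(a * x**2 + b * x + c)
result = kernel(a=[1.0, 1.0], b=[2.0, 2.0], c=[3.0, 3.0], x=[0.0, 2.0])
```

A real library would swap the string-emitting step for a GPU or SSE
code generator, and the same tree could be built at compile time by
Cython or at run time from SymPy or SAGE expressions.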

-- 
Dag Sverre