[SciPy-User] Need a fast cuda/opencl sundials nvector implementation

Mon Jan 10 04:13:17 EST 2011

Hi,
 Thanks for replying.
 For cvode, the sundials distribution provides a "psolve" preconditioning callback. Its parameters include gamma and R, and the user is expected to set a vector Z to the solution of (I - gamma*J)Z = R, where J is the Jacobian. I am doing this with standard scipy gmres, and it is very fast, so I have no problems there. I have not measured it; I have merely observed that it solves a system of ~100000 states in a fraction of a second.
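
 A minimal sketch of such a solve with scipy gmres, assuming J is available as a sparse matrix. The function name psolve, the tridiagonal test Jacobian and the value of gamma are made up for illustration; this is not the actual cvode callback signature.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres


def psolve(jac, gamma, r):
    """Return z approximately solving (I - gamma*J) z = r."""
    n = r.shape[0]
    # Matrix-free operator: only Jacobian-vector products are needed,
    # so (I - gamma*J) is never formed explicitly.
    op = LinearOperator((n, n),
                        matvec=lambda v: v - gamma * (jac @ v),
                        dtype=float)
    z, info = gmres(op, r)  # default tolerance and restart settings
    if info != 0:
        raise RuntimeError("gmres did not converge (info=%d)" % info)
    return z


# Hypothetical test problem: tridiagonal 1-D diffusion stencil as J.
n = 1000
J = diags([1.0, -2.0, 1.0], [-1, 0, 1], shape=(n, n), format="csr")
gamma = 0.01
R = np.ones(n)

Z = psolve(J, gamma, R)
# Residual of the preconditioner equation, as a sanity check.
residual = np.linalg.norm(R - (Z - gamma * (J @ Z)))
```

 For small gamma the operator is close to the identity, which is presumably why gmres needs only a few iterations here.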

 The problem seems to me to be that if I add up all the cpu time spent in the four sundials cvode callbacks (psetup, psolve, jtimes, and the function itself), it is only a small fraction, a few percent, of the total cpu time. So optimising these callbacks is pointless. This leaves the vector operations, which the user can replace. I assume that they are implemented fairly optimally, but they do not exploit a GPU.
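
 One way to do this kind of accounting is to wrap each callback in a timer and compare the accumulated time with the total for the run. The sketch below uses only the standard library; the names timed, callback_seconds and callback_fraction, and the toy rhs function, are hypothetical and stand in for the real callbacks.

```python
import time
from collections import defaultdict
from functools import wraps

# Accumulated CPU seconds per callback name.
callback_seconds = defaultdict(float)


def timed(name):
    """Decorator that adds the CPU time spent in fn to callback_seconds[name]."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.process_time()
            try:
                return fn(*args, **kwargs)
            finally:
                callback_seconds[name] += time.process_time() - start
        return wrapper
    return decorate


@timed("rhs")
def rhs(t, y):
    # Toy right-hand side y' = -y, standing in for the real function.
    return [-v for v in y]


def callback_fraction(total_cpu_seconds):
    """Fraction of the total CPU time that was spent inside timed callbacks."""
    return sum(callback_seconds.values()) / total_cpu_seconds
```

 Taking time.process_time() before and after the whole integration gives the total to pass to callback_fraction; whatever is left over is spent inside the solver itself, vector operations included.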

 So what I am looking for is an implementation of the sundials NVector_serial construct using CUDA or OpenCL.
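
 To give an idea of the kind of operations involved, here is a NumPy sketch of a few of them. The Python function names are illustrative; the sundials routines named in the comments are the ones I understand these to correspond to, and the real vector interface contains many more operations than shown here.

```python
import numpy as np


def linear_sum(a, x, b, y):
    """z = a*x + b*y, element-wise (cf. N_VLinearSum)."""
    return a * x + b * y


def dot_prod(x, y):
    """Sum of x_i * y_i (cf. N_VDotProd)."""
    return float(np.dot(x, y))


def wrms_norm(x, w):
    """Weighted root-mean-square norm sqrt(mean((x_i*w_i)^2)) (cf. N_VWrmsNorm)."""
    return float(np.sqrt(np.mean((x * w) ** 2)))


def max_norm(x):
    """Largest absolute entry (cf. N_VMaxNorm)."""
    return float(np.max(np.abs(x)))


x = np.array([3.0, 4.0])
w = np.ones_like(x)
```

 The element-wise operations map directly onto GPU kernels, while the dot product and the norms are reductions that return a scalar to the host, so they would probably need the most care in a CUDA or OpenCL version.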

 If no such implementation is available, I might have a go at implementing it myself, after verifying my theory that it is the vector ops that are taking all the cpu time. However, if something else is freely available, I would like to try it.

 Peter

On Sun, Jan 09, 2011 at 12:21:17PM +0100, Sebastian Walter wrote:
> Hello Peter,
> so you are saying that you want to replace the built-in functionality
> to solve large sparse linear systems by your own code which exploits
> the structure of your Jacobian?
> 
> Sebastian
> 
>